Linear models, including linear regression and the models that improve on it, can offer only a limited approximation of the real world. Moving beyond linearity is therefore necessary in situations where more flexibility is required.
(Notice that ridge regression and the Lasso actually reduce the variance, i.e. the flexibility, of linear regression rather than increase it.)
Basis Functions
To move beyond linearity, one basic idea is to transform the predictor using some non-linear functions and fit a linear model on the transformed values. The transformations $b_1(\cdot), b_2(\cdot), \ldots, b_K(\cdot)$ are called basis functions and are applied to the predictor $X$, giving the model
$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_K b_K(x_i) + \epsilon_i.$$
Polynomial Regression
The idea is simple: use not only $X$ but also $X^2, X^3, \ldots, X^d$ as predictors, i.e. take the basis functions to be $b_j(x) = x^j$, and fit $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \epsilon_i$ by ordinary least squares.
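As a minimal sketch (the data and the degree here are made up for illustration), polynomial regression is just ordinary least squares on the columns $1, x, x^2, \ldots, x^d$:

```python
import numpy as np

# Toy data (made up): a non-linear signal plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)

# Degree-4 polynomial regression: build the basis [1, x, x^2, x^3, x^4]
# and fit the coefficients by ordinary least squares.
degree = 4
X = np.vander(x, degree + 1, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
```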
Problems
When the degree is higher than 3 or 4, the fitted function is likely to take very strange, wiggly shapes, especially near the boundary of the range of $X$.
Step Functions
We break the range of $X$ into bins using cutpoints $c_1 < c_2 < \cdots < c_K$, create a dummy (indicator) variable for each bin, and fit a constant in each bin by regressing $y$ on these indicators. The result is a piecewise-constant fit.
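A minimal sketch in the same spirit, with made-up data and arbitrary cutpoints: build one dummy variable per bin and fit by least squares, which simply returns the mean of $y$ within each bin:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)

# Step-function regression: cut the range of x at three (arbitrary) cutpoints,
# build one indicator column per bin, and fit the bin means by least squares.
cutpoints = np.array([-1.5, 0.0, 1.5])
bin_ids = np.digitize(x, cutpoints)              # bin index 0..3 for each point
X = np.eye(len(cutpoints) + 1)[bin_ids]          # dummy/indicator matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # beta = mean of y in each bin
y_hat = X @ beta                                 # piecewise-constant fit
```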
Splines
We may generalize the idea of step functions and fit separate polynomial models in different intervals; this is called a piecewise polynomial. However, this approach is often undesirable because the fitted model is likely to have breaks, i.e. discontinuities, at the interval boundaries, which can look ridiculous. To fix this, one places restrictions on the model and requires continuity of the fitted function (and possibly of its derivatives) at the knots. A degree-$d$ spline is a piecewise degree-$d$ polynomial whose derivatives up to order $d-1$ are continuous at each knot; the cubic spline ($d = 3$), with continuous first and second derivatives, is the most common choice.
Natural Splines
Splines can still behave poorly in the boundary regions. A natural spline additionally requires that the function be linear at the boundary (for $X$ smaller than the smallest knot and larger than the largest knot), which generally produces more stable estimates there.
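A rough sketch of fitting regression splines, assuming the patsy package is available for constructing the bases (bs for a cubic spline basis, cr for a natural cubic spline basis); the data and the choice of 6 degrees of freedom are made up:

```python
import numpy as np
from patsy import dmatrix

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)

# Cubic spline basis with 6 degrees of freedom: patsy places the interior
# knots at uniform quantiles of x; the fit is then ordinary least squares.
B = dmatrix("bs(x, df=6, degree=3)", {"x": x})
beta, *_ = np.linalg.lstsq(np.asarray(B), y, rcond=None)
y_hat = np.asarray(B) @ beta

# Natural cubic spline basis (linear beyond the boundary knots); "- 1" drops
# patsy's automatic intercept to avoid redundancy with the cr columns.
N = dmatrix("cr(x, df=6) - 1", {"x": x})
gamma, *_ = np.linalg.lstsq(np.asarray(N), y, rcond=None)
```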
Choosing the Number and Locations of the Knots
Locations of the knots:
- We can place more knots where the function might vary rapidly and fewer knots where it might be stable.
- Alternatively, it is common to place knots in a uniform fashion. We can specify the desired degrees of freedom and let the software place the corresponding number of knots at uniform quantiles of the data.
Number of knots:
- We can try out different numbers of knots and see which produces the best-looking curve.
- Alternatively, we can use cross-validation, as sketched below.
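A rough sketch of the cross-validation approach, using a hand-rolled truncated power basis for the cubic spline; the data are made up and the knots are placed at uniform quantiles as described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.3 * rng.standard_normal(200)

def spline_basis(v, knots):
    """Truncated power basis of a cubic spline: 1, v, v^2, v^3, (v - k)_+^3."""
    cols = [np.ones_like(v), v, v**2, v**3]
    cols += [np.clip(v - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def cv_error(n_knots, n_folds=10):
    """K-fold cross-validated MSE for a cubic spline with n_knots interior
    knots placed at uniform quantiles of x."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        beta, *_ = np.linalg.lstsq(spline_basis(x[train], knots), y[train], rcond=None)
        pred = spline_basis(x[test], knots) @ beta
        errs.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errs)

best_n_knots = min(range(1, 9), key=cv_error)
```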
Advantages over Polynomial Regression
- We can increase the degrees of freedom (by adding knots) without increasing the degree of the polynomial.
- We generally get more stable estimates.
- We can decide in which interval we want the function to be more flexible.
- We can often get desirable results at the boundaries with a natural cubic spline.
Smoothing Splines
Recall that in every model we want a balance between bias and variance. To reduce the bias, we usually minimize the RSS on the training data; in the linear context, this is how we construct the linear regression model. To reduce the variance, we introduced ridge regression and the Lasso as improvements on linear regression, which penalize large values of the linear coefficients. In general, beyond the linear context, to reduce the bias we want to find a function $g$ that makes the training RSS $\sum_{i=1}^n (y_i - g(x_i))^2$ small, and to control the variance we penalize the roughness of $g$. A smoothing spline is the function $g$ that minimizes
$$\sum_{i=1}^n \big(y_i - g(x_i)\big)^2 + \lambda \int g''(t)^2 \, dt,$$
where $\lambda \ge 0$ is a tuning parameter: $\lambda = 0$ allows $g$ to interpolate the data, while $\lambda \to \infty$ forces $g$ to be linear. It can be shown that the minimizer is a natural cubic spline with knots at the unique values of $x_1, \ldots, x_n$, shrunken by the penalty.
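A minimal sketch with made-up data, assuming SciPy 1.10+ so that scipy.interpolate.make_smoothing_spline (which, per its documentation, minimizes this penalized RSS criterion) is available:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)

# Smoothing spline: minimize RSS + lam * integral of g''(t)^2 dt.
# With lam=None the penalty is chosen automatically by generalized
# cross-validation; a larger lam gives a smoother fit, closer to a line.
g = make_smoothing_spline(x, y, lam=None)
y_hat = g(x)
g_smooth = make_smoothing_spline(x, y, lam=10.0)   # heavier penalty
```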
Choosing the Smoothing Parameter
We have mentioned that a smoothing spline is a shrunken version of a natural cubic spline. Shrinkage really means that the effective degrees of freedom of the model are reduced by the smoothing parameter $\lambda$: although the fit has a coefficient for each of the $n$ knots, the penalty shrinks them, so the effective degrees of freedom $df_\lambda$ decrease from $n$ toward 2 as $\lambda$ increases from 0 to $\infty$. Writing the vector of fitted values as $\hat{g}_\lambda = S_\lambda y$ for the smoother matrix $S_\lambda$, we have $df_\lambda = \operatorname{tr}(S_\lambda)$. The parameter $\lambda$ is usually chosen by cross-validation; for smoothing splines the leave-one-out CV error can be computed from a single fit:
$$\mathrm{RSS}_{cv}(\lambda) = \sum_{i=1}^n \big(y_i - \hat{g}_\lambda^{(-i)}(x_i)\big)^2 = \sum_{i=1}^n \left[\frac{y_i - \hat{g}_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}}\right]^2.$$
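The effective degrees of freedom and the LOOCV shortcut can be made concrete with a small, purely illustrative (and deliberately inefficient) sketch that builds the smoother matrix $S_\lambda$ column by column, again assuming SciPy's make_smoothing_spline:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)
n = len(x)

def smoother_matrix(lam):
    """For fixed lam the smoothing spline is a linear smoother, y_hat = S y,
    so column j of S is the fit obtained when y is the j-th unit vector."""
    S = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        S[:, j] = make_smoothing_spline(x, e, lam=lam)(x)
    return S

def loocv_rss(lam):
    S = smoother_matrix(lam)
    resid = y - S @ y
    return np.sum((resid / (1.0 - np.diag(S))) ** 2)   # LOOCV shortcut formula

lams = np.logspace(-3, 2, 10)
best_lam = min(lams, key=loocv_rss)
eff_df = np.trace(smoother_matrix(best_lam))           # effective degrees of freedom
```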
Local Regression
We may compute the fit at a target point $x_0$ using only the nearby training observations:

- Preselect the span, i.e. the fraction $s = k/n$ of training points to be used, where $k$ is the number of nearest neighbors and $n$ the total number of observations.
- Gather the $k$ nearest neighbors of $x_0$ (in terms of the distance between $x_i$ and $x_0$).
- Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in the neighborhood so that the point furthest from $x_0$ has weight 0 and the closest has the highest weight. Points outside the neighborhood have weight 0.
- Fit a weighted least squares regression of the $y_i$ on the $x_i$ by finding $\hat\beta_0$ and $\hat\beta_1$ that minimize $\sum_{i=1}^n K_{i0}\,(y_i - \beta_0 - \beta_1 x_i)^2$.
- The fitted value at $x_0$ is given by $\hat f(x_0) = \hat\beta_0 + \hat\beta_1 x_0$.
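A self-contained sketch of this procedure with a tricube weight function; the kernel, span, and data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)

def local_linear_fit(x0, span=0.3):
    """Local regression at x0: weight the `span` fraction of nearest points
    with a tricube kernel and fit a weighted least squares line."""
    k = int(np.ceil(span * len(x)))
    dist = np.abs(x - x0)
    nearest = np.argsort(dist)[:k]
    # Tricube weights: 0 for the furthest neighbor, largest for the closest.
    u = dist[nearest] / dist[nearest].max()
    w = (1 - u ** 3) ** 3
    # Weighted least squares for beta_0, beta_1.
    X = np.column_stack([np.ones(k), x[nearest]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[nearest])
    return beta[0] + beta[1] * x0

grid = np.linspace(-3, 3, 100)
fit = np.array([local_linear_fit(x0) for x0 in grid])
```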
GAM
So far we have only discussed non-linear improvements on single-predictor (simple) linear regression. When we have multiple predictors $X_1, \ldots, X_p$, a generalized additive model (GAM) extends multiple linear regression by replacing each linear term $\beta_j x_{ij}$ with a (possibly non-linear) function $f_j(x_{ij})$:
$$y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \epsilon_i.$$
Each $f_j$ can be any of the building blocks above (a polynomial, a regression spline, a smoothing spline, local regression, ...), and the model stays additive, so we can examine the effect of each predictor while holding the others fixed.
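A rough sketch of a GAM built from regression splines, with made-up data and basis choices: because each $f_j$ is linear in its own spline basis, the whole additive model can still be fit by a single least squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x1 = rng.uniform(-3, 3, n)
x2 = rng.uniform(0, 5, n)
y = np.sin(x1) + np.log1p(x2) + 0.3 * rng.standard_normal(n)

def cubic_spline_basis(v, n_knots=4):
    """Truncated power basis (without intercept) for a cubic spline of v,
    with knots at uniform quantiles."""
    knots = np.quantile(v, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [v, v**2, v**3] + [np.clip(v - t, 0.0, None) ** 3 for t in knots]
    return np.column_stack(cols)

# Additive model y = beta_0 + f1(x1) + f2(x2) + eps, each f_j a cubic spline:
# stack the per-predictor bases and fit everything by ordinary least squares.
X = np.column_stack([np.ones(n), cubic_spline_basis(x1), cubic_spline_basis(x2)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
```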