LASSO

Supporting Technique

LASSO is a regression technique that applies L1 regularization to prevent overfitting and to perform feature selection.

The Least Absolute Shrinkage and Selection Operator (LASSO) is commonly used in statistical modeling and machine learning to enhance the prediction accuracy and interpretability of regression models by imposing a constraint on the sum of the absolute values of the model parameters.

LASSO works by adding a penalty proportional to the sum of the absolute values of the coefficients (the L1 norm) to the loss function. With a sufficiently strong penalty, some coefficients are driven to exactly zero, effectively performing feature selection by excluding irrelevant features.
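To make the penalty concrete, the following is a minimal Python sketch of cyclic coordinate descent, one standard way of minimizing the penalized loss. The function names, the penalty strength lam, and the synthetic data are illustrative assumptions, not any particular library's API.

  import numpy as np

  def soft_threshold(x, t):
      # Closed-form solution of the one-dimensional problem
      # min_b 0.5 * (b - x)**2 + t * |b|: values within t of zero snap to 0.
      return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

  def lasso_coordinate_descent(X, y, lam, n_iters=200):
      # Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cycling
      # through the coefficients and applying the soft-threshold update.
      n, p = X.shape
      b = np.zeros(p)
      col_sq = (X ** 2).sum(axis=0) / n
      for _ in range(n_iters):
          for j in range(p):
              r = y - X @ b + X[:, j] * b[j]   # residual with predictor j excluded
              b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
      return b

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))                   # five candidate predictors
  y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
  print(lasso_coordinate_descent(X, y, lam=0.1))  # the three irrelevant coefficients land at exactly 0.0

The soft-threshold step is where the selection happens: any coefficient whose correlation with the residual falls below the penalty strength is set to exactly zero rather than merely shrunk.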

For example, consider a dataset with numerous predictor variables for predicting house prices. LASSO can be used to identify the most significant predictors, such as location and size, while excluding less relevant features like the color of the house.
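A minimal sketch of that scenario, assuming scikit-learn and entirely synthetic data; the predictor names and the penalty strength alpha are illustrative assumptions:

  import numpy as np
  from sklearn.linear_model import Lasso
  from sklearn.preprocessing import StandardScaler

  rng = np.random.default_rng(1)
  n = 200
  # Hypothetical predictors: only size and location actually drive the price.
  size = rng.uniform(50, 250, n)               # floor area
  location = rng.uniform(0, 10, n)             # location desirability score
  color = rng.integers(0, 5, n).astype(float)  # paint colour code, irrelevant
  price = 2000 * size + 15000 * location + rng.normal(scale=5000, size=n)

  X = StandardScaler().fit_transform(np.column_stack([size, location, color]))
  model = Lasso(alpha=1000.0).fit(X, price)    # a sufficiently strong penalty
  for name, coef in zip(["size", "location", "color"], model.coef_):
      print(f"{name}: {coef:.1f}")             # "color" is shrunk to exactly 0.0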

Although LASSO is a useful technique in many situations, it has the following disadvantages; where they are relevant to a use case, Elastic Net should be considered as an alternative (a comparative sketch follows the list):

  • Where predictor variables are highly correlated, LASSO tends to arbitrarily select one of them and reject the others.
  • Where there are many predictor variables but only a few examples, LASSO is mathematically constrained to select at most as many predictor variables as there are examples.
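The first point can be demonstrated with a small sketch, again assuming scikit-learn and synthetic data with two nearly identical predictors; the penalty settings are illustrative:

  import numpy as np
  from sklearn.linear_model import Lasso, ElasticNet

  rng = np.random.default_rng(2)
  n = 100
  z = rng.normal(size=n)
  x1 = z + rng.normal(scale=0.01, size=n)      # two almost perfectly
  x2 = z + rng.normal(scale=0.01, size=n)      # correlated predictors
  X = np.column_stack([x1, x2])
  y = z + rng.normal(scale=0.1, size=n)

  # LASSO typically assigns most or all of the weight to one of the two
  # predictors, while Elastic Net's additional L2 term spreads it across both.
  print(Lasso(alpha=0.05).fit(X, y).coef_)
  print(ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y).coef_)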

Least angle regression (LARS) can be regarded as a method with which to compute the LASSO solution. The basic idea is similar to the now obsolete stepwise regression, but LARS produces more stable and usable results. Rather than adding or removing whole variables from the model, LARS adjusts the contributions of variables gradually. The procedure starts by finding the predictor variable that correlates most strongly with the dependent variable. The coefficient of this predictor is increased until the remaining, unexplained residual correlates as strongly with a second predictor variable as with the first. That second predictor then joins the model, both coefficients are adjusted together, and the procedure repeats for a third predictor variable, and so on.
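As a sketch of how this looks in practice, scikit-learn exposes the procedure as lars_path; passing method="lasso" makes it trace the LASSO solution path. The dataset here is synthetic and the settings are illustrative:

  from sklearn.datasets import make_regression
  from sklearn.linear_model import lars_path

  X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                         noise=5.0, random_state=0)

  # alphas: penalty values at which the active set changes;
  # active: the order in which predictors entered the model;
  # coefs:  the coefficients along the whole path.
  alphas, active, coefs = lars_path(X, y, method="lasso")
  print(active)
  print(coefs[:, -1])   # coefficients at the least-penalized end of the path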

Alias
Least absolute shrinkage and selection operator
Related terms
Regression, Regularization, Elastic Net, Dimensionality Reduction