REGULARIZATION: RIDGE AND LASSO REGRESSION

Fnu Parshant
3 min read · Jul 31, 2021

What is Overfitting?

Overfitting means building a model that matches the training data “too closely.” The model ends up learning the noise rather than the signal.

— Usually caused by a model that is too complex

— Overfit model does not generalize

— Low bias/high variance models
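To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy; the data and the degree-15 polynomial are invented for illustration) in which an overly complex model fits the training points almost perfectly but scores much worse on held-out data:

```python
# Sketch of overfitting: a degree-15 polynomial fit to a handful of noisy points.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 20)   # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Far too many parameters for 15 training points -> the model memorizes the noise.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

print("train R^2:", overfit.score(X_train, y_train))   # typically close to 1.0
print("test  R^2:", overfit.score(X_test, y_test))     # usually far lower, often negative
```

The large gap between the two R² scores is the signature of an overfit, low-bias/high-variance model.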

What are Bias and Variance?

Bias: The difference between the values predicted by the machine learning model and the true values. A high-bias model gives a large error on both the training and the testing data. An algorithm should therefore be low-biased to avoid underfitting.

Variance: The variability of the model’s predictions for a given data point, which reflects how sensitive the model is to the particular training set it saw. High variance leads to overfitting (that is, a much higher train score than test score), which is why such models have high error rates on test data.

In short, bias is how bad your model is at predicting y, and variance is how bad your model is at generalizing to new data.

What is Regularization?

It is a method for constraining/regularizing the size of the coefficients, shrinking them towards zero. This reduces variance and thus counteracts overfitting. If our model is too complex, regularizing it reduces the variance more than it increases the bias, which results in a model that is more likely to generalize.
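As a rough picture of what “penalizing the size of the coefficients” means, the sketch below (plain NumPy; the data and λ values are made up) adds an L1 or L2 penalty term on top of the ordinary squared-error loss, so larger coefficients make the total cost larger:

```python
# Sketch of a penalized cost function; data and lambda values are made up.
import numpy as np

def penalized_loss(X, y, coef, lam, penalty="l2"):
    residual = y - X @ coef                      # prediction error
    sse = np.sum(residual ** 2)                  # ordinary least-squares loss
    if penalty == "l1":                          # lasso: sum of absolute values
        return sse + lam * np.sum(np.abs(coef))
    return sse + lam * np.sum(coef ** 2)         # ridge: sum of squares

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
y = np.array([3.0, 3.0, 7.0])
coef = np.array([1.0, 1.0])

print(penalized_loss(X, y, coef, lam=0.0))                  # plain OLS loss
print(penalized_loss(X, y, coef, lam=1.0, penalty="l1"))    # OLS loss + L1 penalty
print(penalized_loss(X, y, coef, lam=1.0, penalty="l2"))    # OLS loss + L2 penalty
```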

Three commonly used regularization techniques are:

● L1 Regularization

● L2 Regularization

● Dropout Regularization

LASSO Regression

When a regression model uses the L1 regularization technique, it is called LASSO Regression (Least Absolute Shrinkage and Selection Operator). Lasso adds a penalty term to the cost function: the sum of the absolute values of the coefficients. As a coefficient moves away from 0, this term grows, pushing the model to shrink coefficients in order to minimize the loss. The difference between ridge regression and lasso regression is that lasso tends to set some coefficients exactly to zero, whereas Ridge shrinks coefficients but never sets them to exactly zero.
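A minimal scikit-learn sketch of this behaviour (synthetic data; alpha, scikit-learn’s name for the penalty strength λ, is chosen arbitrarily): with a sufficiently strong penalty, Lasso drives the coefficients of the uninformative features exactly to zero.

```python
# Lasso (L1) sketch: only 2 of the 5 features actually influence y.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 100)

lasso = Lasso(alpha=0.5)          # alpha plays the role of lambda
lasso.fit(X, y)
print(lasso.coef_)                # coefficients of the 3 noise features come out exactly 0.0
```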

Ridge Regression

When a regression model uses the L2 regularization technique, it is called Ridge Regression. Here the penalty term added to the cost function is the sum of the squared coefficients, scaled by a factor λ (lambda) that controls the strength of the penalty. If λ = 0, the equation reduces to ordinary least squares (OLS); if λ > 0, it adds a constraint on the coefficients. As we increase λ, this constraint pulls the coefficients closer towards zero, lowering the variance at the cost of some additional bias.
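The corresponding scikit-learn sketch (same synthetic setup as above; alpha again stands in for λ) shows the coefficients shrinking towards zero as the penalty grows, without ever reaching exactly zero:

```python
# Ridge (L2) sketch: coefficients shrink as alpha (lambda) increases.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 100)

for alpha in [0, 10, 100, 1000]:              # alpha = 0 reduces to plain OLS
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(ridge.coef_, 3))    # values shrink towards 0 but never hit exactly 0
```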

Elastic Net Regression

L1 regularization can introduce a bias when the prediction (the y-variable) depends heavily on a particular predictor (x-variable). In such cases, Elastic Net is better, as it combines the regularization of both lasso and Ridge. Its advantage is that it does not readily eliminate the coefficients of highly collinear predictors.
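A short scikit-learn sketch (synthetic data with two nearly identical predictors; alpha and l1_ratio are arbitrary) illustrates the difference: l1_ratio mixes the L1 and L2 penalties, so Elastic Net tends to keep both correlated features with shrunken coefficients, where Lasso typically keeps only one.

```python
# Elastic Net sketch: two almost perfectly correlated predictors.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, 100)            # x2 is almost identical to x1
X = np.column_stack([x1, x2, rng.normal(size=(100, 3))])
y = 3 * x1 + rng.normal(0, 0.5, 100)

print("lasso      :", np.round(Lasso(alpha=0.1).fit(X, y).coef_, 3))
print("elastic net:", np.round(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_, 3))
# Lasso typically puts nearly all the weight on one of the correlated pair;
# Elastic Net tends to spread the weight across both.
```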

