Why Does L1 Regularization Drive Coefficients to Zero?
Regularization is a crucial technique in Machine Learning to address overfitting, with L1 (Lasso) and L2 (Ridge) being the most widely used methods. Understanding their differences can significantly enhance model performance.
L1 Regularization adds an absolute-value penalty term to the loss function:

Loss = MSE + α · Σ |wᵢ|
L2 Regularization uses a squared penalty term:

Loss = MSE + α · Σ wᵢ²
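As a quick illustration, here is a minimal NumPy sketch of both penalized losses; the toy data, weights, and variable names are my own choices, not taken from the post's notebook:

```python
import numpy as np

# Toy linear-regression setup: predictions are X @ w
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([5.5, 10.5, 17.5])
w = np.array([1.0, 2.0])
alpha = 0.1

mse = np.mean((X @ w - y) ** 2)            # data-fit term
l1_loss = mse + alpha * np.sum(np.abs(w))  # MSE + α Σ|wᵢ|  (Lasso-style penalty)
l2_loss = mse + alpha * np.sum(w ** 2)     # MSE + α Σ wᵢ²   (Ridge-style penalty)

print(f"MSE: {mse:.3f}  L1 loss: {l1_loss:.3f}  L2 loss: {l2_loss:.3f}")
```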
The gradients of these penalties explain their behavior:
• For L1, the penalty's gradient has constant magnitude (α·sign(w), i.e. ±α), so every weight is pulled toward zero at the same steady rate, which drives small weights to exactly zero.
• For L2, the penalty's gradient is proportional to the weight (2αw), so the pull weakens as a weight approaches zero, leaving small but non-zero values; the sketch below illustrates both behaviors.
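To make that concrete, here is a small sketch (plain NumPy, my own parameter choices) that applies gradient steps to a single weight using only the penalty term. The L1 update subtracts a fixed amount each step and is clamped at zero (the soft-thresholding idea), so the weight lands exactly on zero; the L2 update shrinks the weight by a constant factor, so it approaches zero but never quite reaches it:

```python
import numpy as np

alpha, lr, steps = 1.0, 0.1, 60
w_l1, w_l2 = 2.0, 2.0   # same starting weight for both penalties

for _ in range(steps):
    # L1: gradient of α|w| is α·sign(w) -> constant-size step toward zero.
    # Clamp at zero so the weight doesn't oscillate around it.
    step = lr * alpha * np.sign(w_l1)
    w_l1 = 0.0 if abs(w_l1) <= abs(step) else w_l1 - step

    # L2: gradient of α·w² is 2αw -> the step shrinks as w shrinks.
    w_l2 = w_l2 - lr * 2 * alpha * w_l2

print(f"L1-penalized weight after {steps} steps: {w_l1}")       # exactly 0.0
print(f"L2-penalized weight after {steps} steps: {w_l2:.6f}")   # small but non-zero
```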
This distinction makes L1 ideal for feature selection: it can reduce the weights of irrelevant features to exactly zero, while L2 generally shrinks weights toward zero without fully eliminating them.
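For example, here is a minimal scikit-learn sketch (the synthetic data and alpha values are my own illustrative choices, not from the post's notebook) that fits Lasso and Ridge to data where only the first two of ten features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# The target depends only on the first two features; the other eight are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # irrelevant features typically exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # irrelevant features small but non-zero
```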
To better understand the impact of L1 and L2 regularization, I created an animation that visualizes their effects. The animation demonstrates how L1 effectively removes irrelevant features while L2 merely reduces their influence.
Code for the L1 vs L2 Regularization animation: https://github.com/pritkudale/Code_for_LinkedIn/blob/main/L1_vs_L2_Regulerization.ipynb
Watch my latest video on tackling overfitting with practical strategies: "Ways to Improve Testing Accuracy: Overfitting & Underfitting" by Pritam Kudale
#MachineLearning #DataScience #Regularization #L1L2 #Overfitting #AI #FeatureSelection