Regularization techniques
What is regularization?
Regularization is a technique that makes slight modifications to the learning algorithm so that the model generalizes better, which in turn improves the model’s performance on unseen data.
Why do we use regularization?
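Without regularization, a model can become too complex and overfit: it fits the noise in the training data and predicts poorly on new data. Regularization constrains that complexity. Its strength has to be tuned, though, because too much regularization makes the model too simple, so it underfits and again gives poor predictions.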
What are the regularization techniques?
- L1 Regularization:
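The L1 regularization technique updates the general cost function by adding another term, known as the regularization term: the sum of the absolute values of the weights, scaled by a regularization parameter (often written λ).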
Because of this regularization term, the values in the weight matrices decrease. The underlying assumption is that a neural network with smaller weights is a simpler model, so the penalty also reduces overfitting to quite an extent.
L1 regularization is a good choice when the number of features is high, since it provides sparse solutions: many coefficients become exactly zero. This gives a computational advantage, as the features with zero coefficients can simply be ignored. The absolute value is not differentiable at zero, so plain gradient descent does not apply directly; in practice, subgradient or proximal (soft-thresholding) updates are used instead.
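To make the L1 term concrete, here is a minimal sketch in plain Python. The names `l1_penalty`, `soft_threshold`, `lam`, and the example weights are made up for illustration; the soft-thresholding step is one common way to handle the absolute value, not the only one.

```python
def l1_penalty(weights, lam):
    """L1 regularization term: lam * sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def soft_threshold(w, threshold):
    """Proximal step for the L1 penalty.

    Shrinks w toward zero by `threshold` and sets it to exactly zero
    when |w| <= threshold. This is where the sparsity comes from.
    """
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


# Hypothetical weights and regularization strength (assumptions).
weights = [0.8, -0.05, 0.02, -1.5]
lam = 0.1

penalty = l1_penalty(weights, lam)           # added to the data loss
shrunk = [soft_threshold(w, lam) for w in weights]
print(penalty)
print(shrunk)
```

In this toy run the two small weights (-0.05 and 0.02) end up exactly zero, while the two large ones are only shrunk. Those zeroed features are the ones that could be dropped.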
- L2 Regularization:
L2 regularization is similar to L1, but the regularization term changes: it is the sum of the squared weights instead of the sum of their absolute values.
L2 regularization is also known as weight decay as it forces the weights to decay towards zero (but not exactly zero).
It decreases the complexity of a model but does not reduce the number of variables: it shrinks the coefficients toward zero without making any of them exactly zero. Hence, L2 is not a good fit for feature reduction.
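The "weight decay" behaviour can be sketched in a few lines of plain Python. The function names, the learning rate `lr`, the strength `lam`, and the factor 0.5 in the penalty are assumptions chosen for illustration (the 0.5 is just a common convention that makes the gradient of the penalty equal to `lam * w`).

```python
def l2_penalty(weights, lam):
    """L2 regularization term: 0.5 * lam * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, data_grads, lr, lam):
    """One gradient step on (data loss + L2 penalty).

    The penalty contributes lam * w to each gradient, so every step
    also pulls each weight a little toward zero.
    """
    return [w - lr * (g + lam * w) for w, g in zip(weights, data_grads)]


# Hypothetical setup: the data gradient is zero, so only the decay acts.
weights = [1.0, -2.0]
for _ in range(100):
    weights = sgd_step_with_l2(weights, [0.0, 0.0], lr=0.1, lam=0.5)

print(weights)
```

With these numbers each step multiplies every weight by 0.95, so after 100 steps the weights are small but still not zero, unlike the L1 case where small weights can be set exactly to zero.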
- Dropout:
Dropout is a technique where randomly selected neurons are ignored during training; they are “dropped out” at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to those neurons on the backward pass.
Dropout is usually preferred for large neural networks, which have enough capacity to overfit; the random masking introduces noise that counteracts this.
Dropout also reduces overfitting by discouraging units from becoming codependent: no unit can rely on another specific unit always being present.
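As a rough illustration, the forward pass of dropout on a list of activations could look like the sketch below. It uses the "inverted dropout" variant, which rescales the kept activations during training so nothing needs to change at test time; that variant, the function name `dropout`, and the parameter names are assumptions made for this example.

```python
import random


def dropout(activations, drop_prob, training, rng):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability drop_prob and
    the kept units are scaled by 1 / keep_prob, so the expected value of
    each activation is unchanged. At test time the input passes through.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)            # fixed seed so the run is repeatable
acts = [1.0] * 10                 # hypothetical layer output

train_out = dropout(acts, 0.5, training=True, rng=rng)
eval_out = dropout(acts, 0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

In training mode every entry is either 0.0 (dropped) or 2.0 (kept and rescaled); in evaluation mode the activations come back unchanged.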
- Data Augmentation:
If you are overfitting, getting more training data can help, but it can be expensive, and sometimes you simply cannot get more. What you can do instead is augment your training set: for image data, for example, by creating modified copies of existing images (flipped, rotated, cropped, or shifted) that keep the original label. Data augmentation therefore acts as a regularization technique and often gives a noticeable improvement in the accuracy of the model, which makes it a standard trick for improving predictions.
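A tiny sketch of the idea for images, using nested lists as a stand-in for pixel arrays. The horizontal flip is just one possible transformation, and the function names, the toy image, and the label are all made up for illustration.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(dataset):
    """Return the original samples plus a flipped copy of each one.

    The label is kept as-is, which assumes flipping does not change
    the class (true for a cat photo, not for a handwritten 'b' vs 'd').
    """
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
    return augmented


# Hypothetical 2x3 "image" with a label.
image = [[0, 1, 2],
         [3, 4, 5]]
dataset = [(image, "cat")]

augmented = augment(dataset)
print(len(augmented))
print(augmented[1])
```

The training set doubles in size without collecting any new data. In a real pipeline the transformations are usually applied randomly on the fly rather than stored.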
- Early Stopping:
One of the biggest problems in training a neural network is deciding how long to train the model. Training too little leads to underfitting on both the training and test sets; training too much leads to overfitting the training set and poor results on the test set.
Early stopping is a validation-based strategy where we keep one part of the training set aside as the validation set and monitor performance on it during training. When we see that the performance on the validation set is getting worse, we stop training the model.
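The stopping rule can be sketched as follows. To keep the example self-contained, the validation losses are a hard-coded, made-up list instead of coming from a real training loop, and the `patience` parameter (how many non-improving epochs to tolerate) is an addition commonly used in practice; `patience=1` matches the "stop as soon as it gets worse" rule described above.

```python
def run_early_stopping(val_losses, patience=1):
    """Scan per-epoch validation losses and decide when to stop.

    Returns (best_epoch, stop_epoch): the epoch with the lowest
    validation loss seen so far, and the epoch at which training stops.
    """
    best_loss = float("inf")
    best_epoch = None
    stop_epoch = None
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            # Validation improved: remember this epoch, reset the counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, stop_epoch


# Hypothetical validation losses, one per epoch (0-indexed).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.65]

best_epoch, stop_epoch = run_early_stopping(losses, patience=1)
print(best_epoch, stop_epoch)  # prints "3 4"
```

Training stops at epoch 4, the first epoch where the validation loss goes up, and the model saved at epoch 3 is the one to keep. A larger `patience` avoids stopping on a single noisy epoch.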
Thanks for your attention 😃!