When there is an exam, we usually see three types of students: those who don't study at all, those who memorise everything, and those who actually try to understand every concept in the book.
The ones who don't study will not perform well in the exam because they haven't learned anything. This is a classic example of underfitting. The ones who memorise everything will also not perform well, because they haven't understood anything either; they can only answer questions they have seen before. This is overfitting. The ones who have actually understood the concepts will do well in the exam. This is an example of a good fit: trying to understand everything instead of memorising, or not studying at all.
In this article we will discuss what underfitting and overfitting are, and how they can be avoided. So let's start with overfitting first.
Imagine you are at an outdoor wedding where you know nobody. As time passes you drift through the crowd, exchanging introductions and small talk. You decide to remember the name of every guest at the wedding. In order to register their names, you start to associate each person's appearance with their name.
For example, say one person you met was named Raj, so you associated his name with his hairstyle, which was similar to Shah Rukh Khan's hair in DDLJ. Another person you met was named Dia, so you associated her name with her dress, which was shining bright. You go on making similar associations for each person you meet, and when you meet the same person again you remember their name.
Now, during the reception, you encounter lots of new people. When you run into a person wearing a bright dress you say "Hey again, Dia?", only to get a confused "It's Pooja, do I know you?". The same mistake happens again and again.
You might be wondering where your learning method went wrong. The thing is, when you learned the names of the people at the wedding, you did it in a way that only worked for the original group and didn't generalise. While associating names with appearances you could have made more robust connections between name and appearance, like: the guy with good hair like Shah Rukh Khan, wearing goggles and a black suit, is Raj; or the girl in the bright-coloured dress with long hair and a hooked nose is Dia. That would have saved you from calling people by the wrong name.
You did well at the wedding because you were evaluating your performance on the training data (the names of the people at the wedding). There, your training accuracy was high, since you guessed most names correctly, and your training error was low. But at the reception, when new people arrived (new data), the generalisation error was high, or equivalently, the generalisation accuracy was low.
We learned how to classify the data, but we relied on specific details of that data rather than learning general rules that would also apply to new data.
Machine learning systems are good at overfitting, or, we could say, cheating. Overfitting happens when a machine memorises the training data: we have simple data, but we try to fit it with a model that is too complex. It's like killing a fly with a tank.
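To see the "tank" in action, here is a minimal sketch (using NumPy; the dataset and degrees are made up for illustration) that fits both a straight line and a degree-9 polynomial to ten noisy points from a simple linear trend. The complex model drives the training error almost to zero, but does much worse on new points drawn from the same trend:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)   # simple linear data + noise
x_new = np.linspace(0.05, 0.95, 10)              # unseen data from the same trend
y_new = 2 * x_new + rng.normal(0, 0.2, 10)

errors = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    errors[degree] = (
        np.mean((np.polyval(coeffs, x_train) - y_train) ** 2),  # training error
        np.mean((np.polyval(coeffs, x_new) - y_new) ** 2),      # error on new data
    )
    print(f"degree {degree}: train error {errors[degree][0]:.4f}, "
          f"new-data error {errors[degree][1]:.4f}")
```

The degree-9 polynomial passes through every training point (it "memorises" them), which is exactly why it fails on the new points in between.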
Overfitting is a result of high variance and low bias, where bias measures the tendency of a system to consistently learn the wrong things, and variance measures its tendency to learn irrelevant details.
Detecting and Addressing Overfitting:
Suppose we use a validation set to measure the generalisation error of the system after each epoch. Generalisation error measured on a validation set is called validation error. When the validation error flattens out, or starts to increase while the training error is still decreasing, we can say that our model is overfitting.
There are two common ways to stop a machine learning model from overfitting: 1) early stopping and 2) regularisation. Let's discuss them one by one.
i) Early Stopping:
Initially, when we start to train our model, it underfits. As the model trains more and more, it refines its boundaries so that both the validation and training error go down.
At some point, though, the training error keeps decreasing while the validation error starts to increase.
Training error goes down because our model is capturing more and more detail from the data. But now we are tuning the model too closely to the training data, so the validation error goes up: the model is overfitting. Using this as a guiding principle, we can stop training as soon as the model starts overfitting.
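This guiding principle can be sketched as a toy training loop (a NumPy sketch with made-up details such as the polynomial model, learning rate, and patience value): gradient descent on a too-flexible model, keeping the weights with the lowest validation error and stopping once the validation error has not improved for `patience` consecutive epochs.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.3, 40)
x_train, y_train = x[::2], y[::2]        # even points: training set
x_val, y_val = x[1::2], y[1::2]          # odd points: validation set

# Degree-12 polynomial features: deliberately too flexible for 20 points.
X_train = np.vander(x_train, 13)
X_val = np.vander(x_val, 13)

w = np.zeros(13)
best_w, best_val = w.copy(), np.inf
patience, wait = 20, 0

for epoch in range(5000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad                               # one gradient-descent step
    val_err = np.mean((X_val @ w - y_val) ** 2)    # validation error this epoch
    if val_err < best_val:                         # still improving: remember weights
        best_val, best_w, wait = val_err, w.copy(), 0
    else:                                          # not improving: lose patience
        wait += 1
        if wait >= patience:
            print(f"stopped early at epoch {epoch}")
            break
```

Whether and when the loop stops depends on the data and hyperparameters; the point is that `best_w` holds the weights from just before overfitting set in, which is what early stopping returns.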
In early stopping, the learning of the model is stopped as soon as the validation error starts to increase. Regularisation, by contrast, delays this point by letting the model learn more while pushing both training and validation error down. The techniques that delay the onset of overfitting are collectively known as regularisation methods, or simply regularisation.
A machine learning model doesn't know that it is overfitting when we ask it to learn; it simply learns everything it can from the data. Regularisation techniques make sure that no single parameter, or small set of parameters, dominates all the others. A popular way to perform regularisation, and so delay the start of overfitting, is to limit the values of the parameters used by the classifier.
The regularisation parameter is denoted by lambda (λ). Lambda lets us choose how complex we want the boundaries to be: a high lambda gives a smooth boundary, and a low lambda gives a boundary that fits more precisely to the data it is looking at.
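As an illustration, here is a ridge (L2) regression sketch in NumPy, one common way of limiting parameter values, used here as an assumed example rather than anything prescribed above. The closed-form solution w = (XᵀX + λI)⁻¹Xᵀy shrinks the weights as λ grows, which is what smooths the fitted curve:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 15)
y = x + rng.normal(0, 0.3, 15)
X = np.vander(x, 10)   # degree-9 polynomial features: too complex for a linear trend

norms = {}
for lam in (0.0, 0.1, 10.0):
    # Ridge closed form: w = (X^T X + lam * I)^(-1) X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
    norms[lam] = np.linalg.norm(w)
    print(f"lambda = {lam}: weight norm {norms[lam]:.3f}")
```

With λ = 0 we get plain least squares and large, wiggle-producing weights; increasing λ steadily shrinks the weight norm and hence the complexity of the boundary.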
Underfitting can be thought of as oversimplifying the problem and coming up with too simple a solution. For example, if we go grocery shopping without a list of the items we need, we might end up buying unnecessary things and also miss out on the items we actually need.
That approach won't go well for us, because we underestimated the problem and came unprepared. This is underfitting: the dataset is complex, but we came up with a simple model that cannot capture the complexity of the dataset. Underfitting is the result of high bias and low variance.
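A minimal sketch of underfitting, again with a made-up NumPy setup: a straight line fit to data with a quadratic pattern has high error on the training data itself, while a slightly more complex quadratic model captures the pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 30)
y = x ** 2 + rng.normal(0, 0.05, 30)   # clearly curved data

fit_errors = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)
    fit_errors[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training error {fit_errors[degree]:.4f}")
```

Note the telltale sign of underfitting: the straight line does badly even on the data it was trained on, not just on new data.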
To achieve a good fit we have to find the sweet spot between underfitting and overfitting. Ideally, a well-fitted model has low error on both the training data and unseen data; with noisy real-world data, literally zero error is neither achievable nor desirable.
Underfitting is not much of a problem: it can usually be solved by letting the model learn longer or by increasing the complexity of the model. But when it comes to overfitting, things can get out of hand, so we have to keep track of what, and how much, our model is learning.
Do follow our LinkedIn page for updates: [ Myraah IO on LinkedIn ]