Introduction to Supervised Learning

January 17, 2022

What better example of supervised learning is there than ourselves? Remember our school days: we were taught a concept, then we solved examples related to that concept, and finally we were given unseen problems to check our understanding. We have been supervised through many such examples by our parents, teachers, friends, and others.

We use a similar approach to teach machine learning models to predict or classify things. As in the example above, in supervised machine learning we give the machine input data together with the correct output for each input. The machine then uses a supervised learning algorithm to find patterns in the data and builds a model that can produce matching results on new data. In short, given the input data and the correct outputs, the machine tries to work out the mapping function that predicts the output for a given input.

Let’s see how supervised learning works with an example. Suppose you have pictures of cats and dogs as the input data. One folder, labeled cat, contains pictures of cats, and another folder, labeled dog, contains pictures of dogs. These pictures, along with their labels, are fed to a machine learning model, which learns to tell cats from dogs. To check whether the model has learned anything, we then give it test data containing pictures of cats and dogs that it has not seen before, and it tries to predict whether each picture shows a cat or a dog. If it has learned the mapping function correctly, it will return the label dog when it sees a picture of a dog and the label cat when it sees a picture of a cat.

The steps involved in supervised learning are:

1) Collecting the dataset.

2) Splitting the dataset into input data (X) and output data (Y).

3) Splitting the above dataset further into training (X_train, Y_train) and testing (X_test, Y_test) datasets.

4) Using the training dataset to train the machine learning model.

5) Testing the accuracy of the machine learning model using the testing dataset.
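The five steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the measurements are made up, numbers stand in for the cat and dog pictures, the function names are hypothetical, and a 1-nearest-neighbor rule is used as a stand-in for whatever model you would actually train.

```python
# Step 1: collect the dataset.
# Made-up values for illustration: [height_cm, weight_kg] -> label.
dataset = [
    ([25.0, 4.0], "cat"), ([60.0, 25.0], "dog"),
    ([23.0, 3.5], "cat"), ([55.0, 20.0], "dog"),
    ([27.0, 5.0], "cat"), ([65.0, 30.0], "dog"),
    ([24.0, 4.2], "cat"), ([58.0, 22.0], "dog"),
]

# Step 2: split the dataset into input data (X) and output data (Y).
X = [features for features, _ in dataset]
Y = [label for _, label in dataset]

# Step 3: split further into training and testing datasets.
# The last quarter is held out here; real projects usually shuffle first.
split = int(len(X) * 0.75)
X_train, Y_train = X[:split], Y[:split]
X_test, Y_test = X[split:], Y[split:]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Step 4: "train" the model. A 1-nearest-neighbor model simply keeps the
# training data and answers with the label of the closest training example.
def predict(x):
    nearest = min(range(len(X_train)),
                  key=lambda i: squared_distance(X_train[i], x))
    return Y_train[nearest]


# Step 5: test the accuracy on data the model has not seen.
predictions = [predict(x) for x in X_test]
accuracy = sum(p == y for p, y in zip(predictions, Y_test)) / len(Y_test)
print("predictions:", predictions)
print("accuracy:", accuracy)
```

The important design point is that X_test and Y_test are never used in step 4, so the accuracy in step 5 reflects how the model handles unseen examples.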

Supervised learning is further divided into two subcategories:

• Regression
• Classification

1) Regression

Regression is often called the ‘Hello world’ of machine learning. It is one of the simplest supervised learning techniques: it finds the trendline, or more generally the curve, that best describes the relationship between the input data and the output data. It is used to predict continuous variables.

Let’s try to understand this with an example. Suppose we have to predict this year’s crop yield in a certain region, which helps farmers decide what to grow and when to grow it. We are given statistics covering many years. The data records the weather conditions, fertilizers used, amount of water used, amount of money invested, crop grown, timing, and so on, with the yield as the output. Using this data, we need to predict which crop will give the maximum yield with minimal resources.

To do that, we try to find a relationship between the resources used and the yield. We look for patterns in the data that let us predict the yield more accurately, and then pick the crop expected to give the maximum yield. Our ancestors did much the same thing to estimate the yield of their fields. The only difference is that we now have more resources and more data with which to make accurate predictions.

A machine learning model does the same thing when it predicts the yield using regression. When the data is given to the model, it uses regression to find the relationship between the input data and the target variable (in this case, yield). Once it has learned that relationship, it can predict which crop is likely to give the maximum yield for the given resources.
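As a rough illustration of how regression finds such a relationship, here is a minimal sketch of simple linear regression (ordinary least squares with a single input) in plain Python. The numbers are made up, only one resource (water) is used instead of the many inputs described above, and the names fit_line and predict_yield are hypothetical.

```python
# Made-up numbers for illustration only.
water_used = [1.0, 2.0, 3.0, 4.0, 5.0]     # input X, e.g. thousands of liters
crop_yield = [2.1, 3.9, 6.2, 7.8, 10.0]    # output Y, e.g. tonnes


def fit_line(xs, ys):
    """Return the slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the relationship between water used and yield.
slope, intercept = fit_line(water_used, crop_yield)


def predict_yield(water):
    """Predict a continuous yield value for an unseen amount of water."""
    return slope * water + intercept


print(f"yield = {slope:.2f} * water + {intercept:.2f}")
print(f"predicted yield for 6.0 units of water: {predict_yield(6.0):.2f}")
```

With several inputs (weather, fertilizer, money invested, and so on) the same idea applies, but the line becomes a plane or a curve and is usually fitted with a library rather than by hand.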

Regression is easy to understand and is very important as it provides the basis for more advanced machine learning techniques.

Regression has applications in a range of areas, such as finance, business, investing, social media trend analysis, and GDP growth forecasting.

Some algorithms used for regression are:

• Linear regression
• Non-linear regression
• Regression trees
• Bayesian linear regression
• Polynomial regression

2) Classification

Remember the example we used to understand supervised machine learning, where we told cats from dogs? That was classification, and it is a big topic. We classify things in our day-to-day life all the time: we sort the clothes in our cupboard, we keep different varieties of lentils in different jars, we arrange dishes in the kitchen rack, we group regions according to culture, and so on. Do read our previous article to learn more about classification. Here we will quickly try to understand how machines perform classification.

We love to shop online, and we often leave feedback on the product we just purchased. If the product is good we write good things about it, but if it is bad we criticise it and return it. Now let’s assume that there is a person sitting in Amazon’s office separating good comments from bad ones. It is easy for him to recognize sarcasm, positive reviews, negative reviews, and human sentiment in general. He reads a review and, if it is positive, puts it in the box for positive reviews. If it is negative, he puts it in the box for negative reviews. This is how he classifies reviews as positive or negative. But reviews arrive in enormous volumes. Even if he classifies one review every 10 seconds, 1 billion reviews would take 10 billion seconds, which is more than 300 years of non-stop work.

Classification algorithms can do the same job far faster, potentially in a matter of hours. This is how they do it: first we feed the machine data containing reviews, each labeled with its sentiment as positive or negative. With the help of this data we train the machine to recognize human sentiment. After that, we test whether the machine has learned anything by feeding it some unseen data. If the machine separates positive reviews from negative reviews in the test data with good accuracy, we deploy the model to do the job for real.
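The train-then-test procedure described above can be sketched with a tiny Naive Bayes text classifier written in plain Python. This is a minimal sketch, not a production sentiment model: the reviews are made up, the function name classify is hypothetical, words are simply split on spaces, and nothing here would cope with sarcasm.

```python
import math
from collections import Counter

# Made-up labeled reviews used for training.
train_reviews = [
    ("great product works well", "positive"),
    ("love it great quality", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("bad product want refund", "negative"),
]

# Training: count how often each word appears under each label.
word_counts = {"positive": Counter(), "negative": Counter()}
label_counts = Counter()
for text, label in train_reviews:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}


def classify(text):
    """Return the label with the highest Naive Bayes log-probability score."""
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / len(train_reviews))
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Testing: unseen reviews with known labels.
test_reviews = [
    ("great quality love it", "positive"),
    ("bad quality want refund", "negative"),
]
correct = sum(classify(text) == label for text, label in test_reviews)
accuracy = correct / len(test_reviews)
print("accuracy on test reviews:", accuracy)
```

A real system would train on far more reviews and would be evaluated on a much larger test set before being deployed.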

Some algorithms used to perform classification are given below:

• Logistic regression
• K-nearest neighbors
• Decision trees
• Support vector machines
• Naive Bayes