What is Regression?
Regression is a supervised learning technique that estimates a function describing the relationship between variables. If we have features X = (x1, x2, …, xN)T (independent variables) and a target variable Y (dependent variable), our task is to find the function y = f(X) that captures the relationship between the features and the target.
Here we try to understand how Y changes when X changes so that we can use this understanding further to predict Y for given X.
Let’s try to understand this with an example. Suppose you are conducting a case study on a set of land plots in Pune to see how the price of a plot changes with respect to its size.
First we will collect the details of each plot as shown in the table below.
Now, to understand the relationship between these two variables, we draw a scatter plot.
Here we can see that as the size of the plot increases, the price also increases. The scatter plot above shows a linear relationship between plot size and plot price, so larger plots are likely to be priced higher.
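One way to quantify how linear this relationship is, is the Pearson correlation coefficient. The sketch below uses hypothetical plot-size and price numbers (not real data from the case study) to show the calculation:

```python
# Hypothetical plot-size vs. price data (illustrative values, not real survey data)
sizes = [1000, 1500, 2000, 2500, 3000, 3500]   # plot size in sq. ft
prices = [25, 34, 47, 60, 68, 81]              # price in lakhs

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Pearson correlation: covariance divided by the product of standard deviations
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
var_x = sum((x - mean_x) ** 2 for x in sizes)
var_y = sum((y - mean_y) ** 2 for y in prices)
r = cov / (var_x * var_y) ** 0.5

print(round(r, 3))  # a value close to 1 indicates a strong positive linear relationship
```

For this made-up data the coefficient comes out very close to 1, which is what a clear upward-sloping scatter plot looks like numerically.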
What is Linear Regression?
Linear regression is a topic that is studied in great detail, so in this section we will explore each of its subtopics thoroughly. It is a statistical method used for predictive analysis: linear regression makes predictions for continuous/real-valued variables such as sales, salary, stock prices and housing prices.
In linear regression we use a linear approach to model the relationship between two variables by finding a function that is a close fit to the data. Mathematically we need to find the function y = f(x) that best describes the relationship between variables x and y.
The modelling assumptions are that x is an independent variable or predictor variable and y is a dependent variable or response variable. There is a linear relation between x and y.
When there is a single independent variable, the method is referred to as simple linear regression and when there are multiple independent variables then the method is called multiple linear regression.
The word ‘linear’ in linear regression does not refer to fitting a line but rather to the fact that the model is linear in its unknown parameters.
Under linear regression we are going to study methods such as least squares, gradient descent and regularisation.
Fitting a line using least squares:
Let’s try to understand this with an example.
Consider the hypothetical data below.
What do you think? Which line better describes the relationship between two variables?
In an attempt to find the best fit line which accurately describes the relationship between two variables, let’s start with a horizontal line whose equation is y = c
Consider the point (x1, y1); the distance between c and y1 is (c − y1). Similarly, the distance between c and y2 is (c − y2), so the running total is (c − y1) + (c − y2).
We can keep going; after including y3, the total is (c − y1) + (c − y2) + (c − y3).
But the distance (c − yn) is negative whenever yn lies above the line. That’s not good, as a negative term subtracts from the total and makes the overall fit appear better than it really is. Likewise, yn+1 may reduce the total further.
To tackle this, mathematicians squared each distance term before adding, so every term contributes a non-negative amount.
After doing this our new equation looked like this-
(c − y1)² + (c − y2)² + (c − y3)² + (c − y4)² + (c − y5)² + (c − y6)² + (c − y7)² + (c − y8)² + (c − y9)² + (c − y10)² + …
This is our measure of how well the line fits the data. It is called the “sum of squared residuals”, because the residuals are the distances between the observed data and the line, and we are summing the squares of these values.
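The sum of squared residuals for a horizontal line y = c can be sketched in a few lines of code. The data here is hypothetical; the example also shows a standard fact that the mean of the observations minimises this sum for a flat line:

```python
# Hypothetical observed values
ys = [3.0, 5.0, 4.0, 8.0, 6.0]

def ssr_horizontal(c, ys):
    """Sum of squared residuals between the flat line y = c and the data."""
    return sum((c - y) ** 2 for y in ys)

best_c = sum(ys) / len(ys)   # the mean of the observations (5.2 here)

# The SSR at the mean is smaller than at any other height of the flat line
print(ssr_horizontal(best_c, ys))
print(ssr_horizontal(best_c + 1, ys), ssr_horizontal(best_c - 1, ys))
```

Moving the flat line up or down from the mean in either direction makes the sum of squared residuals grow, which is exactly the behaviour described next for rotations.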
If we rotate the line anticlockwise, the sum of squared residuals decreases until the line reaches its optimal position; if we keep rotating past that point, the sum of squared residuals starts to increase.
So the ultimate aim is to find the optimal position where the sum of squared residuals is minimal.
To do this, let’s start with the generic line equation y = a0 + a1x.
We want to find the optimal values of a0 and a1 that minimise the sum of squared residuals.
Mathematically this is given by S(a0, a1) = Σ ei² = Σ (yi − a1xi − a0)²,
where (a1xi + a0) is the value of the line at position xi and yi is the observed value at xi.
The above equation computes the distance between the line and the observed values at each xi.
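For simple linear regression, minimising S(a0, a1) has a well-known closed-form solution: a1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a0 = ȳ − a1x̄. The sketch below implements these standard formulas on hypothetical data:

```python
def least_squares_fit(xs, ys):
    """Closed-form simple linear regression: the (a0, a1) minimising
    S(a0, a1) = sum of (yi - a1*xi - a0)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    a1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line always passes through (mean_x, mean_y)
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Hypothetical data lying roughly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]
a0, a1 = least_squares_fit(xs, ys)
print(a0, a1)
```

For this data the fitted slope comes out near 2 and the intercept near 0, matching the pattern the points were drawn from.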
Now, the reason this method is called least squares is that we want the values of a0 and a1 that give the line with the smallest sum of squared residuals.
The plot of sum of squared residual versus each rotation is given below.
From the graph we can see that as we increase the rotation, the sum of squared residuals decreases, and after some rotation it starts to increase with each further rotation.
To find the optimal rotation of the line we take the partial derivative of the function. The derivative gives the slope of the function at each point.
The slope of this curve is zero at the best point, which is where we get the least squares. It is worth noting that different rotations of the line correspond to different values of the slope a1, and different shifts correspond to different values of the intercept a0.
This can be better understood if we add one more axis for the intercept, as shown in the diagram below.
If we fix the intercept at some value d, we can plot a curve of the sum of squared residuals over different slopes and see how it changes. We can repeat this for various intercept values until we find the slope and intercept for which the sum of squared residuals is minimal.
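This search over intercepts and slopes can be sketched as a brute-force grid scan. The data is hypothetical (chosen to lie exactly on y = 2x + 1), and the grid ranges and step size are arbitrary choices for illustration:

```python
# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

def ssr(a0, a1):
    """Sum of squared residuals for the line y = a0 + a1*x."""
    return sum((y - a1 * x - a0) ** 2 for x, y in zip(xs, ys))

# Scan intercepts 0.0..3.0 and slopes 0.0..3.0 in steps of 0.1,
# keeping the (SSR, intercept, slope) triple with the smallest SSR.
best = min((ssr(a0 / 10, a1 / 10), a0 / 10, a1 / 10)
           for a0 in range(0, 31)
           for a1 in range(0, 31))
print(best)  # smallest SSR occurs at intercept 1.0, slope 2.0
```

A real implementation would use the closed-form solution or gradient descent instead of scanning a grid, but the scan mirrors the picture of trying each rotation and shift of the line.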
Once we have the values of a0 and a1, we can plot the line that best describes the relationship between the two variables.
The important things to remember are:
1) We want to minimise the sum of squared distances between the observed values and the line.
2) We do this by taking the derivative and finding where it is equal to zero.
3) The final line minimises the sum of squared residuals.
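The three points above can be checked numerically: at the least-squares solution, both partial derivatives of S(a0, a1) should vanish. The sketch below fits a line to hypothetical data with the standard closed-form formulas and evaluates ∂S/∂a0 = −2Σ(yi − a1xi − a0) and ∂S/∂a1 = −2Σxi(yi − a1xi − a0) at the fitted values:

```python
# Hypothetical data lying roughly on y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

# Closed-form least-squares fit
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
a0 = mean_y - a1 * mean_x

# Partial derivatives of S(a0, a1) evaluated at the fitted parameters;
# both should be (numerically) zero at the minimum.
dS_da0 = -2 * sum(y - a1 * x - a0 for x, y in zip(xs, ys))
dS_da1 = -2 * sum(x * (y - a1 * x - a0) for x, y in zip(xs, ys))
print(abs(dS_da0) < 1e-8, abs(dS_da1) < 1e-8)
```

Both derivatives come out as zero up to floating-point error, confirming that the fitted line sits at the minimum of the sum of squared residuals.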