
# Linear Regression - Starting Phase of Machine Learning

Ashish Katri, a year ago

## A step by step explanation of the Linear Regression Algorithm.

Hello Folks,
I hope you are well and staying safe. The COVID-19 pandemic arrived and does not seem to want to leave our lives.
While the whole world fights to get rid of it, I thought I could share some of what I know so that others may benefit from it.
Before going deep into the Linear Regression algorithm, let us first understand what regression is.

## What is Regression?

Regression is a statistical technique that models an algebraic relationship between two or more variables.
Based on this algebraic relationship, one can estimate the value of one variable given the values of the others.
Usually, correlation is used first to check whether there is any relationship between two variables. If a relationship is found, regression is then used to quantify it so that it can be used for prediction.
Some examples are:
1) Predict the rainfall in cm for a month.
2) Predict the stock price for the next day.
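The correlation check mentioned above can be sketched in a few lines. This is only an illustration: the helper name `pearson_r` and the data are invented here, and the Pearson coefficient is assumed as the correlation measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

A value close to +1 or -1 suggests a strong linear relationship that regression could then model; a value near 0 suggests there is little linear relationship to model.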
Now that you have an idea of what regression is, let's move forward and look at the types of regression.

## Types of Regressions

• Linear regression
• Logistic regression
• Polynomial regression
• Stepwise regression
• Ridge regression
• Lasso regression
• ElasticNet regression
In this article I will explain Linear Regression; later I will try to take you through the other types of regression.

## What is a Linear Regression?

Linear Regression is one of the most fundamental algorithms in Machine Learning and falls under supervised learning. It performs a regression task: it predicts a dependent (target) value based on independent variables, and it is mostly used for finding relationships between variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and independent variables and in the number of independent variables they use.
Figure: Graph for Linear Regression
Linear regression predicts the value of a dependent variable (y) based on a given independent variable (x). In other words, it finds a linear relationship between x (input) and y (output), hence the name Linear Regression.
In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.
Linear Regression may be further divided into:
1) Simple Linear Regression / Univariate Linear Regression
2) Multivariate Linear Regression (more than one independent variable; also called Multiple Linear Regression)

### Simple Linear Regression/ Univariate Linear Regression

When we try to find a relationship between a dependent variable (Y) and a single independent variable (x), it is known as Simple Linear Regression or Univariate Linear Regression.
The mathematical equation can be given as:
Y = β0 + β1*x
Where
• Y is the response or the target variable
• x is the independent feature
• β1 is the coefficient of x
• β0 is the intercept
β0 and β1 are the model coefficients. To create a model, we must "learn" the values of these coefficients. Once we have them, we can use the model to predict the target variable.
The main aim of regression is to obtain the line that best fits the data. The best-fit line is the one for which the total prediction error over all data points is as small as possible. The error for a point is the vertical distance between that point and the regression line.
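Once the coefficients are known, prediction is just an evaluation of the equation Y = β0 + β1*x. A minimal sketch, where the function name `predict` and the coefficient values are hypothetical and chosen only for illustration:

```python
def predict(x, beta0, beta1):
    """Return the model's prediction Y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Hypothetical coefficients, chosen only to illustrate the equation.
beta0, beta1 = 1.0, 2.0
predictions = [predict(x, beta0, beta1) for x in [0, 1, 2.5]]
print(predictions)  # [1.0, 3.0, 6.0]
```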
Real-life example
Suppose we have a dataset that records the relationship between 'number of hours studied' and 'marks obtained'. Many students have been observed, and their hours of study and marks have been recorded. This will be our training data. The goal is to design a model that can predict the marks given the number of hours studied. Using the training data, we obtain the regression line that gives the minimum error. This linear equation is then used for any new data. That is, if we give the number of hours studied by a student as input, our model should predict their marks with minimum error.
Next, let's see how to estimate the model coefficients β0 and β1.
The coefficients are estimated using the least-squares criterion, i.e., we look for the line that minimizes the sum of squared residuals (or "sum of squared errors").
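To make the least-squares criterion concrete, the sketch below scores two candidate lines by their sum of squared errors. The data points and both candidate lines are made up for illustration, and `sum_squared_errors` is a helper name introduced here.

```python
def sum_squared_errors(xs, ys, beta0, beta1):
    """Sum of squared residuals for the line Y = beta0 + beta1 * x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]

sse_a = sum_squared_errors(xs, ys, beta0=1.0, beta1=2.0)  # candidate line A
sse_b = sum_squared_errors(xs, ys, beta0=0.0, beta1=3.0)  # candidate line B
print(sse_a, sse_b)  # 1.0 9.0
```

Under this criterion line A is the better of the two candidates because its sum of squared errors is smaller. The least-squares line is the one that beats every other candidate in this comparison.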
Let’s understand the intuition

#### The mathematics involved

Have a quick look at the plot. Each point has a coordinate of the form (X, Y). Now draw an imaginary vertical line between each point and the current "best-fit" line, and call the length of that line D. To picture what we are trying to visualize, take a look at the figure below.
Let's understand what the elements in the diagram represent:
• The red points are the observed values of x and y.
• The blue line is the least squares line.
• The green lines are the residuals, which are the distances between the observed values and the least squares line.
So each green line has a length D, and each red point has a coordinate (X, Y). We can then define the best-fit line as the line for which the following sum is a minimum:
$$D_1^2 + D_2^2 + D_3^2 + D_4^2 + \dots + D_N^2$$
So how do we find this line?
The least-squares line approximating the set of points
$$(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), (X_4, Y_4), (X_5, Y_5), \dots, (X_N, Y_N)$$
has the equation
$$Y = b_0 + b_1 X$$
This is just another way of writing the standard equation of a line:
$$Y = mx + c$$
So how do we calculate the model coefficients?
The values b0 and b1 must be chosen so that they minimize the error. If the sum of squared errors is taken as the metric to evaluate the model, then the goal is to obtain the line that minimizes it. The error is given by:
$$\text{SSE} = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2$$
where $\hat{Y}_i = b_0 + b_1 X_i$ is the predicted value for the i-th point.
NOTE: If we don't square the errors, positive and negative errors will cancel each other out.
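The cancellation described in the note is easy to see on a small example. The observations and the deliberately poor line below are hypothetical:

```python
# Hypothetical observations and a deliberately poor horizontal line Y = 5.
ys = [1, 9, 2, 8]
predicted = [5, 5, 5, 5]

residuals = [y - p for y, p in zip(ys, predicted)]
raw_sum = sum(residuals)                      # signed errors cancel out
squared_sum = sum(r ** 2 for r in residuals)  # squared errors do not

print(residuals)    # [-4, 4, -3, 3]
print(raw_sum)      # 0
print(squared_sum)  # 50
```

The plain sum reports zero error even though every prediction is off by 3 or 4, while the squared sum exposes the poor fit.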
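For a model with one independent variable (say x), the least-squares coefficients are:
$$b_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
where $\bar{X}$ and $\bar{Y}$ are the means of the observed X and Y values.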
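For the one-variable case the coefficients can be computed directly from the data. The following is a minimal sketch using the standard closed-form least-squares solution (slope from the centered sums, intercept from the means). The function name and the hours/marks figures are hypothetical, loosely following the study-hours example above.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1

# Hypothetical training data: hours studied and marks obtained.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_linear_regression(hours, marks)
predicted_mark = b0 + b1 * 6  # prediction for a student who studies 6 hours
print(round(b0, 2), round(b1, 2), round(predicted_mark, 2))  # 47.7 4.1 72.3
```

In practice a library routine would usually be used instead; the hand-written version is shown only to mirror the formulas.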
Some of the assumptions to keep in mind whenever we deal with a regression task:
• The regression model is linear in the coefficients and the error term.
• The mean of the residuals is zero.
• The error terms are not correlated with each other, i.e. given one error value, we cannot predict the next error value.
• The independent variables (X) do not depend on the dependent variable (Y) and are uncorrelated with the error term; this is known as exogeneity. In layman's terms, the error term should not be predictable from the values of the independent variables.
• The error terms have constant variance, i.e. homoscedasticity.
• No multicollinearity, i.e. the independent variables should not be strongly correlated with each other. If there is multicollinearity, the coefficient estimates of the OLS model become less precise.
• The error terms are normally distributed.
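Two of these assumptions can be eyeballed from the residuals alone. The sketch below is a rough heuristic rather than a formal statistical test: the residual values are hypothetical, and the split-in-half variance comparison is just one simple way to look for non-constant variance.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical residuals, listed in order of increasing x.
residuals = [0.5, -0.4, 0.3, -0.6, 0.2, 0.4, -0.3, -0.1]

# Assumption: the mean of the residuals is zero.
residual_mean = mean(residuals)

# Assumption: constant variance. Compare the spread in the lower and
# upper halves of the x range; a ratio far from 1 hints at heteroscedasticity.
half = len(residuals) // 2
variance_ratio = variance(residuals[half:]) / variance(residuals[:half])

print(abs(residual_mean) < 1e-9, round(variance_ratio, 3))
```

With only a handful of points the ratio is very noisy, so on real data a residual plot or a dedicated test would be a more reliable check.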
The general equation of a straight line is:
Y = mx + c
It means that if we know the values of m and c, we can predict y for any given x. While building a Linear Regression model, the computer tries to calculate the values of m and c that give the best straight line.
But the question arises:

#### How do we know this is the best fit line?

The best fit line is obtained by minimizing the error/residual.
Residual is the distance between the actual Y and the predicted Y, as shown below:
Figure: Residual
Mathematically, Residual is:
r = actual y – predicted y
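Hence, the sum of the squares of the residuals over all N data points can be written as:
$$S(m, b) = \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2$$
NOTE: Here the intercept c is written as b.
As the expression above shows, the sum of squared residuals is a function of both m and b, so differentiating partially with respect to m and b gives:
$$\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr), \qquad \frac{\partial S}{\partial b} = -2 \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)$$
To get the best-fit line, the sum of squared residuals should be at its minimum. The minimum of a function occurs where its derivative is 0. So, equating both derivatives to 0, we get:
$$\sum_{i=1}^{N} x_i y_i = m \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i, \qquad \sum_{i=1}^{N} y_i = m \sum_{i=1}^{N} x_i + N b$$
With one dependent and one independent variable, the sum of squared residuals plotted against m and b forms a bowl-shaped surface, and the best-fit values of m and b lie at its lowest point.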
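Setting both partial derivatives to zero yields two linear equations in m and b, which can be solved directly. The sketch below solves them and then confirms that both derivatives vanish at the solution. The function names and data points are hypothetical, and b again denotes the intercept.

```python
def sse_gradient(xs, ys, m, b):
    """Partial derivatives of sum((y - (m*x + b))**2) with respect to m and b."""
    d_m = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys))
    d_b = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys))
    return d_m, d_b

def solve_normal_equations(xs, ys):
    """Solve the two equations obtained by setting both derivatives to zero."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical data points.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

m, b = solve_normal_equations(xs, ys)
print(round(m, 6), round(b, 6))        # 1.4 0.5
print(sse_gradient(xs, ys, m, b))      # both components are (numerically) zero
print(sse_gradient(xs, ys, m + 1, b))  # away from the minimum they are not
```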