Ashish Katri

10 months ago

Linear Regression in Machine Learning

Let us first understand what regression is.

Regression is a statistical technique that shows an algebraic relationship
between two or more variables.

Based on this algebraic relationship (rather than a function), one can
estimate the value of a variable, given the values of the other variables.

Usually, correlation is used to check whether there is any relationship
between two variables. If a relationship is found, regression is used to
quantify that relationship so that it can then be used for prediction.
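As a rough illustration of the correlation check described above, the following sketch computes the Pearson correlation coefficient with NumPy. The experience and salary values are made up for illustration and are not taken from any real dataset.

```python
import numpy as np

# Hypothetical data (illustrative values only): years of experience vs. salary
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
# Pearson correlation between the two variables
r = np.corrcoef(experience, salary)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship, in which case fitting a regression line is a reasonable next step.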

Some of the examples are:

1) Predict the rainfall in cm for a month.

2) Predict the stock price for the next day.

Now that you have an idea of what regression is, let’s move forward and
see the types of regression:

- Linear regression
- Logistic regression
- Polynomial regression
- Stepwise regression
- Ridge regression
- Lasso regression
- ElasticNet regression

In this article I will explain linear regression; later I will try to take
you through the other types of regression.

Graph For Linear Regression

Linear regression predicts the value of a dependent variable (y) based on
a given independent variable (x). This regression technique finds a
linear relationship between x (input) and y (output), hence the name linear
regression.

In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best fit line for our model.

Linear regression may be further divided into:

1) **Simple Linear Regression / Univariate Linear Regression**

2) **Multivariate Linear Regression**

When we try to find a relationship between a dependent variable (Y) and
one independent variable (X), it is known as **Simple Linear Regression /
Univariate Linear Regression**.

The mathematical equation can be given as:

$$Y = \beta_0 + \beta_1 x$$

Where

- Y is the response or the target variable
- x is the independent feature
- β1 is the coefficient of x
- β0 is the intercept

β0 and β1 are the **model coefficients**. To create a model, we must
"learn" the values of these coefficients. Once we have the values
of these coefficients, we can use the model to predict the response Y.
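To make the prediction step concrete, here is a minimal sketch of how a simple linear model, with the response equal to the intercept β0 plus β1 times x, turns an input into a prediction once the coefficients are known. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(x, beta0, beta1):
    """Return the predicted response for input x under Y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = 2.0, 0.5

# 2.0 + 0.5 * 10.0 = 7.0
print(predict(10.0, beta0, beta1))
```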

Let’s suppose we have a dataset that relates the ‘number of hours studied’
to the ‘marks obtained’. Many students have been observed, and their hours
of study and marks recorded. This will be our training data. The goal is to
design a model that can predict marks given the number of hours studied.
Using the training data, we obtain the regression line that gives the
minimum error. This linear equation is then used for any new data. That is,
if we give the number of hours studied by a student as input, our model
should predict their marks with minimum error.
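The hours-studied example can be sketched as follows. The training values are invented for illustration, and `np.polyfit` with `deg=1` is used here as one convenient way to obtain a least-squares line; the article itself does not prescribe a particular library.

```python
import numpy as np

# Hypothetical training data (illustrative values only)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
marks = np.array([35.0, 45.0, 50.0, 62.0, 68.0, 80.0])

# For deg=1, np.polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(hours, marks, deg=1)

def predict_marks(hours_studied):
    """Predict marks for a new student from the fitted regression line."""
    return intercept + slope * hours_studied

print(f"marks = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted marks for 4.5 hours: {predict_marks(4.5):.1f}")
```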

Next, let’s learn how to estimate the **model coefficients**:

- **β1** is the coefficient of x
- **β0** is the intercept

The coefficients are estimated using the **least-squares
criterion**, i.e., the best fit line has to be calculated that minimizes
the **sum of squared residuals** (or "sum of squared
errors").
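The least-squares criterion can be illustrated by computing the sum of squared residuals for a few candidate lines and comparing them with the least-squares line. This is a sketch under stated assumptions: the data points and the candidate coefficients are hypothetical, and `np.polyfit` stands in for whatever fitting routine is used.

```python
import numpy as np

def sum_squared_residuals(b0, b1, x, y):
    """Sum of squared vertical distances between the points and the line b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Hypothetical observations (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

# Least-squares line for comparison
b1_ls, b0_ls = np.polyfit(x, y, deg=1)
best = sum_squared_residuals(b0_ls, b1_ls, x, y)
print(f"least-squares line: SSE = {best:.4f}")

# Arbitrary candidate lines; none should beat the least-squares line
candidates = [(0.0, 1.5), (0.5, 2.0), (1.0, 2.5)]
for b0, b1 in candidates:
    sse = sum_squared_residuals(b0, b1, x, y)
    print(f"b0 = {b0}, b1 = {b1}: SSE = {sse:.4f}")
```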

Let’s understand the intuition

Have a quick look at the plot. Consider each point, and note that each of
them has a coordinate of the form (X, Y). Now draw an imaginary line
between each point and the current "best-fit" line. We'll call the
distance between each point and the current best-fit line E. To get a
quick image of what we're trying to visualize, take a look at the picture
below.

Let’s understand what the elements in the diagram represent:

- The red points are the observed values of x and y.
- The blue line is the least squares line.
- The green lines are the residuals, which are the distances between the observed values and the least squares line.

As before, we label each green line as having a distance E, and each red
point as having a coordinate (X, Y). Writing D_i for the distance E at the
i-th point, we can define our best-fit line as the line for which the
following sum is a minimum:

$$D_1^2 + D_2^2 + D_3^2 + D_4^2 + \dots + D_N^2$$

So how do we find this line?

The least-squares line approximating the set of points

$$(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), (X_4, Y_4), (X_5, Y_5)$$

has the equation:

**Y = b0 + b1X**

This is just another representation of the standard equation for a line,
with b1 in the role of the slope m and b0 in the role of the intercept c:

**Y = mx + c**

So how do we calculate the model coefficients?

The values b0 and b1 must be chosen so that they minimize the error. If
the sum of squared errors is taken as the metric to evaluate the model,
then the goal is to obtain the line that minimizes this error. For a model
with one independent variable (say X), the sum of squared errors is:

$$SSE = \sum_{i=1}^{N} \left(Y_i - (b_0 + b_1 X_i)\right)^2$$
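For a model with one independent variable, the coefficients that minimize the sum of squared errors have a well-known closed form: the slope b1 is the sum of the products of the deviations of X and Y from their means, divided by the sum of squared deviations of X, and the intercept is b0 = mean(Y) - b1 * mean(X). The sketch below applies that standard result; the function name and the sample points are hypothetical.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimizing the sum of squared errors for Y = b0 + b1 * X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical points lying exactly on the line Y = 1 + 2X
b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(f"b0 = {b0}, b1 = {b1}")
```

Because the sample points lie exactly on a line, the fitted coefficients should match that line's intercept and slope.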

Some of the assumptions to consider whenever we are dealing with a
regression task:

- The regression model is linear in terms of coefficients and error term.
- The mean of the residuals is zero.
- The error terms are not correlated with each other, i.e. given an error value; we cannot predict the next error value.
- The independent variables (X) do not depend on the dependent variable (Y); this is known as **exogeneity**. In layman's terms, the error term should in no way be predictable from the values of the independent variables.
- The error terms have a constant variance, i.e. **homoscedasticity**.
- No multicollinearity, i.e. no independent variables should be correlated with each other or affect one another. If there is multicollinearity, the precision of prediction by the OLS (ordinary least squares) model decreases.
- The error terms are normally distributed.
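A few of the assumptions listed above can be checked informally from the residuals of a fitted model. The sketch below uses synthetic data generated with assumed coefficients (3.0 and 2.0) and unit-variance normal noise; the specific checks are rough diagnostics chosen for illustration, not formal statistical tests.

```python
import numpy as np

# Synthetic data: y = 3 + 2x plus normal noise (all values are assumptions)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a least-squares line and compute the residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Mean of residuals: close to zero for a least-squares fit with an intercept
print("mean of residuals:", residuals.mean())

# Correlation between consecutive residuals: values near zero are
# consistent with uncorrelated error terms
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print("lag-1 correlation of residuals:", lag1)

# Residual spread in the lower and upper half of x: similar values are
# consistent with constant variance (homoscedasticity)
half = x.size // 2
print("residual std (low x, high x):", residuals[:half].std(), residuals[half:].std())
```

Plotting the residuals against x, or drawing a histogram of them, is a common visual complement to these numeric checks.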

The general equation of a straight line is:

$$y = mx + c$$

where m is the slope and c is the intercept. It means that if we have the
values of m and c, we can predict the value of y for any corresponding x.
During construction of a linear regression model, the computer tries to
calculate the values of m and c that give the best-fit straight line.

But the question arises: how do we find the best fit line?

The best fit line is obtained by minimizing the **error/residual**.

The residual is the distance between the actual Y and the predicted Y,
as shown below:

Figure: Residual

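Mathematically, the residual for the i-th observation is:

$$r_i = y_i - \hat{y}_i = y_i - (m x_i + c)$$

Hence, the sum of the squares of the residuals can be written as:

$$S(m, c) = \sum_{i=1}^{N} \left(y_i - m x_i - c\right)^2$$

As we can see from the equation above, the sum of squared residuals is a
function of both m and c, so differentiating partially with respect to m
and c gives us:

$$\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{N} x_i \left(y_i - m x_i - c\right), \qquad \frac{\partial S}{\partial c} = -2 \sum_{i=1}^{N} \left(y_i - m x_i - c\right)$$

For the best fit line, the error/residual should be **minimum**. The
minimum of a function occurs where the derivative is 0. So, equating the
corresponding derivatives to 0 and solving for m and c, we get:

$$m = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad c = \bar{y} - m \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the observed x and y values.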
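Setting both partial derivatives of the sum of squared residuals to zero yields two linear equations in m and c, often called the normal equations. The sketch below solves them numerically with `np.linalg.solve` and then confirms that both derivatives are approximately zero at the solution. The data values are hypothetical.

```python
import numpy as np

# Hypothetical observations (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])
n = x.size

# Normal equations obtained from dS/dm = 0 and dS/dc = 0:
#   m * sum(x^2) + c * sum(x) = sum(x * y)
#   m * sum(x)   + c * n      = sum(y)
A = np.array([[np.sum(x ** 2), np.sum(x)],
              [np.sum(x), float(n)]])
b = np.array([np.sum(x * y), np.sum(y)])
m, c = np.linalg.solve(A, b)

# Partial derivatives of the sum of squared residuals at the solution
grad_m = -2.0 * np.sum(x * (y - m * x - c))
grad_c = -2.0 * np.sum(y - m * x - c)

print(f"m = {m:.4f}, c = {c:.4f}")
print(f"dS/dm = {grad_m:.2e}, dS/dc = {grad_c:.2e}")
```

Both derivatives should come out as zero up to floating-point rounding, which is what identifies the solution as the stationary point of the error.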

Ideally, for an equation with one dependent and one independent
variable, the minimum will look as follows:
