
Kajal Pawar

2 years ago

- According to the Wikipedia definition
- Mean squared error (MSE)
- When to use mean squared error

In statistics,
the **mean squared error** (**MSE**) or **mean squared deviation** (**MSD**) of an estimator
(of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that
is, the average squared difference between the estimated values and the actual
value. MSE is a risk function,
corresponding to the expected value of the squared error loss. The fact that MSE is almost
always strictly positive (and not zero) is because of randomness
or because the estimator does not account for
information that could produce a
more accurate estimate.

Now let me give you a simpler definition:

Mean
squared error (MSE) is one of the most commonly used loss functions for
regression problems.
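In symbols, with $y_i$ the actual value and $y'_i$ the predicted value for the i-th of n data points, the MSE is commonly written as:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2
$$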

Let’s first try to understand what this equation means.

- The character that looks like an E is the summation symbol, the Greek capital letter sigma (Σ). It denotes the sum of a sequence of numbers, from i = 1 to i = n.

- Here, y represents the actual values and y’ represents the predicted values. For each data point we compute the difference y - y’, square it, and then add up all the squared differences (y - y’)².

- Then we divide this sum of (y - y’)² values by n, the number of data points, to get the mean, which is known as the mean squared error (MSE).
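The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a library implementation; the function name `mse` and the sample values are hypothetical. (For real projects, scikit-learn provides `sklearn.metrics.mean_squared_error`.)

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences y - y'."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    errors = y_true - y_pred            # y - y' for each data point
    squared_errors = errors ** 2        # (y - y')^2
    n = errors.size                     # number of data points
    return squared_errors.sum() / n     # sum of squared errors divided by n


# Hypothetical actual and predicted values, for illustration only
actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(mse(actual, predicted))  # 0.375
```

Here the errors are 0.5, -0.5, 0 and -1, so the squared errors sum to 1.5 and dividing by n = 4 gives 0.375.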

Let’s take an example and see why we actually need the mean squared error.

I will take an example and draw a regression line between the data points. Don’t treat it as the best-fit regression line; I am only using it as an example to show how the MSE works.

Now you might be wondering why I am plotting this graph. Let me explain.

Here I have taken 10 random data points and plotted them on a graph.

- The blue points are our data points, each with an x and a y coordinate. When we plot them on a graph, they look as shown above.

- The line passing through the data points is called the prediction line or regression line. Many different prediction lines can be drawn, but the one that best fits all the data points is called the best-fit regression line.

- The vertical distance between each data point and the prediction line is called the error, also known as the residual.
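To make the idea of residuals concrete, here is a small sketch that mimics this setup: ten random points and an arbitrary (not best-fit) line. The data-generating trend, the random seed, and the line y’ = 1.5x + 3 are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the numbers are repeatable

# 10 hypothetical data points scattered around a straight trend
x = rng.uniform(0, 10, size=10)
y = 2.0 * x + 1.0 + rng.normal(0.0, 2.0, size=10)

# An arbitrary prediction line y' = m*x + b (deliberately not the best fit)
m, b = 1.5, 3.0
y_pred = m * x + b

# Residuals: the signed vertical distances from each point to the line
residuals = y - y_pred

for xi, yi, ri in zip(x, y, residuals):
    print(f"x = {xi:5.2f}   y = {yi:6.2f}   residual = {ri:+6.2f}")

# Squaring and averaging the residuals gives the MSE of this particular line
print("MSE of this line:", np.mean(residuals ** 2))
```

A different choice of m and b gives different residuals and therefore a different MSE; the best-fit line is the one for which this number is smallest.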

Most of us are already familiar with the equation of a straight line from our school days:

**y = mx + b**

where

- m describes the slope of the line and
- b is the y-intercept which describes where the line crosses the y-axis.

To get the **best-fit regression line**, we want to minimize this error, which we measure with the MSE.

Now let me walk you through the mathematics behind the **mean squared error (MSE)** equation.

As you know, the straight-line equation is **y = mx + b**, where m is the slope and b is the y-intercept of the line.

So, for n data points, we can write the MSE of the line as follows:
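Assuming n data points $(x_1, y_1), \dots, (x_n, y_n)$ and writing the line's prediction for the i-th point as $mx_i + b$, the MSE of the line can be written as:

$$
\mathrm{MSE} = \frac{1}{n}\Big[\big(y_1 - (mx_1 + b)\big)^2 + \big(y_2 - (mx_2 + b)\big)^2 + \dots + \big(y_n - (mx_n + b)\big)^2\Big]
$$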

We can simplify the above equation and write it as:
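Using summation notation over the n points $(x_i, y_i)$, one compact way to write this is:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (mx_i + b)\big)^2
$$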

Now, let’s expand all the brackets in the above equation and write it in a simpler way, as shown below.
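Expanding each bracket with $(a - c)^2 = a^2 - 2ac + c^2$, where $a = y_i$ and $c = mx_i + b$, gives:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i^2 - 2mx_iy_i - 2by_i + m^2x_i^2 + 2mbx_i + b^2\right)
$$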

Now, let’s rearrange it to simplify it further. We take each type of term (all the y² terms, all the -2mxy terms, and so on) and put them side by side, which helps us simplify it, as shown.
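Grouping like terms across all n points (note that $b^2$ appears n times, once per point), one way to write the result is:

$$
\mathrm{MSE} = \frac{1}{n}\left[\sum_{i=1}^{n} y_i^2 - 2m\sum_{i=1}^{n} x_iy_i - 2b\sum_{i=1}^{n} y_i + m^2\sum_{i=1}^{n} x_i^2 + 2mb\sum_{i=1}^{n} x_i + nb^2\right]
$$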

At this point things are getting messy, so we will replace each sum with the mean of the corresponding values: y², xy, y, x², and x.

We will introduce a new symbol for each one to represent its mean.

For example, to take the mean of y, we sum all the y values, divide by n, and call the result **ȳ** (read "y-bar"), as shown below.
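With a bar denoting the mean over the n data points, the quantities that appear in the MSE can be defined as:

$$
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,\quad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_iy_i,\quad
\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2,\quad
\overline{y^2} = \frac{1}{n}\sum_{i=1}^{n} y_i^2
$$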

Multiplying both sides of the equation by n, we get:
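Applying this to each of the means gives the corresponding sums:

$$
n\bar{y} = \sum_{i=1}^{n} y_i,\quad
n\bar{x} = \sum_{i=1}^{n} x_i,\quad
n\overline{xy} = \sum_{i=1}^{n} x_iy_i,\quad
n\overline{x^2} = \sum_{i=1}^{n} x_i^2,\quad
n\overline{y^2} = \sum_{i=1}^{n} y_i^2
$$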

Finally, substituting these sums back into the MSE equation, we get the following equation:
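Replacing each sum in the grouped MSE expression with n times its mean (for example, $\sum y_i = n\bar{y}$), the factor n cancels the leading $1/n$, leaving an expression in means only:

$$
\mathrm{MSE} = \overline{y^2} - 2m\overline{xy} - 2b\bar{y} + m^2\overline{x^2} + 2mb\bar{x} + b^2
$$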

From the above equation, we can see that m and b are the only unknowns.

Now our aim is to find the values of m and b that minimize this function.

So, how do we find them?

Since we are trying to find a minimum, we take the partial derivative of the MSE with respect to m and the partial derivative with respect to b, and set each of them equal to 0, as shown below.
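Differentiating $\mathrm{MSE} = \overline{y^2} - 2m\overline{xy} - 2b\bar{y} + m^2\overline{x^2} + 2mb\bar{x} + b^2$ with respect to each parameter and setting the result to zero gives:

$$
\frac{\partial\,\mathrm{MSE}}{\partial m} = -2\overline{xy} + 2m\overline{x^2} + 2b\bar{x} = 0
$$

$$
\frac{\partial\,\mathrm{MSE}}{\partial b} = -2\bar{y} + 2m\bar{x} + 2b = 0
$$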

Next, we take the two equations we obtained above and isolate the variable b in each of them, as shown below.
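Dividing both conditions by 2 and solving each one for b (the first assumes $\bar{x} \neq 0$) gives:

$$
b = \frac{\overline{xy} - m\overline{x^2}}{\bar{x}} \qquad \text{(from } \partial\,\mathrm{MSE}/\partial m = 0\text{)}
$$

$$
b = \bar{y} - m\bar{x} \qquad \text{(from } \partial\,\mathrm{MSE}/\partial b = 0\text{)}
$$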

Now, subtracting the first equation from the second, we get:
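Subtracting $b = (\overline{xy} - m\overline{x^2})/\bar{x}$ from $b = \bar{y} - m\bar{x}$, multiplying through by $\bar{x}$ (assumed nonzero), and solving for m gives:

$$
\begin{aligned}
0 &= \bar{y} - m\bar{x} - \frac{\overline{xy} - m\overline{x^2}}{\bar{x}} \\
0 &= \bar{x}\bar{y} - m\bar{x}^2 - \overline{xy} + m\overline{x^2} \\
m &= \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad b = \bar{y} - m\bar{x}
\end{aligned}
$$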

Now let me explain what the symbols in these equations represent, so that you don’t have to wonder. A bar over a quantity denotes its mean:

- x̄ (x-bar): sum of x divided by n
- xy-bar: sum of xy divided by n
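The resulting formulas, m = (mean of xy - x̄·ȳ) / (mean of x² - x̄²) and b = ȳ - m·x̄, can be turned into a short NumPy sketch. The function name `best_fit_line` is hypothetical, and the sketch assumes the x values are not all identical (otherwise the slope is undefined).

```python
import numpy as np


def best_fit_line(x, y):
    """Return (m, b) minimizing the MSE of y = m*x + b, using the mean-based formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar = x.mean()             # sum of x divided by n
    y_bar = y.mean()             # sum of y divided by n
    xy_bar = (x * y).mean()      # sum of xy divided by n
    x2_bar = (x ** 2).mean()     # sum of x^2 divided by n
    denominator = x2_bar - x_bar ** 2
    if np.isclose(denominator, 0.0):
        raise ValueError("all x values are identical; the slope is undefined")
    m = (xy_bar - x_bar * y_bar) / denominator
    b = y_bar - m * x_bar
    return m, b


# Usage with three example points
m, b = best_fit_line([1, 2, 4], [2, 1, 3])
print(f"m = {m:.4f}, b = {b:.4f}")  # m = 0.4286, b = 1.0000
```

Working with means rather than raw sums keeps the numbers on the same scale as the data, which is why the derivation introduced the bar notation in the first place.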

By now, you should feel fairly comfortable with the equation and concepts behind the MSE.

To make it clearer and give you a deeper understanding, let’s take an example.

Let’s take three points, (1, 2), (2, 1), and (4, 3), and plot them on a graph. The points look as shown below.

Let’s find the values of the **slope m** and the **intercept b** in the equation **y = mx + b**.

We can find **x̄** (sum the x values and divide by n) as shown below.
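For the points (1, 2), (2, 1), and (4, 3), with n = 3:

$$
\bar{x} = \frac{1 + 2 + 4}{3} = \frac{7}{3} \approx 2.33
$$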

We can find **ȳ** (sum the y values and divide by n) as shown below.
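For the points (1, 2), (2, 1), and (4, 3), with n = 3:

$$
\bar{y} = \frac{2 + 1 + 3}{3} = \frac{6}{3} = 2
$$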

We can find the **mean of xy** (sum the xy values and divide by n) as shown below.
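For the points (1, 2), (2, 1), and (4, 3), with n = 3:

$$
\overline{xy} = \frac{1 \cdot 2 + 2 \cdot 1 + 4 \cdot 3}{3} = \frac{2 + 2 + 12}{3} = \frac{16}{3} \approx 5.33
$$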

We can find the **mean of x²** (sum the x² values and divide by n) as shown below.
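For the points (1, 2), (2, 1), and (4, 3), with n = 3:

$$
\overline{x^2} = \frac{1^2 + 2^2 + 4^2}{3} = \frac{1 + 4 + 16}{3} = \frac{21}{3} = 7
$$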

Now that we have calculated the different parts of the equations, let’s put them together: plugging these values into the equations for m and b gives us the slope and the y-intercept.
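Using $m = (\overline{xy} - \bar{x}\bar{y})/(\overline{x^2} - \bar{x}^2)$ and $b = \bar{y} - m\bar{x}$ with the means of the points (1, 2), (2, 1), (4, 3), namely $\bar{x} = 7/3$, $\bar{y} = 2$, $\overline{xy} = 16/3$ and $\overline{x^2} = 7$:

$$
\begin{aligned}
m &= \frac{\frac{16}{3} - \frac{7}{3} \cdot 2}{7 - \left(\frac{7}{3}\right)^2} = \frac{\frac{2}{3}}{\frac{14}{9}} = \frac{3}{7} \approx 0.43 \\
b &= 2 - \frac{3}{7} \cdot \frac{7}{3} = 2 - 1 = 1
\end{aligned}
$$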

Let’s put the calculated values into the line equation y = mx + b.
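With m = 3/7 and b = 1 for the three example points, the line is:

$$
y = \frac{3}{7}x + 1 \approx 0.43x + 1
$$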

So, this is the equation of the **best-fit regression line**.

Let’s draw the line using the above equation and see how it passes through the points in a way that minimizes the squared distances, giving us the best-fit regression line.
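As a numerical check, the sketch below fits the same three points with NumPy's `np.polyfit` (a degree-1 least-squares fit) and compares the MSE of the line y = (3/7)x + 1 with lines whose slope or intercept is slightly changed. The helper name `line_mse` and the size of the nudges are illustrative choices.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 1.0, 3.0])

# np.polyfit with deg=1 performs an ordinary least-squares fit of a straight line;
# coefficients are returned highest degree first, i.e. [slope, intercept]
m_fit, b_fit = np.polyfit(x, y, deg=1)
print("polyfit slope and intercept:", m_fit, b_fit)  # close to 3/7 and 1


def line_mse(m, b):
    """MSE of the line y = m*x + b on the three example points."""
    return np.mean((y - (m * x + b)) ** 2)


best = line_mse(3 / 7, 1.0)
print("MSE of y = (3/7)x + 1:", best)

# Nudging the slope or the intercept in any direction increases the MSE
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
for dm, db in nudges:
    print(f"m = {3 / 7 + dm:.3f}, b = {1 + db:.1f} -> MSE = {line_mse(3 / 7 + dm, 1 + db):.4f}")
```

The best line leaves residuals of 4/7, -6/7 and 2/7, so its MSE is (16 + 36 + 4) / (49 · 3) = 8/21 ≈ 0.381, and every nudged line scores higher.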

You can use **MSE** for regression problems when you believe that the target, given the input, is normally distributed, and when you want large errors to be penalized significantly more than small ones.
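The idea that large errors are penalized more can be illustrated with two made-up sets of prediction errors that have the same total size but are distributed differently. The values are hypothetical and chosen only to make the effect easy to see.

```python
import numpy as np

# Two hypothetical sets of prediction errors with the same total absolute error (4.0)
evenly_spread = np.array([1.0, 1.0, 1.0, 1.0])
one_large = np.array([0.0, 0.0, 0.0, 4.0])

for name, errors in [("evenly spread", evenly_spread), ("one large", one_large)]:
    total_abs = np.abs(errors).sum()     # identical for both sets
    mse_value = np.mean(errors ** 2)     # squaring magnifies the single large error
    print(f"{name:13s}  total |error| = {total_abs:.1f}  MSE = {mse_value:.2f}")
```

Both sets miss by 4.0 in total, yet the MSE is 1.00 for the evenly spread errors and 4.00 when the whole error sits on a single point.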

```
# mlp for regression with mse loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model: 20 inputs -> 25 ReLU hidden units -> 1 linear output
model = Sequential()
model.add(Input(shape=(20,)))
model.add(Dense(25, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
# stochastic gradient descent with a learning rate of 0.01 and momentum of 0.9
opt = SGD(learning_rate=0.01, momentum=0.9)
# use MSE as the training loss
model.compile(loss='mean_squared_error', optimizer=opt)
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.title('Loss / Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
```

The code above first prints the mean squared error of the model on the train and test datasets.

Then it plots the training and testing loss, as shown below:

After reading this article, you should now understand the importance of the **mean squared error (MSE)**.


To learn more about such concepts related to Artificial Intelligence, visit our insideAIML blog page.

You can also ask direct queries related to Artificial Intelligence, Deep Learning, Data Science and Machine Learning on our live insideAIML discussion forum.

Keep Learning. Keep Growing.