
Kajal Pawar

5 months ago

- Why use Mean Absolute Error?

- A real example: when can we use Mean Absolute Error?

- Implementation of MAE using Python

- Output:

- Comparison between MAE and RMSE

- Conclusion

Mean Absolute Error (MAE) is a model evaluation metric used for regression problems. It is the average, over the test sample, of the absolute differences between the predicted and the actual values, where every individual difference has equal weight.

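The equation for mean absolute error can be given as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of samples.

The MAE can range from 0 to infinity.

MAE is also known as **L1 loss**.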

Let’s take an example and see how we can calculate MAE.

The example table lists houses with different numbers of rooms, say **2, 3, 4** and **5**, together with their respective actual and predicted costs. We want to calculate the mean absolute error for this table.

Let’s see how we can do it.

First, let’s see what exactly **‘error’** means in this metric.

The error of a single prediction is simply the difference between the actual and the predicted value:

Error = Actual value - Predicted value

For the given example, the error and the absolute error |Error| (the positive value of the error) of each prediction are:

- Prediction 1: Actual Price = $200K, Predicted Price = $230K, Error = $200K - $230K = -$30K, Absolute Error 1 = $30K

- Prediction 2: Actual Price = $300K, Predicted Price = $290K, Error = $300K - $290K = $10K, Absolute Error 2 = $10K

- Prediction 3: Actual Price = $400K, Predicted Price = $740K, Error = $400K - $740K = -$340K, Absolute Error 3 = $340K

- Prediction 4: Actual Price = $500K, Predicted Price = $450K, Error = $500K - $450K = $50K, Absolute Error 4 = $50K

Here, **n** is the total number of predictions in the example, which is 4.

MAE = (Absolute Error 1 + Absolute Error 2 + Absolute Error 3 + Absolute Error 4) / n = ($30K + $10K + $340K + $50K) / 4 = $107.5K

From this example we can say that, on average, our model's predictions are off by approximately **$107.5K.**
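The same calculation can be reproduced in a few lines of plain Python. This is only a minimal sketch: it hard-codes the four actual and predicted prices from the example (in $K), and the variable names are illustrative choices rather than part of any library.

```python
# Actual and predicted house prices from the example, in $K
actual = [200, 300, 400, 500]
predicted = [230, 290, 740, 450]

# Absolute error of each prediction: |actual - predicted|
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# MAE is the mean of the absolute errors
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)  # [30, 10, 340, 50]
print(mae)         # 107.5
```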

MAE can be used for regression problems where the target variable roughly follows a Gaussian distribution but contains some outliers. In this case MAE is a good choice because it is more robust to outliers than squared-error losses.
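One way to see this robustness is to ask which single constant prediction minimises each loss. The sketch below assumes a hypothetical list of target values (in $K) whose last entry is an outlier; the data, the `best_constant` helper and the candidate grid are made up for illustration only.

```python
from statistics import mean, median

def mae(actual, predicted):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error of two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical targets in $K; the last value is an outlier
targets = [200, 300, 400, 500, 5000]

def best_constant(loss, candidates):
    """Return the candidate constant prediction with the lowest loss on targets."""
    return min(candidates, key=lambda c: loss(targets, [c] * len(targets)))

candidates = range(0, 5001, 10)
best_for_mae = best_constant(mae, candidates)
best_for_mse = best_constant(mse, candidates)

print(best_for_mae, median(targets))  # 400 400
print(best_for_mse, mean(targets))    # 1280 1280
```

In this sketch the best constant under MAE coincides with the median, which does not move however far away the outlier is, while the best constant under MSE is the mean, which the outlier pulls strongly toward itself.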

One of the disadvantages of MAE is that it may cause **convergence problems**.

The magnitude of the gradient does not depend on the size of the error, only on the sign of y - ŷ. As a result, the gradient is just as large when the error is quite small as when it is large, which can make it hard for gradient-based training to settle at the minimum.
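The constant gradient magnitude can be checked numerically. The sketch below compares the derivative of the absolute error with that of the squared error for a single prediction; the function names and sample values are illustrative assumptions, and the derivative at exactly zero error is set to 0 by convention.

```python
def mae_grad(y, y_hat):
    """Derivative of |y - y_hat| with respect to y_hat (0 at the kink by convention)."""
    diff = y - y_hat
    if diff > 0:
        return -1.0
    if diff < 0:
        return 1.0
    return 0.0

def mse_grad(y, y_hat):
    """Derivative of (y - y_hat)**2 with respect to y_hat."""
    return -2.0 * (y - y_hat)

y = 1.0
for y_hat in (11.0, 2.0, 1.01):
    print(f"error={y - y_hat:+.2f}  "
          f"MAE grad={mae_grad(y, y_hat):+.2f}  "
          f"MSE grad={mse_grad(y, y_hat):+.2f}")
```

The absolute-error gradient stays at magnitude 1 for all three predictions, while the squared-error gradient shrinks as the prediction approaches the target.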

We can use mean absolute error for any regression problem, i.e., where we want to predict a continuous value, and don’t want outliers to play a big role.

It can also be used when the distribution is multimodal and it is desirable to have predictions at one of the modes rather than at their mean.

Example: in image reconstruction, MAE produces less blurry images than MSE. This example is taken from the paper Image-to-Image Translation with Conditional Adversarial Networks by Isola et al.

```
# MLP for regression with the MAE loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot

# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize inputs and target
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y), 1))[:, 0]
# split into train and test halves
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model: one hidden layer and a linear output for regression
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
# recent Keras releases name this argument learning_rate (older ones used lr)
opt = SGD(learning_rate=0.01, momentum=0.9)
# train on the MAE loss and track MSE as an additional metric
model.compile(loss='mean_absolute_error', optimizer=opt, metrics=['mse'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model; evaluate() returns [loss (MAE), metric (MSE)]
_, train_mse = model.evaluate(trainX, trainy, verbose=0)
_, test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss (MAE) during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot mse during training
# recent Keras releases store this metric under 'mse' (older ones used 'mean_squared_error')
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(history.history['mse'], label='train')
pyplot.plot(history.history['val_mse'], label='test')
pyplot.legend()
pyplot.show()
```

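Running the code first prints the model's error on the train and test datasets. Note that the printed values are the **mean squared error** tracked through `metrics=['mse']`, while the model itself is trained on the mean absolute error loss: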

`Train: 0.003, Test: 0.004`

It then plots the training and test loss.

The line plot shows the mean absolute error loss over the training epochs for both the train (blue) and test (orange) sets (top), and a similar plot for the mean squared error (bottom).

From the plots we can observe that MAE does converge but follows a bumpy course, although the dynamics of MSE don’t appear greatly affected.

Since the target variable follows a standard Gaussian distribution with no large outliers, MAE is not a good fit here.

It might be more appropriate for this case if we did not scale the target variable first.

There are many different metrics available to measure model performance. One of these is the root mean squared error (RMSE).

Let’s first look at the similarities between MAE and RMSE.

Similarities between MAE and RMSE:

- Both MAE and RMSE express the average prediction error of the model.

- Both metrics range from 0 to ∞ and are indifferent to the direction of the errors.

- Both metrics are negatively-oriented scores, which means lower values are better.

Taking the square root of the average squared errors has some interesting implications for RMSE. Since the errors are squared before they are averaged, RMSE gives a relatively high weight to large errors. This means RMSE is more useful when large errors are particularly undesirable.
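To make the extra weight on large errors concrete, the sketch below reuses the four absolute errors from the house price example (30, 10, 340 and 50, in $K) and compares how much the largest error contributes to each metric. The variable names are illustrative.

```python
import math

# Absolute errors from the house price example, in $K
abs_errors = [30, 10, 340, 50]
n = len(abs_errors)

mae = sum(abs_errors) / n
rmse = math.sqrt(sum(e ** 2 for e in abs_errors) / n)

# Share of the largest error in the sum behind each metric
largest = max(abs_errors)
share_in_mae = largest / sum(abs_errors)
share_in_rmse = largest ** 2 / sum(e ** 2 for e in abs_errors)

print(f"MAE  = {mae:.1f}")   # 107.5
print(f"RMSE = {rmse:.1f}")  # about 172.6
print(f"largest error share: MAE {share_in_mae:.0%}, RMSE {share_in_rmse:.0%}")
```

The single $340K miss accounts for a clearly larger share of the squared sum than of the absolute sum, which is why RMSE ends up well above MAE here.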

Let’s take an example and see how MAE stays steady while RMSE increases as the variance of the frequency distribution of error magnitudes increases.

In the example table, MAE stays steady while RMSE grows as the variance of the frequency distribution of error magnitudes increases.

To explain this, let us consider Case 4 and Case 5.

In Case 4 there is an equal number of test errors of 0 and 5, and in Case 5 there is an equal number of test errors of 3 and 4.

Comparing the two cases, the variance of the errors in Case 4 is greater than in Case 5, but the RMSE is the same in both. That’s why **RMSE does not necessarily increase with the variance of the errors. RMSE increases with the variance of the frequency distribution of error magnitudes.**
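The two cases can be checked numerically. The sketch below assumes ten test errors per case (five of each value); the counts are an assumption made for illustration, since only the equal split between the two error values is given.

```python
import math
from statistics import pvariance

def mae(errors):
    """Mean absolute error from a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error from a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Assumed: ten test errors per case, split equally between the two values
case4 = [0, 5] * 5
case5 = [3, 4] * 5

for name, errors in (("Case 4", case4), ("Case 5", case5)):
    print(f"{name}: MAE={mae(errors):.2f}  RMSE={rmse(errors):.2f}  "
          f"variance={pvariance(errors):.2f}")
# Case 4: MAE=2.50  RMSE=3.54  variance=6.25
# Case 5: MAE=3.50  RMSE=3.54  variance=0.25
```

Under these assumptions the RMSE is identical in both cases even though the variance of the errors differs by a wide margin.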

In this article, we tried to get a deeper understanding of what Mean Absolute Error (MAE) is and how and when to apply it. We also covered a little of the mathematics behind it and compared MAE with RMSE.

After reading this article, you should understand the importance of **Mean absolute error (MAE)**. For more blogs/courses in data science, machine learning, artificial intelligence and new technologies, do visit us at InsideAIML.

Thanks for reading…