Mean Absolute Error Loss

Kajal Pawar

a year ago

Table of Contents
• Why use Mean Absolute Error?
• A real example: when can we use Mean Absolute Error?
• Implementation of MAE using Python
• Output
• Comparison between MAE and RMSE
• Conclusion
Mean Absolute Error Loss is the average of the absolute difference between the actual and predicted values.
Put simply, Mean Absolute Error is one of the model evaluation metrics used for regression problems. It is the average, over the test sample, of the absolute differences between prediction and actual observation, where all individual differences have equal weight.
MAE is also known as L1 loss. For n samples with actual values y_i and predicted values ŷ_i, the equation for mean absolute error can be given as:

MAE = (1/n) · Σ |y_i − ŷ_i|, for i = 1 to n

The range of MAE lies between 0 and infinity.
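As a quick sanity check on this definition, we can compare a hand calculation against scikit-learn's built-in helper. The values below are illustrative, not from the article's house-price example:

```python
# Verifying the MAE definition against scikit-learn's helper
from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Hand calculation: mean of |3-2.5|, |-0.5-0|, |2-2|, |7-8|
# = (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
print(mean_absolute_error(y_true, y_pred))  # 0.5
```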
Let's take an example and see how we can calculate MAE.
The table below lists houses with 2, 3, 4 and 5 bedrooms, along with their actual and predicted prices. Our task is to calculate the mean absolute error for this table.

| Bedrooms | Actual Price | Predicted Price |
|----------|--------------|-----------------|
| 2        | \$200K       | \$230K          |
| 3        | \$300K       | \$290K          |
| 4        | \$400K       | \$740K          |
| 5        | \$500K       | \$450K          |
Let’s see how we can do it.
First, let's see what exactly 'error' means in this metric.
The error can be calculated simply as:
Error = Actual Value - Predicted Value
For the given example, our error for each prediction can be calculated as below;
For the 2-bedroom house:
Actual Price = \$200K
Predicted Price = \$230K
Error = Actual Price - Predicted Price = -\$30K
Absolute Error 1 = |Error| (the absolute, or positive, value of the error)
Absolute Error 1 = \$30K
For the 3-bedroom house:
Actual Price = \$300K
Predicted Price = \$290K
Error = Actual Price - Predicted Price = \$10K
Absolute Error 2 = |Error|
Absolute Error 2 = \$10K
For the 4-bedroom house:
Actual Price = \$400K
Predicted Price = \$740K
Error = Actual Price - Predicted Price = -\$340K
Absolute Error 3 = |Error|
Absolute Error 3 = \$340K
For the 5-bedroom house:
Actual Price = \$500K
Predicted Price = \$450K
Error = Actual Price - Predicted Price = \$50K
Absolute Error 4 = |Error|
Absolute Error 4 = \$50K
Here, the total number of samples in this example is:
n = 4
MAE can be calculated as:
MAE = (Absolute Error 1 + Absolute Error 2 + Absolute Error 3 + Absolute Error 4) / n
MAE = (\$30K + \$10K + \$340K + \$50K)/4
MAE = \$107.5K
From the above example we can say that, on average, our model's predictions are off by approximately \$107.5K.
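The hand calculation above can be reproduced in a few lines of Python (prices in \$K, as in the example):

```python
# MAE for the house-price example, computed step by step
actual    = [200, 300, 400, 500]   # actual prices in $K
predicted = [230, 290, 740, 450]   # predicted prices in $K

# Absolute error for each house
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
print(abs_errors)  # [30, 10, 340, 50]

# MAE = sum of absolute errors / number of samples
mae = sum(abs_errors) / len(abs_errors)
print(mae)  # 107.5
```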
Why use Mean Absolute Error?
MAE can be used on regression problems where the distribution of the target variable is roughly Gaussian but may contain some outliers. In this case MAE is a good choice, since it is more robust to outliers than squared-error losses.
One disadvantage of MAE is that it may cause convergence problems.
This is because the gradient magnitude does not depend on the size of the error but only on the sign of y − ŷ, which makes the gradient magnitude large even when the error is quite small.
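A minimal sketch of this point: the derivative of the L1 loss |y − ŷ| with respect to ŷ is −sign(y − ŷ), so its magnitude is always 1 regardless of how small or large the error is (the function `l1_grad` here is ours, written just for illustration):

```python
import numpy as np

# Gradient of the L1 loss |y - y_hat| with respect to y_hat.
# It equals -sign(y - y_hat): its magnitude is always 1,
# no matter how small the error is, which can hurt convergence.
def l1_grad(y, y_hat):
    return -np.sign(y - y_hat)

print(l1_grad(10.0, 9.999))   # -1.0  (tiny error, full-size gradient)
print(l1_grad(10.0, 500.0))   #  1.0  (huge error, same gradient magnitude)
```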

A real example: when can we use Mean Absolute Error?

We can use Mean Absolute Error when dealing with a regression problem, i.e., where we want to predict some continuous value and don't want outliers to play a big role.
It can also be used when the distribution is multimodal and it is desirable to have predictions at one of the modes rather than at the mean of them.
Example: when doing image reconstruction, MAE encourages less blurry images compared to MSE. This example is taken from the paper Image-to-Image Translation with Conditional Adversarial Networks by Isola et al.

Implementation of MAE using Python

```python
# MLP for regression with MAE loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot

# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]

# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='mean_absolute_error', optimizer=opt, metrics=['mse'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

# evaluate the model
_, train_mse = model.evaluate(trainX, trainy, verbose=0)
_, test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot mse during training
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(history.history['mse'], label='train')
pyplot.plot(history.history['val_mse'], label='test')
pyplot.legend()
pyplot.show()
```

Output:

The above code first prints the mean squared error for the model on the train and test datasets:
``Train: 0.003, Test: 0.004``
It then plots a line chart of the mean absolute error loss over the training epochs for both the train and test sets (top), along with a similar plot of the mean squared error (bottom).
From these plots we can observe that MAE does converge, but shows a bumpy course, although the dynamics of MSE don't appear greatly affected.
Since the target variable here follows a standard Gaussian distribution with no large outliers, MAE is not a particularly good fit for this problem. It might be more appropriate if we did not scale the target variable first.

Comparison between MAE and RMSE

There are many different metrics available to measure model performance. One of these is RMSE.
Let's first look at the similarities between MAE and RMSE.
Similarities between MAE and RMSE:
• MAE and RMSE both express the average prediction error of the given model.
• For both metrics, the range lies from 0 to ∞, and both are indifferent to the direction of errors.
• Both metrics are negatively-oriented scores, which means lower values are better.
Differences between MAE and RMSE:
Taking the square root of the average squared errors has an important implication for RMSE: since the errors are squared before they are averaged, RMSE gives relatively high weight to large errors. This implies that RMSE is more useful when large errors are particularly undesirable.
MAE stays steady while RMSE increases as the variance of the frequency distribution of error magnitudes increases.
NOTE: RMSE does not necessarily increase with the variance of the errors. RMSE increases with the variance of the frequency distribution of error magnitudes.
To explain this statement, let us consider Case 4 and Case 5.
In Case 4, we have an equal number of test errors of 0 and 5. In Case 5, we have an equal number of test errors of 3 and 4.

| Case   | Errors               | MAE | RMSE   |
|--------|----------------------|-----|--------|
| Case 4 | equal numbers of 0, 5 | 2.5 | ≈ 3.54 |
| Case 5 | equal numbers of 3, 4 | 3.5 | ≈ 3.54 |

Considering these two cases, we can see that the variance of the errors in Case 4 is greater than the variance in Case 5, but the RMSE is the same in both cases. That is why RMSE does not necessarily increase with the variance of the errors; it increases with the variance of the frequency distribution of error magnitudes.
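The Case 4 / Case 5 comparison is easy to verify numerically. Using four errors per case as a representative sample:

```python
import math

# Case 4: equal numbers of errors 0 and 5; Case 5: equal numbers of 3 and 4
case4 = [0, 0, 5, 5]
case5 = [3, 3, 4, 4]

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(mae(case4), rmse(case4))  # 2.5 3.535...
print(mae(case5), rmse(case5))  # 3.5 3.535...
# RMSE is identical in both cases (sqrt(12.5) ≈ 3.54),
# even though the variance of the errors differs between them.
```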

Conclusion

In this article, we tried to get a deeper understanding of what Mean Absolute Error (MAE) is and how and when to apply it. We also covered a little of the mathematics behind the scenes and compared MAE with RMSE.
After reading this article, you should have a sense of the importance of Mean Absolute Error (MAE). For more blogs and courses on data science, machine learning, artificial intelligence and new technologies, visit us at InsideAIML.
Thanks for reading…