Hinge Loss and Squared Hinge Loss

Neha Kumawat

9 months ago

Hinge loss is a loss function that serves as an alternative to cross-entropy for binary classification problems.
It was developed primarily for use with Support Vector Machine (SVM) models.
It is used for binary classification where the target values are in the set {-1, 1}.
The hinge loss function encourages examples to have the correct sign: it assigns more error when the actual and predicted values differ in sign.
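To make the effect of the sign concrete, here is a minimal sketch. It assumes the commonly used definition max(0, 1 - y * y_hat), which the article does not state explicitly; the function name hinge_loss is only illustrative.

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Per-example hinge loss, assuming the form max(0, 1 - y * y_hat)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.maximum(0.0, 1.0 - y_true * y_pred)

# Predictions with the same magnitude, once with the matching sign
# and once with the opposite sign of the target.
matching = hinge_loss([1, -1], [0.8, -0.8])
opposite = hinge_loss([1, -1], [-0.8, 0.8])

print("matching sign:", matching)  # about 0.2 for each example
print("opposite sign:", opposite)  # about 1.8 for each example
```

Under this assumed form, a prediction with the wrong sign is always penalized more heavily than one of the same magnitude with the right sign.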
Results with hinge loss are mixed: on binary classification problems it sometimes performs better than cross-entropy and sometimes worse.
Let’s take an example to understand hinge loss better.
One source of confusion is that the term hinge loss is used in several ways. There is a purely mathematical hinge loss, and there is a version that is used in classification problems. In machine learning, hinge loss is most commonly associated with SVMs.
Consider the example shown in the table below:
Let’s assume margin = 0.22
Case   Actual   Computed
0      +1       +0.560
1      +1       +0.270
2      +1       +0.150
3      +1       -0.240
4      -1       -0.360
5      -1       -0.970
6      -1       -0.050
7      -1       +0.250
The table above describes a hypothetical SVM used for demonstration purposes. The aim is to perform binary classification, where each item belongs to class -1 or +1 (for example, boy / girl, or spam / not spam).
The SVM classifier in this example accepts predictor values and computes an output value between -1 and +1, for example +0.3873 or -0.4548.
If the computed output is positive, the model predicts class +1; if it is negative, the model predicts class -1. In an SVM model, however, the margin also plays an important role.
Now take a margin of 0.22, with the actual and computed values shown in the table above.
Let’s see what effect the margin has.
For case 0, the actual value is +1 and the computed value is +0.560. The prediction is correct and the computed value is greater than the margin of 0.22, so there is no hinge loss.
Similarly, for case 1, the actual value is +1 and the computed value is +0.270. The prediction is correct and the computed value is greater than the margin of 0.22, so there is no hinge loss.
For case 2, the actual value is +1 and the computed value is +0.150. The classification is correct, but the computed value is less than the margin of 0.22, so there is a small hinge loss even though the classification is correct.
For case 3, the actual value is +1 and the computed value is -0.240. The classification is wrong, so there is a large hinge loss.
For case 4, the actual value is -1 and the computed value is -0.360. The classification is correct and the computed value lies beyond the margin (it is below -0.22), so there is no hinge loss.
For case 5, the actual value is -1 and the computed value is -0.970. This is the same situation as case 4, so there is no hinge loss.
For case 6, the actual value is -1 and the computed value is -0.050. The classification is correct, but there is a moderate hinge loss because the computed value is closer to zero than the margin allows.
For case 7, the actual value is -1 and the computed value is +0.250. The classification is wrong, so there is a large hinge loss.
To summarize:
  • When working with an SVM classifier, if a computed value gives a correct classification and its magnitude is larger than the margin, there is no hinge loss.
  • If a computed value gives a correct classification but is too close to zero (where “too close” is defined by the margin), there is a small hinge loss.
  • If a computed value gives an incorrect classification, there is always a hinge loss, and it is comparatively large.
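The eight cases above can be checked numerically. The article does not give a formula for the margin-based loss, so the sketch below assumes one plausible form, max(0, margin - actual * computed). The variable names are illustrative.

```python
import numpy as np

MARGIN = 0.22  # margin used in the example above

# Actual classes and computed values for cases 0 to 7
actual = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
computed = np.array([0.560, 0.270, 0.150, -0.240,
                     -0.360, -0.970, -0.050, 0.250])

# Assumed formula (not stated in the article):
# loss = max(0, margin - actual * computed)
signed_score = actual * computed
losses = np.maximum(0.0, MARGIN - signed_score)

# A prediction is correct when it has the same sign as the actual class
correct = signed_score > 0

for case, (a, c, ok, loss) in enumerate(zip(actual, computed, correct, losses)):
    status = "correct" if ok else "wrong"
    print(f"case {case}: actual={a:+.0f} computed={c:+.3f} "
          f"{status:8s} loss={loss:.2f}")
```

With this assumed formula, cases 0, 1, 4 and 5 have zero loss, cases 2 and 6 have small losses (0.07 and 0.17), and the misclassified cases 3 and 7 have the largest losses (0.46 and 0.47). That matches the qualitative description above.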

Implementation of Hinge loss using Python

Let’s build a small Multilayer Perceptron (MLP) and use hinge loss as a loss function.
# Multi-layer perceptron for the circles problem with hinge loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu',
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))
# recent Keras releases use learning_rate (the older name was lr)
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='hinge', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
Output:
The code above first prints the training and test accuracy, then draws line plots of hinge loss and classification accuracy over the training epochs for the two-circles binary classification problem. Exact numbers may differ from run to run because the weights are initialized randomly.
Train: 0.791, Test: 0.738
The line plots of loss and accuracy on the training and test sets are shown below:
Line plots with hinge loss and classification accuracy over training epochs

Squared Hinge loss

Many extensions of hinge loss exist for use with SVM models.
One popular extension is called squared hinge loss. It simply calculates the square of the hinge loss value.
Squared hinge loss has the effect of smoothing the surface of the error function, making it numerically easier to work with.
If hinge loss gives better performance on a given binary classification problem, it is likely that squared hinge loss may also be appropriate. As with the hinge loss function, the target variable must be modified to have values in the set {-1, 1}.
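The smoothing effect can be seen in a small numerical sketch. It assumes the common form max(0, 1 - m) for hinge loss, where m = y * y_hat, and squares that value for squared hinge loss. The function names and the finite-difference slope check are illustrative, not part of the original article.

```python
import numpy as np

def hinge(m):
    """Hinge loss as a function of m = y * y_hat, assuming max(0, 1 - m)."""
    return np.maximum(0.0, 1.0 - m)

def squared_hinge(m):
    """Squared hinge loss: the square of the hinge loss value."""
    return hinge(m) ** 2

m = np.array([-1.0, 0.0, 0.5, 0.9, 1.0, 2.0])
print("hinge:        ", hinge(m))
print("squared hinge:", squared_hinge(m))

# Slope just to the left of the hinge point m = 1 (finite difference).
# To the right of m = 1 both losses are flat, so their slope there is 0.
eps = 1e-3
hinge_slope = (hinge(1.0) - hinge(1.0 - eps)) / eps
squared_slope = (squared_hinge(1.0) - squared_hinge(1.0 - eps)) / eps
print("hinge slope near m=1:        ", hinge_slope)
print("squared hinge slope near m=1:", squared_slope)
```

The hinge slope stays near -1 right up to m = 1 and then drops to 0, which is the kink in the loss surface. The squared hinge slope shrinks towards 0 as m approaches 1, so the curve joins the flat region smoothly. Squaring also penalizes large violations more strongly, for example 4 instead of 2 at m = -1.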
It is simple to implement in Python: we only have to change the loss function name to “squared_hinge” in the compile() function when building the model.
Let’s see its Python implementation on a simple multilayer perceptron model.

Implementation of Squared Hinge loss using Python

# multi-layer perceptron for the circles problem with squared hinge loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1
# split data into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu',
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))
# recent Keras releases use learning_rate (the older name was lr)
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='squared_hinge', optimizer=opt,
              metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
Output:
The code above first prints the training and test accuracy, then draws line plots of squared hinge loss and classification accuracy over the training epochs for the two-circles binary classification problem.
Train: 0.685, Test: 0.643
The line plots of loss and accuracy on the training and test sets are shown below:
Line plots with squared hinge loss and classification accuracy over training epochs
After reading this article, you should understand how hinge loss and squared hinge loss work and when to use them. For more blogs and courses on data science, machine learning, artificial intelligence, and new technologies, visit us at InsideAIML.
Thanks for reading…
