Neha Kumawat

9 months ago

Hinge loss is another loss function for binary classification problems and an alternative to cross-entropy.

It was developed primarily for use with Support Vector Machine (SVM) models.

It is used for binary classification where the target values are in the set {-1, 1}.

Hinge loss encourages examples to have the correct sign. It assigns more error when the signs of the actual and predicted values differ.
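A minimal NumPy sketch of this idea, assuming the standard textbook form of hinge loss, max(0, 1 - y * y_hat), where y is the target in {-1, 1} and y_hat is the raw model output. The product y * y_hat is positive when the signs agree, so a confident correct output costs nothing, while a wrong-signed output is penalized in proportion to how far it is on the wrong side. The function name `hinge_loss` is hypothetical.

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Per-example hinge loss max(0, 1 - y * y_hat) for targets in {-1, 1}."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.maximum(0.0, 1.0 - y_true * y_pred)

# Same target (+1) with three raw outputs:
# confident and correct, correct but inside the margin, wrong sign
targets = np.array([1, 1, 1])
outputs = np.array([2.0, 0.5, -1.0])
print(hinge_loss(targets, outputs))
```

The loss grows linearly as the output moves further to the wrong side, which is the "hinge" shape that gives the function its name.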

On binary classification problems, the performance of models trained with hinge loss is mixed: sometimes it is better than with cross-entropy, and sometimes it is not.

Let’s take an example to understand hinge loss better.

One common source of confusion is that the term "hinge loss" is used in several ways. There is a purely mathematical hinge loss, and there is a version of hinge loss used for classification problems. In machine learning, hinge loss is most commonly associated with SVMs.

Consider the example shown in the table below:

Let’s assume **margin = 0.22**

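| Case | Actual value | Computed value |
|------|--------------|----------------|
| 0 | +1 | +0.560 |
| 1 | +1 | +0.270 |
| 2 | +1 | +0.150 |
| 3 | +1 | -0.240 |
| 4 | -1 | -0.360 |
| 5 | -1 | -0.970 |
| 6 | -1 | -0.050 |
| 7 | -1 | +0.250 |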

The table above describes a hypothetical SVM used for demonstration purposes. The aim is binary classification: each item belongs to class -1 or +1 (for example, boy / girl, or spam / not spam).

The SVM classifier accepts predictor values and computes an output value. In this example, the outputs lie between -1 and +1, for example +0.3873 or -0.4548.

If the computed output is positive, the model predicts class +1; if it is negative, the model predicts class -1. In an SVM model, however, the margin also plays an important role.
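The decision rule described here depends only on the sign of the output. A minimal sketch, applied to the two example outputs +0.3873 and -0.4548; the name `predict_class` is hypothetical, and sending an output of exactly 0 to class +1 is our own assumption, since the text does not say how a zero output is handled. The margin does not change the predicted class; it only affects the loss.

```python
import numpy as np

def predict_class(scores):
    """Map raw classifier outputs to class labels: positive -> +1, negative -> -1.
    An output of exactly 0 is assigned to +1 here (an arbitrary tie-break)."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores >= 0.0, 1, -1)

print(predict_class([0.3873, -0.4548]))
```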

Now let’s set the margin to 0.22 and go through the actual and computed values in the table above, one case at a time.

For **case 0**, the actual value is +1 and the computed value is +0.560. This is a correct prediction, and because the computed value is greater than the margin of 0.22, there is no hinge loss.

Similarly, for **case 1**, the actual value is +1 and the computed value is +0.270. The prediction is correct and the computed value is greater than the margin of 0.22, so there is no hinge loss.

For **case 2**, the actual value is +1 and the computed value is +0.150. The classification is correct, but the computed value is less than the margin of 0.22, so there is a small hinge loss even though the classification is correct.

For **case 3**, the actual value is +1 and the computed value is -0.240, so the classification is wrong and there is a large hinge loss.

For **case 4**, the actual value is -1 and the computed value is -0.360. The classification is correct, and there is no hinge loss because the computed value lies beyond the margin (-0.360 is further from zero than -0.22).

For **case 5**, the actual value is -1 and the computed value is -0.970. As in case 4, the classification is correct and the value lies beyond the margin, so there is no hinge loss.

For **case 6**, the actual value is -1 and the computed value is -0.050. The classification is correct, but there is a moderate hinge loss because the computed value is very close to zero, well inside the margin of 0.22.

For **case 7**, the actual value is -1 and the computed value is +0.250, so the classification is wrong and there is a large hinge loss.

So, to summarize:

- When working with an SVM classifier, if a computed value gives a correct classification and lies beyond the margin, there is no hinge loss.

- If a computed value gives a correct classification but is too close to zero (where "too close" is defined by the margin), there is a small hinge loss.

- If a computed value gives an incorrect classification, there is always a hinge loss, and it is relatively large.
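The article does not state the formula behind these cases. A natural reading, and our assumption here, is a hinge loss with the margin as a parameter: loss = max(0, margin - y * y_hat), where y is the actual value and y_hat the computed value. The sketch below applies it to the eight cases with margin = 0.22 and reproduces the pattern described: zero loss for cases 0, 1, 4 and 5, a small loss for case 2, a moderate loss for case 6, and the largest losses for the misclassified cases 3 and 7. The function name `margin_hinge_loss` is hypothetical; the standard hinge loss fixes the margin at 1.

```python
import numpy as np

MARGIN = 0.22  # margin value used in the example

# Actual and computed values for cases 0-7, taken from the cases described above
actual = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
computed = np.array([0.560, 0.270, 0.150, -0.240, -0.360, -0.970, -0.050, 0.250])

def margin_hinge_loss(y_true, y_pred, margin):
    """Hinge loss with an explicit margin: max(0, margin - y * y_hat).
    (Assumed formula; the standard hinge loss uses margin = 1.)"""
    return np.maximum(0.0, margin - y_true * y_pred)

losses = margin_hinge_loss(actual, computed, MARGIN)
correct = np.sign(computed) == actual  # sign-based classification check

for case, (y, y_hat, ok, loss) in enumerate(zip(actual, computed, correct, losses)):
    print(f"case {case}: actual={y:+.0f} computed={y_hat:+.3f} "
          f"correct={ok} loss={loss:.2f}")
```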

Let’s build a small Multilayer Perceptron
(MLP) and use hinge loss as a loss function.

```
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
```

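```
# two-circles data (sample count and noise are assumed); map targets {0, 1} -> {-1, 1}
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
y[where(y == 0)] = -1
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
```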

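```
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu',
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))  # tanh keeps outputs in [-1, 1]
opt = SGD(learning_rate=0.01, momentum=0.9)  # 'lr' is deprecated/removed in recent Keras
# note: newer Keras versions may threshold 'accuracy' at 0.5, which suits {0, 1} rather than {-1, 1} targets
model.compile(loss='hinge', optimizer=opt, metrics=['accuracy'])
# train for an assumed 200 epochs; 'history' feeds the plotting code below
history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, verbose=0)
```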

```
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
```

```
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
```

```
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

The code above first prints the training and testing accuracy, then draws line plots of hinge loss and classification accuracy over the training epochs on the two circles binary classification problem.

*Figure: Line plots of hinge loss and classification accuracy over training epochs.*

There are several extensions of hinge loss for use with SVM models.

One popular extension is called **Squared Hinge Loss**. It simply calculates the square of the hinge loss value.

Squared hinge loss has the effect of smoothing the surface of the error function, making it numerically easier to work with.
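To make the smoothing claim concrete, here is a minimal NumPy sketch (our own illustration, using the standard margin of 1) comparing the two losses and their derivatives with respect to the raw output y_hat for a target of +1. The hinge derivative jumps from -1 to 0 at y * y_hat = 1, while the squared hinge derivative, -2 * y * max(0, 1 - y * y_hat), shrinks gradually to 0 as y * y_hat approaches 1. All function names are hypothetical.

```python
import numpy as np

def hinge(y, y_hat):
    return np.maximum(0.0, 1.0 - y * y_hat)

def squared_hinge(y, y_hat):
    # square of the hinge loss value
    return hinge(y, y_hat) ** 2

def hinge_grad(y, y_hat):
    # derivative of max(0, 1 - y*y_hat) w.r.t. y_hat: -y inside the margin, 0 outside
    # (the kink at y*y_hat == 1 is assigned 0 here)
    return np.where(y * y_hat < 1.0, -y, 0.0)

def squared_hinge_grad(y, y_hat):
    # derivative of max(0, 1 - y*y_hat)**2 w.r.t. y_hat: -2*y*max(0, 1 - y*y_hat),
    # which is continuous and reaches 0 exactly at the margin
    h = hinge(y, y_hat)
    return np.where(h > 0.0, -2.0 * y * h, 0.0)

y = 1.0
for y_hat in [0.0, 0.9, 0.99, 1.0, 1.01, 2.0]:
    print(f"y_hat={y_hat:5.2f}  hinge={float(hinge(y, y_hat)):.4f}  "
          f"sq_hinge={float(squared_hinge(y, y_hat)):.4f}  "
          f"hinge_grad={float(hinge_grad(y, y_hat)):+.2f}  "
          f"sq_hinge_grad={float(squared_hinge_grad(y, y_hat)):+.4f}")
```

Just inside the margin (y_hat = 0.99) the hinge gradient is still -1 while the squared hinge gradient is already close to 0, so the squared version has no abrupt change in slope at the margin.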

If hinge loss results in better performance on a given binary classification problem, squared hinge loss is likely to be appropriate as well. As with hinge loss, the target variable must be modified to have values in the set {-1, 1}.

It is simple to implement in Python: we only have to change the loss function name to “squared_hinge” in the compile() call when building the model.

Let’s see its Python implementation on a simple multilayer perceptron (MLP) model.

```
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
```

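```
# two-circles data (sample count and noise are assumed); map targets {0, 1} -> {-1, 1}
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
y[where(y == 0)] = -1
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
```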

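```
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu',
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))  # tanh keeps outputs in [-1, 1]
opt = SGD(learning_rate=0.01, momentum=0.9)  # 'lr' is deprecated/removed in recent Keras
# note: newer Keras versions may threshold 'accuracy' at 0.5, which suits {0, 1} rather than {-1, 1} targets
model.compile(loss='squared_hinge', optimizer=opt,
              metrics=['accuracy'])
# train for an assumed 200 epochs; 'history' feeds the plotting code below
history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=200, verbose=0)
```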

```
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
```

```
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
```

```
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

The code above first prints the training and testing accuracy, then draws line plots of squared hinge loss and classification accuracy over the training epochs on the two circles binary classification problem.

*Figure: Line plots of squared hinge loss and classification accuracy over training epochs.*

After reading this article, you now know the importance of **hinge loss** and **squared hinge loss**. For more blogs and courses on data science, machine learning, artificial intelligence, and new technologies, do visit us at **InsideAIML**.

Thanks for reading…