Shashank Shanu

2 years ago

Activation functions are a very important component of neural networks in deep learning. They help determine the output of a deep learning model, its accuracy, and the computational efficiency of training. They also have a major effect on whether a neural network converges and how fast; in some cases, a poorly chosen activation function can prevent convergence altogether. So, let’s understand activation functions, their types, and their importance and limitations in detail.

Activation functions help determine the output of a neural network. An activation function is attached to each neuron in the network and determines whether that neuron should be activated, based on whether its input is relevant to the model’s prediction. Some activation functions also normalize the output of each neuron to a range between **0** and **1** or between **-1** and **1**. Because a neural network is sometimes trained on millions of data points, the activation function must also be cheap to compute, so that it keeps computation time down.

In a neural network, inputs are fed into the neurons of the input layer. Each connection into a neuron has a weight; multiplying each input by its weight and summing the results (usually together with a bias term) gives the neuron's output, which is passed on to the next layer, and the process continues. The output can be represented as:

$$Y = \sum_i (w_i \cdot x_i) + b$$

where $x_i$ are the inputs, $w_i$ the weights, and $b$ the bias.

Note: Y can range from -infinity to +infinity. So, to bring the output into the range we want for prediction, we pass this value through an activation function. The activation function is a mathematical “gate” between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output **on** or **off**, depending on a rule or threshold. The final output can be represented as:

$$\text{output} = f(Y)$$

where $f$ is the activation function.
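The flow from weighted sum to activated output can be sketched in a few lines of plain Python. This is only an illustration: the function names, the example numbers, and the choice of a sigmoid as the activation are assumptions, not something the article prescribes.

```python
import math

def sigmoid(y):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # Y is unbounded
    return activation(y)

# Hypothetical example values, chosen only for illustration
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

out = neuron_output(inputs, weights, bias, sigmoid)
print(out)  # a value strictly between 0 and 1
```

Passing the activation in as an argument makes it easy to swap in any of the other functions discussed in this article.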

Neural networks use non-linear activation functions, which help the network learn complex data, approximate almost any function that maps inputs to outputs, and provide accurate predictions.

The core idea behind applying an activation function is to bring non-linearity into our deep learning models. Non-linear functions are those whose output is not simply proportional to the input (plus a constant): when we plot them, the graph is curved or bent rather than a single straight line.

We apply an activation function f(x) to make the network more powerful: it gives the network the ability to learn more complex data and to represent arbitrary non-linear mappings between inputs and outputs. Another important property of an activation function is that it should be differentiable (at least almost everywhere). This is because backpropagation moves backwards through the network computing the gradients of the error (loss) with respect to the weights, and those gradients are then used by gradient descent, or another optimization technique, to adjust the weights and reduce the error.
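To see why differentiability matters, here is a minimal sketch of one gradient-descent step for a single neuron with one input. The sigmoid activation, the squared-error loss, the missing bias, and all the numbers are assumptions chosen to keep the example short.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def loss(w, x, target):
    """Squared error of a one-input sigmoid neuron (no bias, for brevity)."""
    return (sigmoid(w * x) - target) ** 2

def loss_gradient(w, x, target):
    """d(loss)/dw by the chain rule; it needs the activation's derivative."""
    a = sigmoid(w * x)
    # d(loss)/da = 2(a - target), da/dy = a(1 - a), dy/dw = x
    return 2.0 * (a - target) * a * (1.0 - a) * x

# Hypothetical values, chosen only for illustration
w, x, target = 0.5, 2.0, 1.0
learning_rate = 0.1

before = loss(w, x, target)
w_new = w - learning_rate * loss_gradient(w, x, target)  # one descent step
after = loss(w_new, x, target)
print(before, after)  # the loss after the step is smaller
```

The middle factor `a * (1.0 - a)` is the derivative of the activation; without a usable derivative there would be nothing to put there.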

Below are some of the different types of activation functions used in deep learning.

The binary step function is one of the most basic activation functions, and it is usually the first that comes to mind when we want to bound the output. It is a threshold-based activation function: we fix a threshold value that decides whether the neuron is activated or deactivated. Mathematically, with threshold $\theta$, it can be represented as:

$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$

Plotted, it is a step that jumps from 0 to 1 at the threshold.

Here we take the threshold to be 0. The binary step function is simple and useful for binary classification. One of its problems is that it does not allow multi-value outputs: for example, it cannot classify inputs into one of several categories.
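A minimal sketch of the binary step function in Python. The function name `binary_step`, its `threshold` parameter, and the choice to output 1 when the input equals the threshold are assumptions made for illustration.

```python
def binary_step(x, threshold=0.0):
    """Return 1 if the input reaches the threshold, otherwise 0."""
    return 1 if x >= threshold else 0

samples = (-2.0, -0.1, 0.0, 0.1, 3.0)
outputs = [binary_step(v) for v in samples]
print(outputs)  # [0, 0, 1, 1, 1]
```

Whatever the input, the output is one of only two values, which is why this function cannot express more than two categories on its own.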

The linear activation function is a simple straight-line function whose output is directly proportional to the weighted sum of the inputs. It has the form:

$$f(x) = c\,x$$

where $c$ is a constant. Its graph is a straight line through the origin with slope $c$.

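This activation function takes the inputs, multiplies them by the weights of each neuron, and produces an output proportional to the input. A linear activation function is better than a step function in one respect: it allows a range of output values instead of only yes or no. However, it has two major problems. First, its derivative is a constant that does not depend on the input, so backpropagation gets no information about which inputs should change the weights. Second, a stack of layers with linear activations collapses into a single linear layer, no matter how many layers are used.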
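The collapse of stacked linear layers into a single linear layer can be checked numerically. The sketch below uses one-input, one-neuron layers and made-up weights; the function names and numbers are illustrative assumptions.

```python
def linear(x, c=1.0):
    """Linear activation: output proportional to the input."""
    return c * x

def two_linear_layers(x, w1, b1, w2, b2):
    """Two stacked layers, each followed by a linear activation."""
    hidden = linear(w1 * x + b1)
    return linear(w2 * hidden + b2)

def one_linear_layer(x, w, b):
    """A single layer followed by a linear activation."""
    return linear(w * x + b)

# Hypothetical weights, chosen only for illustration
w1, b1, w2, b2 = 2.0, 1.0, -3.0, 0.5

# Expanding w2 * (w1 * x + b1) + b2 gives one equivalent weight and bias
w, b = w2 * w1, w2 * b1 + b2

for x in (-1.0, 0.0, 0.5, 4.0):
    print(x, two_linear_layers(x, w1, b1, w2, b2), one_linear_layer(x, w, b))
```

Both columns agree for every input, so the second layer adds nothing a single layer could not already do.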

Modern neural network models use non-linear activation functions as the complexity of the model increases. Non-linear activation functions allow the model to create complex mappings between the inputs and outputs of the network, which are essential for learning and modelling complex data such as images, video, audio, and data sets that are non-linear or have very high dimensionality. Non-linear functions address the problems of a linear activation function: their derivatives depend on the input, so backpropagation is possible, and stacking multiple layers of neurons genuinely adds expressive power.

The sigmoid activation function is one of the most widely used activation functions. It is popular because it is smooth and its output is easy to interpret: its value lies between **0** and **1**, so it can be read as a probability when making a decision. When plotted, it forms an ‘S’-shaped curve.

When we have to make a decision or predict an output, we use this activation function because its bounded range lets the output be interpreted as a probability. The equation for the sigmoid function is:

$$f(x) = \frac{1}{1 + e^{-x}}$$

The most common issue with the sigmoid function is the vanishing gradient problem. Because it squashes large inputs into the range 0 to 1, its derivative becomes very small for inputs far from zero (and is never larger than 0.25), so the gradients passed back through many layers shrink and learning slows down or stalls. Another problem is that it is relatively expensive to compute, since it involves an exponential. To address these problems, another activation function such as **ReLU** is used, which does not have small derivatives for positive inputs.
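The shrinking derivative is easy to observe directly. Below is a small sketch, assuming the standard sigmoid and its well-known derivative s(x)(1 - s(x)); the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, written in terms of its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 and decays quickly as |x| grows
gradients = {x: sigmoid_derivative(x) for x in (0.0, 2.0, 5.0, 10.0)}
for x, g in gradients.items():
    print(x, g)

# Even in the best case (0.25 per layer), ten sigmoid layers multiply the
# gradient by 0.25 ** 10; this ignores the weights, which also play a role
best_case_after_10_layers = 0.25 ** 10
print(best_case_after_10_layers)
```

The derivative is 0.25 at zero and already tiny by x = 10, which is the behaviour behind the vanishing gradient problem.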

ReLU, or Rectified Linear Unit, is one of the most widely used activation functions nowadays. It ranges from **0 to infinity** and is mostly applied in the hidden layers of a neural network. All negative values are converted to zero: it outputs x if x is positive and 0 otherwise. The equation of this function is:

$$f(x) = \max(0, x)$$

Its graph is flat at zero for negative inputs and a straight line with slope 1 for positive inputs.

The Dying ReLU problem: when a neuron's input is negative, the gradient of ReLU is zero, so no error signal flows back through that neuron and its weights are not updated. A neuron whose input stays negative therefore stops learning. To avoid this problem, we can use the Leaky ReLU activation function instead of ReLU. Leaky ReLU extends the range to include negative values, which can improve the performance of the model.
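A minimal sketch of ReLU and its gradient in plain Python. The function names are illustrative, and treating the gradient at exactly zero as 0 is a common convention assumed here, since ReLU is not differentiable at that point.

```python
def relu(x):
    """Pass positive inputs through unchanged; clamp the rest to zero."""
    return max(0.0, x)

def relu_gradient(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, relu(x), relu_gradient(x))
```

Every negative input produces a gradient of exactly zero, which is what lets a neuron "die" if its inputs stay negative.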

The Leaky ReLU activation function was introduced to solve the ‘**Dying ReLU**’ problem discussed above. ReLU turns every negative input into zero; Leaky ReLU instead multiplies negative inputs by a small constant (for example 0.01), so the output is a value near zero rather than exactly zero and the gradient is small but never zero. This addresses the major problem of the ReLU activation function and can help improve model performance.
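Here is a small sketch of Leaky ReLU under one stated assumption: the slope for negative inputs, called `alpha` here, is set to 0.01, a commonly used value rather than one the article specifies.

```python
def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope instead of zeroed."""
    return x if x > 0 else alpha * x

def leaky_relu_gradient(x, alpha=0.01):
    """Slope of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha

for x in (-100.0, -2.0, 0.0, 2.0):
    print(x, leaky_relu(x), leaky_relu_gradient(x))
```

Because the gradient for negative inputs is `alpha` rather than zero, a neuron that receives negative inputs still gets a small weight update.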

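In many cases, the tanh activation function works better than the sigmoid function in hidden layers, largely because its output is centered at zero. Tanh is the **hyperbolic tangent function**. It is a scaled and shifted version of the sigmoid function, and each can be derived from the other: $\tanh(x) = 2\,\sigma(2x) - 1$, where $\sigma$ is the sigmoid. Its values lie between **-1** and **1**. The equation of the tanh activation function is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The graph of tanh is an S-shaped curve like the sigmoid's, but it passes through the origin and ranges from -1 to 1.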
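The claim that tanh and sigmoid can be derived from each other can be verified with the identity tanh(x) = 2 * sigmoid(2x) - 1. The sketch below compares that expression with Python's built-in `math.tanh`; the helper name `tanh_from_sigmoid` is made up for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """Rescale the sigmoid from (0, 1) to (-1, 1) and double the input."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, tanh_from_sigmoid(x), math.tanh(x))
```

The two columns match up to floating-point rounding, and the value at zero is 0 rather than the sigmoid's 0.5.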

The softmax activation function is a generalization of the sigmoid function to more than two classes, and it is very useful for classification problems. It is usually used when handling multiple classes. It exponentiates the output for each class and divides by the sum of the exponentiated outputs, so each value lies between **0** and **1** and the values sum to 1. The softmax function is ideally used in the output layer of a classifier, where we want the probability of each class for each input.

Note: For **binary classification** we can use either the **sigmoid** or the **softmax activation function**; the two are equivalent in that case. For multi-class classification problems, we generally use softmax together with the cross-entropy loss. The equation of the softmax activation function for the score $z_i$ of class $i$ among $K$ classes is:

$$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

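Because softmax maps a whole vector of scores to a vector of probabilities, it does not have a single curve like the functions above; for two classes, the probability of one class plotted against the difference between the two scores is the sigmoid's S-shaped curve.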
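A minimal sketch of softmax in plain Python, assuming the standard definition (exponentiate each score, then divide by the sum of the exponentials). Subtracting the largest score before exponentiating is a common numerical-stability trick that does not change the result; the example scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# With two classes, softmax reduces to a sigmoid of the score difference
two_class = softmax([1.5, 0.0])[0]
print(two_class, 1.0 / (1.0 + math.exp(-1.5)))
```

The class with the highest score gets the highest probability, and the two-class case reproduces the sigmoid, which is why either function works for binary classification.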

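Now that you are familiar with the most commonly used activation functions, let me summarize them in one place as a cheat sheet that you can keep handy for reference. The binary step function outputs 0 or 1; the linear function ranges over all real numbers; sigmoid lies between 0 and 1; tanh lies between -1 and 1; ReLU ranges from 0 to infinity; Leaky ReLU ranges over all real numbers, with a small slope for negative inputs; and softmax gives each class a value between 0 and 1, with the values summing to 1.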

After reading this article, you now know the importance of activation functions in neural networks and their main types.

For more blogs and courses on data science, machine learning, artificial intelligence, and new technologies, do visit us at InsideAIML. Thanks for reading…