
Neha Kumawat


- Introduction
- What is a Recurrent Neural Network?
- Recurrent Neural Networks
- Deeper Network
- Hidden Layer
- Single Recurrent Layer
- Recurrent Neuron
- Problems with a Standard Recurrent Neural Network
- Vanishing Gradient Problem

In my previous articles, we saw what neural networks are, including the Artificial Neural Network (ANN) and the Convolutional Neural Network (CNN), and how they actually work. In this tutorial, we will learn about another type of neural network architecture, known as **Recurrent Neural Networks**, commonly called **RNNs**.

Let me take an example. Suppose I gave you the sentence **"learning amazing Deep is"**. Did it make any sense to you? Not really. But read this sentence: **"Deep learning is amazing."** That made perfect sense! Did you notice? If we jumble the words in a sentence even a little, the sentence becomes incoherent to us. Well, can we expect a neural network to make sense of it? Not really! If the human brain was confused about what it meant, I am sure a neural network is going to have a tough time understanding such sentences.

Have you ever wondered how, when you start typing an email, Gmail suddenly predicts the next word of your sentence, or how, while typing a WhatsApp message, a wrong word suddenly gets autocorrected? Yes, you guessed it! Recurrent Neural Networks and other deep learning techniques are involved behind it.

There are multiple such tasks in everyday life which get completely disrupted when their sequence is disturbed. For instance, language, as we saw earlier, where the sequence of words defines the meaning; time series data, where time defines the occurrence of events; or genome data, where every sequence has a different meaning. There are multiple such cases wherein the sequence of information determines the event itself. If we are trying to use such data for any reasonable output, we need a network that has access to some prior knowledge about the data to understand it completely. This is where recurrent neural networks come into play.

So, your next question is: what exactly is a Recurrent Neural Network, and how does it work?

Let's imagine I want to make breakfast today. Suppose I had a burger yesterday, so maybe I should consider a bread omelet today. In other words, my decision today depends on what I ate yesterday.

Essentially, what we will have is a network that is similar to the feedforward neural networks you have learned about before, but with the addition of "memory".

You may have noticed that in the feedforward neural network applications you have seen before, only the current input matters.

For example, consider the problem of classifying an image as a dog or not. But perhaps this is not a static dog; maybe when the picture was shot, the dog was moving.

From a single image, it may be difficult to determine whether it is indeed walking or maybe running. Maybe it's a very energetic and talented dog that is standing in a funny pose with two of its legs up in the air.

When we observe the dog frame by frame, we remember what we saw before. So we know whether the dog is still or walking, and we can also distinguish walking from running. But **can a machine do the same?**

This is where Recurrent Neural Networks, or RNNs, come into the picture.

Now let's look at the word "Recurrent" in Recurrent Neural Networks. It simply means occurring often or repeatedly.

So why are these networks called Recurrent Neural Networks? Simply because in recurrent neural networks, we perform the same task for each element of the input sequence.

Let me take an example and show how it actually works.

Let's say we have the task of predicting the next word in a sentence, and let's try accomplishing it using a Multi-Layer Perceptron (MLP). So, what happens in an MLP? In the simplest form, we have an input layer, a hidden layer, and an output layer. The input layer receives the input, the hidden layer's activation functions are applied, and then we finally receive the output.

Now let's take a deeper network, where multiple hidden layers are present. Here, the input layer receives the input, the first hidden layer's activations are applied, these activations are sent to the next hidden layer, and so on through successive layers to produce the output. Each hidden layer is characterized by its own weights and biases.
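
As a minimal sketch of this idea (the layer sizes here are illustrative assumptions, not from any particular example), a feedforward pass through such a network looks like this in NumPy, with each layer carrying its own weights and biases:

```python
import numpy as np

def mlp_forward(x, layers):
    """Feedforward pass: each (W, b) pair is one layer with its own
    weights and biases, applied independently of the others."""
    a = x
    for W, b in layers[:-1]:
        a = np.tanh(W @ a + b)   # hidden-layer activation
    W_out, b_out = layers[-1]
    return W_out @ a + b_out     # output layer

# Illustrative sizes: 5 inputs -> two hidden layers of 8 -> 5 outputs
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 5)), np.zeros(8)),
          (rng.normal(size=(8, 8)), np.zeros(8)),
          (rng.normal(size=(5, 8)), np.zeros(5))]
y = mlp_forward(rng.normal(size=5), layers)
```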

Since each hidden layer has its own weights and activations, it behaves independently, whereas our objective is to capture the relationship between successive inputs. Because the weights and biases of these hidden layers are different, the layers cannot be combined together. To combine them, we need to give all the hidden layers the same weights and biases.

Now that the weights and biases of all the hidden layers are the same, we can combine these layers together: all the hidden layers can be rolled up into a **single recurrent layer.**

So, it's like supplying the input to the same hidden layer at every time step. The weights of the recurrent neuron are the same at all time steps, since it is a single neuron now. So, **a recurrent neuron stores the state of a previous input and combines it with the current input, thereby preserving some relationship of the current input with the previous input.**

Let's consider a simple task first. Take a character-level RNN where we have the word "Please". We provide the first 5 letters, i.e. **P**, **l**, **e**, **a**, **s**, and ask the network to predict the last letter, i.e. "**e**". So here the vocabulary of the task is just 5 letters, **{P, l, e, a, s}**. In real-world natural language processing tasks, the vocabulary may include all the words in the entire Wikipedia database or all the words in a language; here, for simplicity, we have taken a minimal vocabulary.
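
As a quick sketch of how such inputs might be fed to a network (the one-hot encoding here is a common choice, not something this article prescribes):

```python
import numpy as np

vocab = ['P', 'l', 'e', 'a', 's']                 # the task's 5-letter vocabulary
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode one character as a one-hot vector over the vocabulary."""
    v = np.zeros(len(vocab))
    v[char_to_idx[ch]] = 1.0
    return v

inputs = [one_hot(ch) for ch in "Pleas"]          # the five input letters of "Please"
```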

Let's try to understand how the network is used to predict the sixth letter in the word "Please". The middle block of the network is the RNN block, which applies something called a **recurrence formula** to the input vector and also to its previous state. Since the letter "P" has nothing preceding it, let's take the letter "l". At the moment the letter "l" is supplied to the network, the recurrence formula is applied to the letter "l" and to the previous state, which is the letter "P". These are known as the time steps of the input: if at time t the input is "l", then at time t-1 the input was "P". The recurrence formula is applied to both "l" and "P", and we receive a new state.

We can give the formula for the current state as:

**ht = f(ht-1, xt)**

Here, **ht** represents the new state, **ht-1** represents the previous state, and **xt** represents the current input.

We now have a state of the previous input instead of the input itself, because the input neuron has already applied a transformation to the previous input. Each successive input is called a **time step.**

In this case, we have five inputs to give to the network, and during the recurrence, the same function and the same weights are applied to the network at each time step.
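
In code, this amounts to a single loop that applies the same function at every time step, threading the state through the sequence (a generic sketch; the concrete form of f follows below):

```python
def run_rnn(f, inputs, h0):
    """Apply the same recurrence formula ht = f(ht-1, xt)
    at every time step, carrying the state forward."""
    h = h0
    states = []
    for x_t in inputs:    # one time step per input element
        h = f(h, x_t)     # new state from the previous state and the current input
        states.append(h)
    return states
```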

Considering the simplest form of a recurrent neural network, let's take the activation function to be **tanh**, the weight at the recurrent neuron to be **Whh**, and the weight at the input neuron to be **Wxh**. The equation for the state at time t can then be given as:

**ht = tanh(Whh · ht-1 + Wxh · xt)**

The recurrent neuron in this case takes only the immediately previous state into consideration. For longer sequences, the equation can involve multiple such states. Once the final state is calculated, we can go on to produce the output.

Now, once the current state is calculated, we can calculate the output state as:

**yt = Why · ht**

where **Why** is the weight at the output neuron.
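
Putting the two equations together, here is a minimal NumPy sketch of a single recurrent step and its output (the dimensions match the 5-letter vocabulary above; the hidden size is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size = 8, 5
Wxh = rng.normal(size=(hidden_size, vocab_size)) * 0.1   # input-to-hidden weights
Whh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # recurrent (hidden-to-hidden) weights
Why = rng.normal(size=(vocab_size, hidden_size)) * 0.1   # hidden-to-output weights

def step(h_prev, x_t):
    """ht = tanh(Whh . ht-1 + Wxh . xt)"""
    return np.tanh(Whh @ h_prev + Wxh @ x_t)

def output(h_t):
    """yt = Why . ht"""
    return Why @ h_t
```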

That is how recurrent neural networks work. In our example, the letter "e" is predicted because it depends on the letters that occurred before it. With the help of this dependency on previous inputs, we are able to predict the next character. This is a very simple example, but RNNs have huge applications in many real-world problems.

By now you have seen what recurrent neural networks are and how they work. Let me summarize the whole process for you.

1. A single time step of the input is supplied to the network, i.e. xt is supplied to the network.

2. We then calculate its current state using a combination of the current input and the previous state, i.e. we calculate ht.

3. The current ht becomes ht-1 for the next time step.

4. We can go through as many time steps as the problem demands, combining the information from all the previous states.

5. Once all the time steps are completed, the final current state is used to calculate the output yt.

6. The output is then compared to the actual output, and the error is generated.

7. The error is then backpropagated through the network to update the weights (we shall go into the details of backpropagation in further sections), and the network is trained.
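
Steps 1-5 can be sketched as one forward loop (reusing the weight names from the sketch above); steps 6-7 are only indicated in comments, since backpropagation is covered in a later section:

```python
import numpy as np

def rnn_forward(inputs, Wxh, Whh, Why):
    """Steps 1-5: supply one time step at a time, update the state,
    and compute the output from the final state."""
    h = np.zeros(Whh.shape[0])               # initial state before the first step
    for x_t in inputs:                       # step 1: supply xt
        h = np.tanh(Whh @ h + Wxh @ x_t)     # step 2: ht from ht-1 and xt
        # step 3: this h becomes ht-1 on the next iteration
    return Why @ h                           # step 5: output yt from the final state

# Steps 6-7 (not shown): compare yt with the actual output to get an error,
# then backpropagate that error through time to update Wxh, Whh, and Why.
```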

Unfortunately, while a standard RNN can comfortably solve small problems like the one above, as the problem gets more complicated, such as predicting stock prices or rainfall, you won't be so delighted with the results if you implement a standard RNN. That is because the simplest RNN model has a major drawback, called the vanishing gradient problem, which prevents it from being accurate.

As you may recall, while training our network we use **backpropagation**. In the backpropagation process, we adjust our weight matrices using a **gradient**. In the process, gradients are calculated by continuous multiplications of derivatives. The values of these derivatives may be so small that these continuous multiplications cause the gradient to practically "vanish".

In a nutshell, the problem comes from the fact that at each time step during training we use the same weights to calculate the output. That multiplication is also done during backpropagation. The further we move backward, the bigger or smaller our error signal becomes. This means the network experiences difficulty memorizing words from far away in the sequence and makes predictions based only on the most recent ones.

That is why more powerful models like **LSTM** and **GRU** come into the picture. By solving the above issue, they have become the accepted way of implementing recurrent neural networks for more complex problems.

I hope that after reading this article you finally know what **Recurrent Neural Networks** are and how they actually work.

In the next articles, I will come up with detailed explanations of other topics like LSTM, GRU, etc.

For more blogs/courses on data science, machine
learning, artificial intelligence, and new technologies do visit us at **InsideAIML**.

Thanks
for reading…