#### World's Best AI Learning Platform **with profoundly Demanding** Certification Programs

Designed by IITians, only for AI Learners.

Neha Kumawat

2 years ago

- Introduction
- What is a Convolutional Neural Network (CNN)?

- Simple Neural Network

- Convolutional Neural Network

- Convolutional Layer
- Pooling Layer
- Fully Connected Layer

- Complete Convolutional Neural Network

- Summary

In our previous tutorial, we saw what an artificial neural network (ANN) is
and how it actually works. In this tutorial, we will learn about another type
of neural network architecture known as the convolutional neural network,
commonly called a CNN.

We can think of convolutional neural networks as a combination
of biology and math with a little CS sprinkled in, and these networks have been
some of the most influential innovations in the field of computer vision.

2012 was the first year that neural nets grew to
prominence, as Alex Krizhevsky used them to win that year’s ImageNet competition
(basically, the annual Olympics of computer vision), dropping the
classification error record from 26% to 15%, an astounding improvement at the
time. Ever since then, a host of companies have been using deep learning at the
core of their services. Facebook uses neural nets for their automatic tagging
algorithms, Google for their photo search, Amazon for their product
recommendations, Pinterest for their home feed personalization, and Instagram
for their search infrastructure.

So, you might be wondering: what are the applications of
convolutional neural networks?

Convolutional neural networks (ConvNets
or CNNs) have many applications. Some of the most common are
image recognition, image classification, object detection, face
recognition, and so on.

Now that you have had a brief introduction to
how CNNs rose to prominence and where they are mostly used,
let’s dive deep and learn about them in detail.

Convolutional neural networks are very similar to ordinary neural networks. They are
made up of neurons that have learnable weights and biases. Each neuron receives
some inputs, performs a dot product, and optionally follows it with a
non-linearity. The whole network still expresses a single differentiable score
function: from the raw image pixels on one end to class scores at the other.
And they still have a loss function (e.g. SVM/Softmax) on the last
(fully-connected) layer and all the tips/tricks we developed for learning
regular Neural Networks still apply.
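To make the "dot product followed by a non-linearity" idea concrete, here is a minimal sketch of a single neuron in plain Python. The function name `neuron`, the choice of ReLU as the non-linearity, and all the numbers are assumptions made for illustration; they are not taken from any particular library.

```python
def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, then a non-linearity."""
    # Dot product between the inputs and the learnable weights, plus the bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Optional non-linearity; ReLU is assumed here purely as an example.
    return max(0.0, z)


# Illustrative values only.
activation = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.5], bias=0.5)
print(activation)  # 2.0
```

During training, the weights and the bias are the values that get adjusted; the structure of the computation stays the same.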

So,
what’s the main difference between them?

Convolutional neural network architectures make the explicit assumption that the inputs are
images, which allows us to encode certain properties into the architecture.
These then make the forward function more efficient to implement and vastly
reduce the number of parameters in the network.
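A rough back-of-the-envelope sketch shows why the parameter count drops. The helper names `dense_params` and `conv_params`, the 300 x 300 RGB input, and the choice of 100 neurons or 100 filters of size 3 x 3 are all assumptions picked for illustration.

```python
def dense_params(height, width, depth, neurons):
    # In a fully connected layer every neuron has one weight per input value,
    # plus one bias per neuron.
    return (height * width * depth + 1) * neurons


def conv_params(filter_size, in_depth, num_filters):
    # In a convolutional layer each filter has filter_size x filter_size x in_depth
    # weights that are shared across all image positions, plus one bias per filter.
    return (filter_size * filter_size * in_depth + 1) * num_filters


print(dense_params(300, 300, 3, neurons=100))  # 27000100
print(conv_params(3, 3, num_filters=100))      # 2800
```

The convolutional layer needs far fewer parameters because the same small filter is reused at every position of the image.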

Let’s
visualize it.

The image contrasts a simple 3-layer neural network with a convolutional
neural network, which arranges its neurons in three
dimensions (width, height, depth), as visualized in one of the layers. Every
layer of a CNN transforms a 3D input volume into a 3D output volume of neuron
activations. In this example, the red input layer holds the image, so its width
and height are the dimensions of the image, and its depth is 3, which
represents the red, green, and blue (RGB) channels of the image.

Let’s
take an example of an image classification problem and try to understand how
Convolutional Neural Networks work.

Imagine you have a dog and a cat at home and you have many different photos of
them. Let’s imagine they are quite small and look somewhat similar to each
other. While looking at the photos, you are able to tell
them apart quite easily. But how can a computer do this? How can it look at
an image and tell you whether it shows a cat or a dog? This is where
deep learning with convolutional neural networks (CNNs) comes in.

A CNN image classifier takes an image as input,
processes it, and classifies it under certain categories (e.g., dog, cat). Computers
see an input image as an array of pixels whose size depends on the image
resolution. Based on the image resolution, the computer will see h x w x d (h = height, w
= width, d = depth, i.e. the number of color channels).

For example, when a computer takes an image as input, it sees an array of pixel
values. Depending on the resolution and size of the image, it might see a 28
x 28 x 3 array of numbers (the 3 refers to the RGB values). Let's say we have a
color image in JPG format and its size is 300 x 300. The representative array
will be 300 x 300 x 3. Each of these numbers takes a value from 0 to 255,
which represents the pixel intensity at that point. These numbers, while
meaningless to us when we perform image classification, are the only inputs
available to the computer. The idea is that you give the computer this
array of numbers and it outputs numbers that describe the probability of
the image being a certain class (.80 for a cat, .20 for a dog, etc.).
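As a small illustration of what "an array of pixel values" looks like, the sketch below builds a 28 x 28 x 3 image as nested Python lists. The pixel values are random placeholders, not a real photo, and the variable names are assumptions for this example.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable
height, width, depth = 28, 28, 3

# image[row][col] is one pixel: a list of [R, G, B] intensities in 0..255.
image = [[[random.randint(0, 255) for _ in range(depth)]
          for _ in range(width)]
         for _ in range(height)]

shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)                         # (28, 28, 3)
print(shape[0] * shape[1] * shape[2])  # 2352 numbers in total
```

A 300 x 300 color image is stored the same way, just with 300 x 300 x 3 = 270,000 numbers.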

A CNN is built from three main types of layers:

1. Convolutional layer
2. Pooling layer
3. Fully connected layer

Basically, to train a deep learning CNN model, each
input image passes through a series of **convolution layers with filters
(kernels)**, **pooling layers**, and **fully connected (FC) layers**, and **then
an activation function** such as softmax is applied to classify the
image with probabilistic values between 0 and 1. The image is assigned to the
class with the highest probability.
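The last step, turning raw scores into probabilities and picking the most likely class, can be sketched as follows. The class labels and the raw scores are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable; the result is unchanged.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog"]   # hypothetical labels
scores = [2.0, 0.6]        # hypothetical raw scores from the last layer

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted)  # cat
```

With these scores the probabilities come out at roughly 0.80 for "cat" and 0.20 for "dog", so the image is classified as a cat.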

Let’s look, with the help of an image, at the complete flow of a CNN that
processes an input image and classifies the objects in it based on their values.

Let’s now try to understand each layer in detail and
learn how CNNs work.

The **convolutional layer** is
the core building block of any convolutional neural network and does most of
the computational heavy lifting. The first layer in a CNN is typically a convolutional layer.
It preserves the relationship
between pixels by learning image features using small squares of input data.
Convolution is a mathematical operation that takes two inputs, an image
matrix and a filter (or kernel), multiplies them element-wise over each small
patch of the image, and sums the products into a single value (a dot product).

Let’s consider a 5 x 5 image matrix whose pixel values
are 0 and 1, and a 3 x 3 filter matrix with some weights (here also 0
and 1), as shown in the figure.

As the filter slides, or convolves, around the input image matrix, it
multiplies the values in the filter with the original pixel values of the
image (i.e., it computes element-wise multiplications), sums them up, and produces a
matrix called the **“Feature Map”** or **“Activation Map”**, as shown in the
figure below.
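The sliding-window computation can be sketched in plain Python. The function name `convolve2d` and the concrete 5 x 5 image and 3 x 3 filter values below are assumptions chosen for illustration (the article's figure is not reproduced here). As in most deep learning libraries, the filter is applied without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it, then sum.
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(k_h) for b in range(k_w))
            row.append(total)
        feature_map.append(row)
    return feature_map


# Assumed example values: a 5 x 5 binary image and a 3 x 3 binary filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 3 x 3 filter fits into a 5 x 5 image at 3 x 3 different positions, which is why the feature map is 3 x 3.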

We can apply different convolutions to an image by using
different filters, which perform operations such as edge detection, blurring, and sharpening. The example below shows the
resulting images after applying different types of filters (kernels).
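Here is a small sketch of how the choice of filter changes the result. The three kernels below are commonly used textbook examples, and the tiny grayscale image with a vertical edge is a hypothetical input; neither is taken from the article's figure.

```python
def convolve2d(image, kernel):
    """Stride-1 convolution without padding (kernel applied without flipping)."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


# Commonly used 3 x 3 kernels (illustrative choices).
KERNELS = {
    "edge_detection": [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
    "sharpen": [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
    "box_blur": [[1 / 9] * 3 for _ in range(3)],
}

# Hypothetical 5 x 5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

results = {name: convolve2d(image, kernel) for name, kernel in KERNELS.items()}
for name, feature_map in results.items():
    print(name, feature_map[0])
```

The edge-detection kernel responds strongly around the dark-to-bright transition and gives 0 in the flat bright region, while the blur kernel smooths the transition into intermediate values.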

So we can see that, in the first convolution step, the image
pixel values and the weights of the filter (kernel) are multiplied element-wise
and summed up (108 in the image), and the result is filled into the top-left corner
of the output. Then the filter moves two steps to the right (a stride of 2), and again
the image pixel values and the filter weights are multiplied and
summed up (126 in the image) and filled in. This process continues,
and once the filter has finished sliding over the whole image, it produces a feature map.
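The effect of moving the filter more than one step at a time can be sketched as below. The `stride` parameter name and the all-ones 7 x 7 image and 3 x 3 filter are assumptions for illustration; they do not reproduce the values 108 and 126 from the figure.

```python
def convolve2d(image, kernel, stride=1):
    """Convolution without padding; `stride` is how far the filter moves each step."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


# Hypothetical 7 x 7 image and 3 x 3 filter, both filled with 1s.
image = [[1] * 7 for _ in range(7)]
kernel = [[1] * 3 for _ in range(3)]

stride_1 = convolve2d(image, kernel, stride=1)
stride_2 = convolve2d(image, kernel, stride=2)
print(len(stride_1), len(stride_1[0]))  # 5 5
print(len(stride_2), len(stride_2[0]))  # 3 3
```

A larger stride visits fewer positions, so the feature map gets smaller: 5 x 5 with a stride of 1 versus 3 x 3 with a stride of 2 in this example.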

Next, an important term comes into the picture: **padding**.

So, **what is padding and why is it required?**

Sometimes the filter or
kernel does not perfectly fit the input image, so in that case we have to apply some
padding to the image to solve this problem.

Padding is an additional layer that we can add to the border of an image. For
example, see the figure below, where one more layer has been added to the 4*4 image,
which has now been converted into a 5*5 image.

So, now the filter has more positions that cover the edge pixels
of the image. More information generally means better accuracy, and that is
how padding helps a neural network.

Apart from that, the padded image is now
larger than the original image. The convolution will still shrink it,
but the output ends up closer to the original size than it would without the
padding. So that’s how padding works.

We have some options to apply
padding:

- Pad the picture with zeros (zero-padding) so that it fits.

- Drop the part of the image where the filter did not fit. This is called valid padding, which keeps only the valid part of the image.
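Both options can be illustrated with a short sketch. The helper names `zero_pad` and `output_size`, and the 5 x 5 image with a 3 x 3 filter, are assumptions for this example; the size formula is the standard (n + 2p - f) / s + 1 with integer division.

```python
def zero_pad(image, pad):
    """Surround `image` with `pad` layers of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]                   # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)         # left/right border
    padded.extend([0] * width for _ in range(pad))               # bottom border
    return padded


def output_size(n, f, pad=0, stride=1):
    """Output width/height for an n x n input and an f x f filter."""
    return (n + 2 * pad - f) // stride + 1


image = [[1] * 5 for _ in range(5)]      # hypothetical 5 x 5 image
padded = zero_pad(image, pad=1)

print(len(padded), len(padded[0]))   # 7 7
print(output_size(5, 3, pad=0))      # 3  (valid padding: the output shrinks)
print(output_size(5, 3, pad=1))      # 5  (zero-padding: the size is preserved)
```

With one layer of zeros, a 3 x 3 filter produces an output of the same size as the original 5 x 5 image, whereas valid padding shrinks it to 3 x 3.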

Now that we have our feature map, our next step is
to apply an activation function to it (say, the ReLU activation function).

ReLU stands for Rectified Linear Unit, a
non-linear operation.

The output of the ReLU function is given as f(x) = max(0, x).

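So, the next question arises: **Why is ReLU
important?**

The ReLU activation function is applied to
introduce non-linearity into our convolutional neural networks. Convolution itself
is a linear operation, while most of the real-world data we want our CNNs to learn
is non-linear. ReLU adds that non-linearity by keeping positive values and
replacing negative values with zero.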

Other non-linear functions, such as **tanh** or **sigmoid**,
can also be used instead of ReLU.
But most data scientists and researchers use the ReLU activation function, since
in practice ReLU usually performs better than the other two.
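As a quick illustration, the sketch below applies ReLU to every value of a small feature map and evaluates sigmoid and tanh at a single point for comparison. The feature-map values are hypothetical.

```python
import math


def relu(x):
    # Keep positive values, replace negative values with zero.
    return max(0, x)


def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + math.exp(-x))


feature_map = [[-3, 2], [0, -1]]   # hypothetical feature-map values
rectified = [[relu(v) for v in row] for row in feature_map]
print(rectified)                   # [[0, 2], [0, 0]]

# The alternatives mentioned above, evaluated at 0 for comparison.
print(sigmoid(0), math.tanh(0))    # 0.5 0.0
```

The output of this step is often called the rectified feature map.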

Next comes the pooling layer in convolutional neural
networks.

Pooling layers are
applied to reduce the number of parameters when the images are too large.
Spatial pooling, also called subsampling or down-sampling, reduces the
dimensionality of each map but retains important information. Spatial pooling
can be of different types:

- Max Pooling

- Average Pooling

- Sum Pooling

Max pooling takes the largest element from each window of the rectified feature map.
Average pooling takes the average of the elements in each window instead, and
sum pooling takes the sum of all elements in each window.
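The three pooling types differ only in how each window is reduced to one number, as this sketch shows. The function name `pool2d`, the 2 x 2 window with a stride of 2, and the feature-map values are assumptions for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window of the feature map to a single value."""
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 rectified feature map.
feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(pool2d(feature_map, mode="max"))      # [[6, 8], [3, 4]]
print(pool2d(feature_map, mode="average"))  # [[3.25, 5.25], [2.0, 2.0]]
print(pool2d(feature_map, mode="sum"))      # [[13, 21], [8, 8]]
```

In every case the 4 x 4 map shrinks to 2 x 2, which is the down-sampling effect described above.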

Next comes the final layer, a fully connected layer, which is applied mainly for classification.

In the layer we call the FC layer, we flatten our matrix
into a vector and feed it into a fully connected layer, like a regular neural network.

In the above diagram, the feature map
matrix is converted into a vector (x1, x2, x3, …). With the fully connected
layers, we combine these features together to create a model. Finally, we apply
an activation function such as softmax or sigmoid to classify the outputs as cat, dog, etc.
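Flattening and the fully connected layer can be sketched as follows. The function names, the 2 x 2 pooled map, and the weights and biases are all hypothetical; in a real network the weights and biases are learned during training.

```python
def flatten(feature_map):
    """Turn a 2D feature map into a 1D vector (x1, x2, x3, ...)."""
    return [value for row in feature_map for value in row]


def fully_connected(vector, weights, biases):
    """One score per output neuron: dot product with its weights, plus its bias."""
    return [sum(x * w for x, w in zip(vector, neuron_weights)) + bias
            for neuron_weights, bias in zip(weights, biases)]


pooled = [[1, 2], [3, 4]]          # hypothetical pooled feature map
x = flatten(pooled)
print(x)                           # [1, 2, 3, 4]

# Hypothetical weights: one row per class (say, cat and dog).
weights = [[0.5, 0.0, -0.5, 0.25],
           [-0.25, 0.5, 0.25, 0.0]]
biases = [0.0, 0.5]

scores = fully_connected(x, weights, biases)
print(scores)                      # [0.0, 2.0]
```

These raw scores are what the final activation function (softmax or sigmoid) turns into class probabilities.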

1. Provide the input image to the convolution layer.
2. Choose the parameters, apply filters with strides, and add padding if required. Perform convolution on the image and apply the ReLU activation to the resulting matrix.
3. Perform pooling to reduce the dimensionality.
4. Add as many convolutional layers as required.
5. Flatten the output and feed it into a fully connected layer (FC layer).
6. Output the class using an activation function (logistic regression with a cost function) and classify the image.
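The steps above can be tied together in one small forward pass. This is only a sketch under stated assumptions: the image, the filter, and the fully connected weights are made-up values, nothing is trained, only a single convolutional layer is used, and softmax is assumed as the final activation.

```python
import math


def convolve2d(image, kernel, stride=1):
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)] for i in range(out_h)]


def relu_map(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=1):
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)] for i in range(out_h)]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


def fully_connected(vector, weights, biases):
    return [sum(x * w for x, w in zip(vector, ws)) + b
            for ws, b in zip(weights, biases)]


def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    return [e / sum(exps) for e in exps]


# Step 1: input image (hypothetical 5 x 5 binary image).
image = [[1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0], [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]       # hypothetical filter

# Step 2: convolution (stride 1, no padding) followed by ReLU.
rectified = relu_map(convolve2d(image, kernel, stride=1))
# Step 3: pooling. Step 4 (more convolutional layers) is skipped in this sketch.
pooled = max_pool(rectified, size=2, stride=1)
# Step 5: flatten and feed into a fully connected layer (untrained, made-up weights).
flattened = flatten(pooled)
scores = fully_connected(flattened,
                         weights=[[0.25] * 4, [-0.25] * 4],
                         biases=[0.0, 0.0])
# Step 6: turn the scores into class probabilities and pick the best class.
classes = ["cat", "dog"]
probs = softmax(scores)
print(classes[probs.index(max(probs))])  # cat
```

Because the weights are arbitrary, the predicted label itself is meaningless; the point is only to show how data flows from one layer to the next.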

There are many architectures based on convolutional neural networks, such as AlexNet,
VGGNet, GoogLeNet, and ResNet. Later, I
will try to explain each architecture in detail.

I hope that after reading this article you now know
**what convolutional neural networks are, the different terminologies used in
CNNs, and how they actually work.**

In the next articles, I will come with detailed explanations
of some other topics. For more blogs/courses on data science, machine
learning, artificial intelligence, and new technologies, do visit us at **InsideAIML**.

Thanks for reading…