Difference between Natural Language Processing and Speech Recognition
Neha Kumawat
Table of Contents
Introduction
What is Natural Language Processing?
What are the Components of NLP?
1. Natural Language Understanding (NLU)
2. Natural Language Generation (NLG)
What is Speech Recognition?
How Does Siri Work?
Introduction
Today I will give a brief introduction to two of the most commonly confused terms: speech recognition and Natural Language Processing. Many people who are new to this field mix them up, so in this article I will try to clear up that confusion. So, let’s start.
What is Natural Language Processing?
NLP stands for Natural Language Processing, which is a part of Artificial Intelligence. NLP is used for communicating with an intelligent system in a natural language such as English.
Given how much data is generated on a daily basis, processing that data is required whenever you want an intelligent system, such as a robot, to act on your instructions, when you want to hear a decision from a dialogue-based clinical expert system, and so on.
A note of caution: the acronym NLP is also used for Neuro-Linguistic Programming, an unrelated self-help methodology concerned with how people organize their thinking, feeling, language and behavior, and with modeling the performance of outstanding individuals. That field has nothing to do with computers; throughout this article, NLP means Natural Language Processing.
NLP, in simple language, is nothing but making computers perform useful tasks with the natural languages that humans use.
The input and output of an NLP system can be −
Speech
Written Text
What are the Components of NLP?
There are mainly two components of NLP, which can be given as −
1. Natural Language Understanding (NLU)
Natural Language Understanding involves the following tasks −
First, we map the given input in natural language into some useful representation (see the sketch after this list).
Then, we analyze different aspects of the language.
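To make the first step concrete, here is a toy Python sketch of mapping an utterance to a structured representation. The intent names, the patterns, and the understand() function are all hypothetical, chosen only for illustration; real NLU systems are statistical and far more capable.

```python
# A toy sketch of the NLU mapping step: turn a raw utterance into a
# structured representation. The intent names, the patterns and the
# understand() function are all hypothetical, chosen for illustration.
import re

def understand(utterance: str) -> dict:
    """Map a natural-language utterance to a simple intent/slots structure."""
    text = utterance.lower()
    if "alarm" in text:
        # Pull out a time expression such as "6am" or "6 pm" if present.
        match = re.search(r"\b(\d{1,2})\s*(am|pm)\b", text)
        return {"intent": "set_alarm",
                "time": match.group(0) if match else None}
    if "weather" in text:
        return {"intent": "get_weather"}
    return {"intent": "unknown"}

print(understand("Hey Siri, can you set me an alarm for 6AM tomorrow?"))
# {'intent': 'set_alarm', 'time': '6am'}
```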
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation.
Mainly it involves three stages (sketched in the toy example below) −
Text planning – retrieving the relevant content from the knowledge base.
Sentence planning – choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
Text realization – mapping the sentence plan into the final sentence structure.
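Here is a toy sketch of the three stages, assuming a hypothetical weather knowledge base. The data and all function names are invented and exist only for illustration.

```python
# A toy sketch of the three NLG stages. The knowledge base, the data and
# all function names are hypothetical and exist only for illustration.
KNOWLEDGE_BASE = {"city": "London", "condition": "sunny", "temp_c": 21}

def text_planning() -> dict:
    # Text planning: retrieve the relevant content from the knowledge base.
    return dict(KNOWLEDGE_BASE)

def sentence_planning(content: dict) -> dict:
    # Sentence planning: choose words, form phrases, set the tone.
    return {"subject": f"The weather in {content['city']}",
            "verb": "is",
            "complement": f"{content['condition']} at {content['temp_c']} degrees"}

def text_realization(plan: dict) -> str:
    # Text realization: map the sentence plan into a sentence structure.
    return f"{plan['subject']} {plan['verb']} {plan['complement']}."

print(text_realization(sentence_planning(text_planning())))
# The weather in London is sunny at 21 degrees.
```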
Note: Natural Language Understanding (NLU) is harder than Natural Language Generation (NLG), because understanding has to resolve the ambiguity of the input, while generation starts from a representation that is already unambiguous.
What is Speech Recognition?
In simple terms, speech recognition is nothing but the ability of a piece of software to recognize speech. Anything a person says, in a language of their choice, should be recognized by the software.
Speech recognition is used to prepare the input data (speech) into a form (text) that is suitable for natural language processing.
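As a concrete illustration, the open-source Python package SpeechRecognition (not Apple’s engine) can transcribe an audio file in a few lines. The file name below is a placeholder.

```python
# A minimal sketch using the open-source SpeechRecognition package
# (pip install SpeechRecognition). This is not the engine behind Siri;
# it is just one convenient way to turn an audio file into text.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("hello.wav") as source:  # "hello.wav" is a placeholder path
    audio = recognizer.record(source)      # read the whole file into memory

try:
    # Send the audio to Google's free web API and print the transcript.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio.")
```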
Let’s take an example.
How Does Siri Work?
Siri works mainly on two technologies: Speech Recognition and Natural Language Processing.
Here, Speech Recognition is used to convert human speech into its corresponding textual form. For instance, when you trigger Siri by saying “Hey Siri! How is the weather today?”, a powerful speech recognition system by Apple kicks in in the back-end and converts your audio into its corresponding textual form – “Hey Siri! How is the weather today?” This is an extremely challenging task simply because we humans have a highly diverse set of tones and accents. Accents vary not only across countries, but also across states and cities within a country. Some people speak fast, some speak slowly. The characteristics of male and female voices also differ greatly.
The engineers at Apple train Machine Learning models on large, transcribed datasets in order to create efficient speech recognition models for Siri. These models are trained on highly diverse datasets that comprise voice samples from a large group of people. This way, Siri is able to cater to various accents.
In recent years, deep learning has produced phenomenal results in speech recognition. The word error rate of speech recognition engines has dropped drastically, to less than 10%. This has been possible thanks to the availability not only of large datasets, but also of hardware powerful enough to train speech recognition models on them.
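Word error rate (WER) itself is easy to compute: it is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer’s output, divided by the number of reference words. A minimal implementation:

```python
# Word error rate: word-level edit distance between the reference
# transcript and the recognizer's output, divided by the number of
# reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("how is the weather today", "how is weather to day"))
# 0.6 (three word errors against five reference words)
```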
Once Siri has understood what you are saying, the converted text is sent to Apple’s servers for further processing. The servers then run Natural Language Processing (NLP) algorithms on this text to understand the intent of what the user is trying to say.
For instance, the NLP engines are able to tell that when a user says “set an alarm for 6AM tomorrow,” the user is asking to set an alarm and not to make a call. This is challenging because different users phrase the same request in different ways. For instance, one can say the same thing in the following ways:
· Hey Siri, can you set me an alarm for 6AM tomorrow?
· Siri, can you wake me up tomorrow at 6AM?
· Siri, please set an alarm for tomorrow at 6AM.
· Siri, please wake me up tomorrow at 6AM.
These are just a few of the valid ways of telling Siri to set an alarm. Some people may even speak grammatically incorrect sentences – “Siri alarm set me tomorrow at 6AM”. As a result, intent analysis becomes very challenging. Just like speech recognition, intent analysis requires a lot of data to train Natural Language Processing algorithms. Only when the dataset is huge can Siri generalize and capture variations of a sentence that it has never seen before. This makes the whole process an extremely difficult task.
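As a rough illustration of intent analysis, here is a toy classifier built with scikit-learn. The training sentences and labels are invented; a real assistant would use vastly more data and a far more sophisticated model.

```python
# A toy intent classifier using scikit-learn. The training sentences and
# intent labels are invented for illustration; a real assistant is trained
# on a vastly larger and more diverse dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "set an alarm for 6am tomorrow",
    "wake me up tomorrow at 6am",
    "please set an alarm for tomorrow",
    "call mom",
    "make a call to john",
    "phone the office",
]
train_intents = ["set_alarm", "set_alarm", "set_alarm",
                 "make_call", "make_call", "make_call"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_sentences, train_intents)

# Even a garbled phrasing maps to the right intent here, although with
# so little training data this generalization is very fragile.
print(model.predict(["siri alarm set me tomorrow at 6am"]))  # ['set_alarm']
```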
These are just two of the most fundamental challenges. Another important technology behind Siri that employs Machine Learning is contextual understanding.
You can talk to Siri like you are talking to a human:
You: Hey Siri, set an alarm.
Siri: What time do you want me to set an alarm?
You: 6 AM.
In the last turn, when you said “6 AM”, Siri was able to understand that this “6 AM” is a continuation of the previous message, in which you asked it to set an alarm.
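A minimal sketch of this kind of contextual understanding keeps a small dialogue state between turns. Everything here (the class name and the logic) is hypothetical and far simpler than a real assistant.

```python
# A minimal sketch of contextual understanding: remember a pending
# question so a bare answer like "6 AM" can be tied back to the earlier
# "set an alarm" request. Class name and logic are hypothetical.
class DialogueState:
    def __init__(self):
        self.pending_intent = None  # an intent still waiting for a slot value

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        if "alarm" in text:
            self.pending_intent = "set_alarm"
            return "What time do you want me to set an alarm?"
        if self.pending_intent == "set_alarm":
            # Interpret the bare time as the answer to the pending question.
            self.pending_intent = None
            return f"Alarm set for {utterance.strip()}."
        return "Sorry, I didn't get that."

assistant = DialogueState()
print(assistant.handle("Hey Siri, set an alarm."))  # asks for the time
print(assistant.handle("6 AM"))                     # Alarm set for 6 AM.
```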
One final technology that Siri employs in this whole process is entity extraction. When you ask Siri to set an alarm for tomorrow at 6AM, Siri not only understands the meaning of your sentence, but also automatically picks out entities from the sentence, such as 6AM and tomorrow.
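As an illustration, the open-source spaCy library (again, not Apple’s implementation) can pick out such entities:

```python
# An illustration of entity extraction with the open-source spaCy library
# (pip install spacy, then: python -m spacy download en_core_web_sm).
# This is not Apple's implementation, only the general idea.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Set an alarm for tomorrow at 6 AM.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected output (model-dependent):
#   tomorrow DATE
#   6 AM TIME
```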
That’s how Siri and almost all speech recognition-based devices work, and why they are becoming so popular in our day-to-day lives.
Speech recognition and Natural Language Processing are usually used together in Automatic Speech Recognition engines, voice assistants, and speech analytics tools.
I hope that after reading this article you finally know the main difference between Natural Language Processing and Speech Recognition.
If you like the blog, share it with your friends and colleagues to make this AI community stronger.
To learn more about the nuances of Artificial Intelligence, Python Programming, Deep Learning, Data Science and Machine Learning, visit our blog page - https://insideaiml.com/blog