Difference between Natural Language Processing and Speech Recognition

Neha Kumawat

2 years ago

Table of Contents
  • Introduction
  • What is Natural Language Processing?
  • What are the Components of NLP?
       1. Natural Language Understanding (NLU)
       2. Natural Language Generation (NLG)
  • What is Speech recognition?
  • How Siri Works?


          Today I will give a brief introduction to two terms that are often confused with each other: speech recognition and natural language processing. Many people who are new to this field mix them up, and this article tries to clear up that confusion. So, let’s start.

What is Natural Language Processing?

          NLP stands for Natural Language Processing, which is a part of Artificial Intelligence. NLP is used for communicating with an intelligent system in a natural language such as English.
A huge amount of data is generated every day, and that data has to be processed whenever you want an intelligent system such as a robot to act on your instructions, or when you want to hear a decision from a dialogue-based clinical expert system.
Note that the abbreviation NLP is also used for neuro-linguistic programming, an unrelated practice from personal development that looks at how people organize their thinking, feeling, language and behavior, and at the internal mental maps they form from what they perceive through their five senses. That is not the NLP discussed in this article.
In simple terms, NLP is about making computers perform useful tasks with natural languages, the way humans do.
The input and output of an NLP system can be −
  • Speech
  • Written Text

What are the Components of NLP?

          There are mainly two components of NLP −
1)   Natural Language Understanding (NLU)
          Natural Language Understanding involves the following tasks −
  • Mapping the given natural language input into a useful representation.
  • Analyzing different aspects of the language.
2)   Natural Language Generation (NLG)
          Natural Language Generation (NLG) is the process of producing meaningful phrases and sentences in natural language from some internal representation.
It mainly involves −
  • Text planning – retrieving the relevant content from the knowledge base.
  • Sentence planning – choosing the required words, forming meaningful phrases and setting the tone of the sentence.
  • Text realization – mapping the sentence plan into sentence structure.
Note: In comparison, Natural Language Understanding (NLU) is generally harder than Natural Language Generation (NLG).
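To make the three NLG steps a little more concrete, here is a minimal Python sketch. The knowledge base, the city name and the function names are all made up for illustration; a real NLG system would be far more elaborate.

```python
# Hypothetical knowledge base (illustrative values only).
KNOWLEDGE_BASE = {"Pune": {"condition": "sunny", "temp_c": 31}}


def plan_text(city):
    """Text planning: retrieve the relevant content from the knowledge base."""
    facts = KNOWLEDGE_BASE[city]
    return {"city": city, "condition": facts["condition"], "temp_c": facts["temp_c"]}


def plan_sentence(content):
    """Sentence planning: choose the words and the tone for the content."""
    feel = "hot" if content["temp_c"] >= 30 else "mild"
    return {
        "subject": content["city"],
        "condition": content["condition"],
        "feel": feel,
        "temp_c": content["temp_c"],
    }


def realize(plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    return (
        f"It is {plan['condition']} and {plan['feel']} in {plan['subject']} today, "
        f"around {plan['temp_c']} degrees Celsius."
    )


def generate(city):
    """Run the three NLG steps in order."""
    return realize(plan_sentence(plan_text(city)))


print(generate("Pune"))
```

Each function corresponds to one bullet above, so the output of one step is the internal representation that the next step works on.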

What is Speech recognition?

          In simple terms, speech recognition is the ability of software to recognize speech. Anything that a person says, in a language of their choice, must be recognized by the software.
Speech recognition is used to turn the input data (speech) into a form that is suitable for natural language processing (text).
Let’s take an example.

How Siri Works?

          Siri works mainly on two technologies: Speech Recognition and Natural Language Processing.
Here, Speech Recognition is used to convert human speech into its corresponding textual form. For instance, when you trigger Siri by saying “Hey Siri! How is the weather today?”, a powerful speech recognition system by Apple kicks off in the back-end and converts your audio into the text “Hey Siri! How is the weather today”. This is an extremely challenging task, simply because we humans have a highly diverse set of tones and accents. Accents vary not only across countries, but also across states and cities within a country. Some people speak fast, some speak slowly. Male and female voices also have very different characteristics.
The engineers at Apple train Machine Learning models on large, transcribed datasets in order to create efficient speech recognition models for Siri. These models are trained on highly diverse datasets that comprise the voice samples of a large group of people. This way, Siri is able to cater to various accents.
In recent years, deep learning has produced phenomenal results in speech recognition. The word error rate of speech recognition engines has dropped drastically, to less than 10%. This has been possible due to the availability of not only large datasets, but also powerful hardware on which speech recognition algorithms can be trained on those datasets.
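Word error rate (WER) is commonly computed as the word-level edit distance between the reference transcript and the recognized text, divided by the number of words in the reference. The following Python sketch shows that calculation; the function name and the example sentences are only illustrative, and real evaluation toolkits usually add text normalization on top.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference word missing (deletion)
                    curr[j - 1] + 1,     # extra hypothesis word (insertion)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("how is the weather today", "how is the whether today"))
```

A word error rate below 10% therefore means that, on average, fewer than one word in ten has to be substituted, inserted or deleted to turn the recognized text into the correct transcript.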
Once Siri has transcribed what you are saying, the converted text is sent to Apple servers for further processing. Apple servers then run Natural Language Processing (NLP) algorithms on this text to understand the intent of what the user is trying to say.
For instance, the NLP engines can tell that when a user says “set an alarm for 6AM tomorrow,” the user is asking to set an alarm and not to make a call. This is challenging because different users phrase the same request in different ways. For instance, one can say the same thing in the following ways:
  • Hey Siri, can you set me an alarm for 6AM tomorrow?
  • Siri, can you wake me up tomorrow at 6AM?
  • Siri, please set an alarm for tomorrow at 6AM.
  • Siri, please wake me up tomorrow at 6AM.
These are just a few of the valid ways of telling Siri to set an alarm. Some people may speak grammatically incorrect sentences, such as “Siri alarm set me tomorrow at 6AM”. As a result, intent analysis becomes very challenging. Just like speech recognition, intent analysis requires a lot of data to train the Natural Language Processing algorithms. Only when the training dataset is huge can Siri generalize and capture variations of the same sentence that it has never seen. This makes the whole process an extremely difficult task.
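To get a feel for what intent analysis has to do, here is a deliberately simple Python sketch. It uses hand-written keyword sets as a stand-in for a trained model; this is not how Siri works internally, and the intent names and keywords are assumptions made up for this example.

```python
import re

# Hypothetical intents and keywords, standing in for a trained classifier.
INTENT_KEYWORDS = {
    "set_alarm": {"alarm", "wake"},
    "make_call": {"call", "dial", "phone"},
    "get_weather": {"weather", "rain", "temperature"},
}


def detect_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance."""
    words = set(re.findall(r"[a-z0-9]+", utterance.lower()))
    scores = {
        intent: len(words & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


for sentence in [
    "Hey Siri, can you set me an alarm for 6AM tomorrow?",
    "Siri, can you wake me up tomorrow at 6AM?",
    "Siri alarm set me tomorrow at 6AM",
]:
    print(sentence, "->", detect_intent(sentence))
```

A keyword list like this breaks as soon as someone uses a phrasing it does not contain, which is exactly why real systems learn the mapping from large datasets instead.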
These are just two of the most fundamental challenges. Another important Machine Learning technology behind Siri is contextual understanding. You can talk to Siri as if you were talking to a human:
You: Hey Siri, set an alarm.
Siri: What time do you want me to set an alarm?
You: 6 AM.
In the last sentence, when you said “6 AM”, Siri was able to understand that this 6 AM is a continuation of the previous message, in which you asked it to set an alarm.
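One simple way to picture this is a dialogue object that remembers which request is still waiting for more information. The Python sketch below is a toy under that assumption; the class name, the regular expression and the reply texts are invented for illustration and say nothing about Siri's actual implementation.

```python
import re

# Matches times such as "6 AM", "6am" or "6:30 pm" in lower-cased text.
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b")


class AlarmDialogue:
    """Toy dialogue state that remembers an unfinished alarm request."""

    def __init__(self):
        self.pending_intent = None

    def handle(self, utterance):
        text = utterance.lower()
        if "alarm" in text:
            self.pending_intent = "set_alarm"

        if self.pending_intent == "set_alarm":
            match = TIME_PATTERN.search(text)
            if match is None:
                # The time is still missing, so ask a follow-up question.
                return "What time do you want me to set an alarm?"
            self.pending_intent = None
            hour = match.group(1)
            minute = match.group(2) or "00"
            meridiem = match.group(3).upper()
            return f"Alarm set for {hour}:{minute} {meridiem}."

        return "Sorry, I did not understand that."


dialogue = AlarmDialogue()
print(dialogue.handle("Hey Siri, set an alarm."))
print(dialogue.handle("6 AM."))
```

Without the remembered `pending_intent`, the second message “6 AM.” would be meaningless on its own, which is the point of contextual understanding.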
One final technology that Siri employs in this whole process is entity extraction. When you ask Siri to set an alarm for tomorrow at 6AM, Siri not only understands the meaning of your sentence, but also automatically picks up entities from it, such as “6AM” and “tomorrow”.
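As a rough illustration, the time and the day can be pulled out of such a sentence with a regular expression and a small lookup table. This Python sketch is only a rule-based stand-in for the learned models a real assistant would use; the function name, the pattern and the supported day words are assumptions.

```python
import re
from datetime import date, timedelta

# Matches times such as "6AM", "6 am" or "6:30 PM".
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", re.IGNORECASE)

# Hypothetical list of relative day words this sketch understands.
DAY_OFFSETS = {"today": 0, "tomorrow": 1}


def extract_entities(utterance, today=None):
    """Pick out a clock time and a relative day from the utterance."""
    today = today or date.today()
    entities = {}

    match = TIME_PATTERN.search(utterance)
    if match:
        hour = int(match.group(1)) % 12  # 12 AM becomes 0, 12 PM becomes 12 below
        if match.group(3).lower() == "pm":
            hour += 12
        minute = int(match.group(2) or 0)
        entities["time"] = f"{hour:02d}:{minute:02d}"

    for word, offset in DAY_OFFSETS.items():
        if re.search(rf"\b{word}\b", utterance, re.IGNORECASE):
            entities["date"] = (today + timedelta(days=offset)).isoformat()
            break

    return entities


print(extract_entities("Siri, please set an alarm for tomorrow at 6AM."))
```

The extracted values are what the assistant actually needs in order to act: the intent says "set an alarm", and the entities say when.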
That’s how Siri and almost all speech recognition-based devices work, and why they are becoming so popular in our day-to-day life.
Speech recognition and Natural Language Processing are usually used together in Automatic Speech Recognition engines, voice assistants and speech analytics tools.
I hope that after reading this article you now know the main difference between Natural Language Processing and Speech Recognition.
If you liked the blog, share it with your friends and colleagues to make this AI community stronger.
To learn more about nuances of Artificial Intelligence, Python Programming, Deep Learning, Data Science and Machine Learning, visit our blog page -
Keep Learning. Keep Growing. 
