All Courses

Python - Remove Stopwords

Neha Kumawat

10 months ago

Python - Remove Stopwords | insideAIML
Table of Contents
  • Introduction
  • Verifying the Stopwords
  • Example

Introduction

          Stopwords are the English words which does not add much meaning to a sentence. They can safely be ignored without sacrificing the meaning of the sentence. For example, the words like the, he, have etc. Such words are already captured this in corpus named corpus. We first download it to our python environment.

import nltk
nltk.download('stopwords')
It will download a file with English stopwords.

Verifying the Stopwords


from nltk.corpus import stopwords
stopwords.words('english')
print stopwords.words() [620:680]
When we run the above program we get the following output −

[u'your', u'yours', u'yourself', u'yourselves', u'he', u'him', u'his', u'himself', u'she', 
u"she's", u'her', u'hers', u'herself', u'it', u"it's", u'its', u'itself', u'they', u'them', 
u'their', u'theirs', u'themselves', u'what', u'which', u'who', u'whom', u'this', 
u'that', u"that'll", u'these', u'those', u'am', u'is', u'are', u'was', u'were', u'be',
u'been', u'being', u'have', u'has', u'had', u'having', u'do', u'does', u'did', u'doing',
u'a', u'an', u'the', u'and', u'but', u'if', u'or', u'because', u'as', u'until',
u'while', u'of', u'at']
The various language other than English which has these stopwords are as below.


from nltk.corpus import stopwords
print stopwords.fileids()

When we run the above program we get the following output −

[u'arabic', u'azerbaijani', u'danish', u'dutch', u'english', u'finnish', 
u'french', u'german', u'greek', u'hungarian', u'indonesian', u'italian', 
u'kazakh', u'nepali', u'norwegian', u'portuguese', u'romanian', u'russian',
u'spanish', u'swedish', u'turkish']

Example

We use the below example to show how the stopwords are removed from the list of words.

from nltk.corpus import stopwords
en_stops = set(stopwords.words('english'))

all_words = ['There', 'is', 'a', 'tree','near','the','river']
for word in all_words: 
    if word not in en_stops:
        print(word)
When we run the above program we get the following output −

There
tree
near
river
   
Enjoyed reading this blog? Then why not share it with others. Help us make this AI community stronger. 
To learn more about such concepts related to Artificial Intelligence, visit our insideAIML blog page.
You can also ask direct queries related to Artificial Intelligence, Deep Learning, Data Science and Machine Learning on our live insideAIML discussion forum.
Keep Learning. Keep Growing.

Submit Review