Python - Spelling Check

Neha Kumawat

5 months ago

Table of Contents
  • Introduction
  • Case Sensitive
  • Non-English Dictionaries

Introduction

Spell checking is a basic requirement in any text processing or analysis task. The Python package pyspellchecker provides this feature: it finds words that may be misspelled and suggests possible corrections.
It uses a Levenshtein distance algorithm to find all permutations within an edit distance of 2 of the original word. It then compares these permutations (insertions, deletions, replacements and transpositions) with known words in a word frequency list. Words that appear more often in the frequency list are more likely to be the correct result.
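To make the idea concrete, here is a simplified, standalone sketch in the style of Peter Norvig's well-known spelling corrector. It is not pyspellchecker's actual code: the tiny WORD_FREQ table is hypothetical, and the helper functions only mirror the names of the library's methods. Because a transposition counts as a single edit, this is strictly the Damerau-Levenshtein variant of edit distance.

```python
import string

# Hypothetical toy word-frequency list (word -> count); the real list is far larger.
WORD_FREQ = {"happening": 120, "happen": 300, "language": 250, "python": 90, "programming": 80}
LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one edit away: deletions, transpositions, replacements, insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away (edits of edits)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only the words that appear in the frequency list."""
    return {w for w in words if w in WORD_FREQ}


def candidates(word):
    """Prefer the word itself, then distance-1 matches, then distance-2 matches."""
    return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}


def correction(word):
    """Pick the candidate that is most frequent in the word-frequency list."""
    return max(candidates(word), key=lambda w: WORD_FREQ.get(w, 0))


for typo in ["hapenning", "langage", "pyhton"]:
    print(typo, "->", correction(typo))
# prints one line each: hapenning -> happening, langage -> language, pyhton -> python
```

Note how "langage" is fixed by a single insertion, while "hapenning" needs two edits and is only resolved at distance 2.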
pyspellchecker supports multiple languages, including English, Spanish, German, French, Portuguese and Russian.
First, install the package in your Python environment with the following command:
pip install pyspellchecker
You can also install from source:
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python -m pip install .
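Note that the package is installed as pyspellchecker but imported as spellchecker. A minimal sketch for checking that the install worked, using only the standard library (the helper name spellchecker_installed is hypothetical):

```python
import importlib.util


def spellchecker_installed() -> bool:
    """Return True if the `spellchecker` module (from pyspellchecker) can be imported."""
    # find_spec looks the module up without actually importing it.
    return importlib.util.find_spec("spellchecker") is not None


if __name__ == "__main__":
    print("pyspellchecker installed:", spellchecker_installed())
```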
The example below shows how to use the package to flag misspelled words and suggest possible corrections.

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['python','programing','langage','hapenning'])

for word in misspelled:
    # most likely answer
    print(spell.correction(word))
    
    # other similar options
    print(spell.candidates(word))

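Running the program above gives output like the following. The order of the words can vary between runs because unknown() returns a set, and the suggestions depend on the pyspellchecker version and its word frequency list.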
happenning
{'hapening', 'happenning'}
language
{'language'}
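The example passes a ready-made list of words to unknown(). With running text you first need to split it into words. Here is a minimal sketch using a regular expression; the pattern and the helper name split_into_words are assumptions for illustration, and this ASCII-only pattern would not handle accented letters in languages such as Spanish or French.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally with one apostrophe part ("it's").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_into_words(text):
    """Split running text into a list of words that can be passed to SpellChecker.unknown()."""
    return WORD_RE.findall(text)


words = split_into_words("Python is a programing langage, it's hapenning!")
print(words)
# With pyspellchecker installed you could then call, for example:
#   SpellChecker().unknown(words)
```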
If the default word frequency list does not suit your needs, you can load additional text to build a list that better fits your use case.
from spellchecker import SpellChecker

# load the default word frequency list
spell = SpellChecker()

# add the words found in your own text file (replace the path with your file)
spell.word_frequency.load_text_file('./my_free_text_doc.txt')

# if you just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['IBM', 'Wipro', 'google'])

# both words are now known
print(spell.known(['Wipro', 'google']))
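Conceptually, a word frequency list is just a count of how often each word occurs. The sketch below builds one from free text with collections.Counter, lowercasing words as pyspellchecker does by default, and then makes sure a few extra words are known. It is an illustration under those assumptions, not the library's implementation; the function name build_word_frequency is hypothetical.

```python
import re
from collections import Counter


def build_word_frequency(text, extra_words=()):
    """Count lowercase words in `text`, then make sure every word in `extra_words` is known."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    for word in extra_words:
        # Give each extra word a count of at least 1 so it is treated as known.
        counts[word.lower()] = max(counts[word.lower()], 1)
    return counts


sample = "Wipro and IBM build software. IBM also builds hardware."
freq = build_word_frequency(sample, extra_words=["google"])
print(freq.most_common(3))
print(all(word in freq for word in ("wipro", "google")))  # True
```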
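If the words you want to check are long, it is a good idea to reduce the distance to 1, because the number of permutations to generate and look up grows rapidly with both word length and distance. You can set the distance when the spell checker is initialized or change it afterwards.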
from spellchecker import SpellChecker

# set at initialization
spell = SpellChecker(distance=1)  

# do some work on longer words
# set the distance parameter back to the default
spell.distance = 2  
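A little arithmetic shows why distance 2 gets expensive. For an n-letter word and a 26-letter alphabet, one edit produces n deletions, n - 1 transpositions, 26n replacements and 26(n + 1) insertions, or 54n + 25 strings before duplicates are removed. Distance 2 repeats this for every one of those strings, so the work is roughly the square: about 187 and 35,000 strings for a 3-letter word, versus 1,105 and about 1.2 million for a 20-letter word. The sketch assumes the four edit types listed earlier and a plain a-z alphabet; the function name raw_edit_count is hypothetical.

```python
def raw_edit_count(n, alphabet_size=26):
    """Strings generated at edit distance 1 for an n-letter word, before duplicates are removed."""
    deletes = n                        # drop one letter
    transposes = max(n - 1, 0)         # swap two adjacent letters
    replaces = alphabet_size * n       # change one letter
    inserts = alphabet_size * (n + 1)  # add one letter at any of the n + 1 positions
    return deletes + transposes + replaces + inserts  # equals 54n + 25 when n >= 1


for n in (3, 8, 20):
    d1 = raw_edit_count(n)
    # Distance 2 applies the same edits to every distance-1 string, so the work is roughly d1 squared.
    print(f"{n:>2} letters: ~{d1} strings at distance 1, ~{d1 * d1} at distance 2")
```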

Case Sensitive

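If we write "Let" instead of "let", the word is compared case sensitively with the words in the dictionary, so the capitalized form is flagged as misspelled and the results change. Note that recent pyspellchecker releases lowercase words by default (the case_sensitive parameter of SpellChecker defaults to False), so with those versions "Let" may not be flagged at all.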

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['Let', 'us', 'wlak','on','the','groun'])


for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Running the program above gives output like the following (the order of the words may vary between runs):
group
{'groin', 'ground', 'groan', 'group', 'grown', 'grout'}
walk
{'walk', 'flak', 'weak'}
get
{'aet', 'ret', 'get', 'cet', 'bet', 'vet', 'pet', 'wet', 'let', 'yet', 'det', 'het', 'set', 'et', 'jet', 'tet', 'met', 'fet', 'net'}
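The difference comes down to whether words are lowercased before the dictionary lookup. This toy sketch uses a hypothetical lowercase dictionary; its case_sensitive flag mirrors the name of the SpellChecker constructor parameter, but the function itself is only an illustration.

```python
# Hypothetical toy dictionary, stored in lowercase.
DICTIONARY = {"let", "us", "walk", "on", "the", "ground"}


def unknown(words, case_sensitive=False):
    """Return the words not found in DICTIONARY, optionally ignoring case."""
    if not case_sensitive:
        words = [w.lower() for w in words]
    return {w for w in words if w not in DICTIONARY}


sentence = ["Let", "us", "wlak", "on", "the", "groun"]
print(sorted(unknown(sentence, case_sensitive=True)))   # "Let" is flagged as well
print(sorted(unknown(sentence, case_sensitive=False)))  # only the real typos
```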

Non-English Dictionaries

pyspellchecker ships several standard dictionaries as part of the package. Choose one with the language parameter when creating the spell checker:
from spellchecker import SpellChecker

# the default is English (language='en')
english = SpellChecker()  

# use the Spanish Dictionary
spanish = SpellChecker(language='es')  

# use the Russian Dictionary
russian = SpellChecker(language='ru')  
The currently supported dictionaries are:
  • English - ‘en’
  • Spanish - ‘es’
  • French - ‘fr’
  • Portuguese - ‘pt’
  • German - ‘de’
  • Russian - ‘ru’
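When the language code comes from user input or configuration, it can help to normalize and validate it before creating the checker. A minimal sketch, assuming the language list above (newer releases may support more); the names SUPPORTED_LANGUAGES, checked_language and make_checker are hypothetical. The library import happens inside make_checker so the validation helper works even without pyspellchecker installed.

```python
# Language codes listed in this article; newer pyspellchecker releases may support more.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "pt", "de", "ru"}


def checked_language(code):
    """Normalize a language code and fail early if it is not in the list above."""
    code = code.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language {code!r}; choose one of {sorted(SUPPORTED_LANGUAGES)}")
    return code


def make_checker(code):
    """Create a SpellChecker for `code` (requires pyspellchecker to be installed)."""
    from spellchecker import SpellChecker  # imported lazily, only when a checker is needed
    return SpellChecker(language=checked_language(code))
```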
I hope you enjoyed reading this article and now know how to check spelling in Python.
   
To learn more about the Python programming language, follow the InsideAIML YouTube channel.
For more such blogs/courses on data science, machine learning, artificial intelligence and emerging new technologies do visit us at InsideAIML.
Thanks for reading…
Happy Learning…
