Python - Tagging Words

Neha Kumawat

5 months ago

Part of Speech Tagging | Insideaiml
Table of Contents
  • Introduction
  • Tag Descriptions
  • Tagging a Corpus

Introduction

          Tagging is an essential feature of text processing where we tag the words into grammatical categorization. 
Download required modules 

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
We take help of tokenization and pos_tag function to create the tags for each word.

import nltk

sample_text = nltk.word_tokenize("Python is an interpreted high-level general-purpose programming language.")
tagged_sample_text = nltk.pos_tag(sample_text)
print(tagged_sample_text)
When we run the above program, we get the following output −

[('Python', 'NNP'), ('is', 'VBZ'), ('an', 'DT'), ('interpreted', 'JJ'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN'), ('.', '.')]

Tag Descriptions

We can describe the meaning of each tag by using the following program which shows the in-built values.

import nltk

nltk.help.upenn_tagset('NN')
nltk.help.upenn_tagset('IN')
nltk.help.upenn_tagset('DT')
When we run the above program, we get the following output −
NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...
IN: preposition or conjunction, subordinating
    astride among uppon whether out inside pro despite on by throughout
    below within for towards near behind atop around if like until below
    next into if beside ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those

Tagging a Corpus

We can also tag a corpus data and see the tagged result for each word in that corpus.

import nltk

from nltk.tokenize import sent_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
tokenized = sent_tokenize(sample)
for i in tokenized[:2]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            print(tagged)

When we run the above program we get the following output −

[([', 'JJ'), (Poems', 'NNP'), (by', 'IN'), (William', 'NNP'), (Blake', 'NNP'), (1789', 'CD'), 
(]', 'NNP'), (SONGS', 'NNP'), (OF', 'NNP'), (INNOCENCE', 'NNP'), (AND', 'NNP'), (OF', 'NNP'), 
(EXPERIENCE', 'NNP'), (and', 'CC'), (THE', 'NNP'), (BOOK', 'NNP'), (of', 'IN'), 
(THEL', 'NNP'), (SONGS', 'NNP'), (OF', 'NNP'), (INNOCENCE', 'NNP'), (INTRODUCTION', 'NNP'), 
(Piping', 'VBG'), (down', 'RP'), (the', 'DT'), (valleys', 'NN'), (wild', 'JJ'), 
(,', ','), (Piping', 'NNP'), (songs', 'NNS'), (of', 'IN'), (pleasant', 'JJ'), (glee', 'NN'),
 (,', ','), (On', 'IN'), (a', 'DT'), (cloud', 'NN'), (I', 'PRP'), (saw', 'VBD'), 
 (a', 'DT'), (child', 'NN'), (,', ','), (And', 'CC'), (he', 'PRP'), (laughing', 'VBG'), 
 (said', 'VBD'), (to', 'TO'), (me', 'PRP'), (:', ':'), (``', '``'), (Pipe', 'VB'),
 (a', 'DT'), (song', 'NN'), (about', 'IN'), (a', 'DT'), (Lamb', 'NN'), (!', '.'), (u"''", "''")]
I hope you enjoyed reading this article and finally, you came to know about Python - Tagging Words.
For more such blogs/courses on data science, machine learning, artificial intelligence and emerging new technologies do visit us at InsideAIML.
Thanks for reading…
Happy Learning…

Submit Review