All Courses

Python - Chunk Classification

Neha Kumawat

3 years ago

Python - Chunk Classification
Table of Contents
  • Introduction
  • Add Corpus
  • tagged_sents() 

Introduction

          Classification based chunking involves classifying the text as a group of words rather than individual words. A simple scenario is tagging the text in sentences. We will use a corpus to demonstrate the classification. We choose the corpus conll2000 which has data from the of the Wall Street Journal corpus (WSJ) used for noun phrase-based chunking.

Add Corpus

          First, we add the corpus to our environment using the following command.

import nltk
nltk.download('conll2000')
Lets have a look at the first few sentences in this corpus.

from nltk.corpus import conll2000

x = (conll2000.sents())
for i in range(3):
     print x[i]
     print '\n'

When we run the above program we get the following output −


['Confidence', 'in', 'the', 'pond', 'is', 'widely', 'expected', 'to', 'take', 'another', 'sharp', 'dive', 'if', 'trade', 'figres', 'for', 'September', ',', 'de', 'for', 'release', 'tomorrow', ',', 'fail', 'to', 'show', 'a', 'sbstantial', 'improvement', 'from', 'Jly', 'and', 'Agst', "'s", 'near-record', 'deficits', '.']


['Chancellor', 'of', 'the', 'Excheqer', 'Nigel', 'Lawson', "'s", 'restated', 'commitment', 'to', 'a', 'firm', 'monetary', 'policy', 'has', 'helped', 'to', 'prevent', 'a', 'freefall', 'in', 'sterling', 'over', 'the', 'past', 'week', '.']


['Bt', 'analysts', 'reckon', 'nderlying', 'spport', 'for', 'sterling', 'has', 'been', 'eroded', 'by', 'the', 'chancellor', "'s", 'failre', 'to', 'annonce', 'any', 'new', 'policy', 'measres', 'in', 'his', 'Mansion', 'Hose', 'speech', 'last', 'Thrsday', '.']

tagged_sents() 

          Next we use the fucntion tagged_sents() to get the sentences tagged to their classifiers.

from nltk.corpus import conll2000

x = (conll2000.tagged_sents())
for i in range(3):
     print x[i]
     print '\n'

When we run the above program we get the following output −

[('Confidence', 'NN'), ('in', 'IN'), ('the', 'DT'), ('pond', 'NN'), ('is', 'VBZ'), ('widely', 'RB'), ('expected', 'VBN'), ('to', 'TO'), ('take', 'VB'), ('another', 'DT'), ('sharp', 'JJ'), ('dive', 'NN'), ('if', 'IN'), ('trade', 'NN'), ('figres', 'NNS'), ('for', 'IN'), ('September', 'NNP'), (',', ','), ('de', 'JJ'), ('for', 'IN'), ('release', 'NN'), ('tomorrow', 'NN'), (',', ','), ('fail', 'VB'), ('to', 'TO'), ('show', 'VB'), ('a', 'DT'), ('sbstantial', 'JJ'), ('improvement', 'NN'), ('from', 'IN'), ('Jly', 'NNP'), ('and', 'CC'), ('Agst', 'NNP'), ("'s", 'POS'), ('near-record', 'JJ'), ('deficits', 'NNS'), ('.', '.')]


[('Chancellor', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('Excheqer', 'NNP'), ('Nigel', 'NNP'), ('Lawson', 'NNP'), ("'s", 'POS'), ('restated', 'VBN'), ('commitment', 'NN'), ('to', 'TO'), ('a', 'DT'), ('firm', 'NN'), ('monetary', 'JJ'), ('policy', 'NN'), ('has', 'VBZ'), ('helped', 'VBN'), ('to', 'TO'), ('prevent', 'VB'), ('a', 'DT'), ('freefall', 'NN'), ('in', 'IN'), ('sterling', 'NN'), ('over', 'IN'), ('the', 'DT'), ('past', 'JJ'), ('week', 'NN'), ('.', '.')]


[('Bt', 'CC'), ('analysts', 'NNS'), ('reckon', 'VBP'), ('nderlying', 'VBG'), ('spport', 'NN'), ('for', 'IN'), ('sterling', 'NN'), ('has', 'VBZ'), ('been', 'VBN'), ('eroded', 'VBN'), ('by', 'IN'), ('the', 'DT'), ('chancellor', 'NN'), ("'s", 'POS'), ('failre', 'NN'), ('to', 'TO'), ('annonce', 'VB'), ('any', 'DT'), ('new', 'JJ'), ('policy', 'NN'), ('measres', 'NNS'), ('in', 'IN'), ('his', 'PRP$'), ('Mansion', 'NNP'), ('Hose', 'NNP'), ('speech', 'NN'), ('last', 'JJ'), ('Thrsday', 'NNP'), ('.', '.')]
   
Like the Blog, then Share it with your friends and colleagues to make this AI community stronger. 
To learn more about nuances of Artificial Intelligence, Python Programming, Deep Learning, Data Science and Machine Learning, visit our insideAIML blog page.
Keep Learning. Keep Growing. 

Submit Review