Kajal Pawar

10 months ago

In my previous article, “Natural Language Processing using NLTK package”, I gave a detailed explanation of the NLTK package and how to use it. In this article, I will give a brief introduction to spaCy, an advanced NLP package.
If you haven’t read that article yet, I recommend going through it first (linked below) and then coming back to this one for a better understanding:

Natural Language processing using NLTK package

Let’s start…
What is spaCy?
spaCy is a free, open-source library to perform advanced NLP in Python. It’s written in Cython and is designed to build information extraction or natural language understanding systems. It’s built for production use and provides a concise and user-friendly API.

spaCy’s Statistical Models

spaCy ships with several statistical models, which are the engines that power the package. These models enable spaCy to perform several NLP tasks, such as part-of-speech tagging, named entity recognition, and dependency parsing.
Listed below are the English statistical models available in spaCy, along with their specifications:
  •   en_core_web_sm: English multi-task CNN trained on OntoNotes. Size – 11 MB
  •   en_core_web_md: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size – 91 MB
  •   en_core_web_lg: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size – 789 MB
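Let’s see how to install spaCy
$ pip install spacy
import spacy
Run the first command in your command prompt. Once the spaCy package is installed, import it in Python with the second command.
Let’s see how we can load a statistical model in spaCy. Don’t worry, it’s pretty simple. Note that the model itself has to be downloaded once before it can be loaded, for example with python -m spacy download en_core_web_sm.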
import spacy
st_model = spacy.load("en_core_web_sm")
Similarly, you can load and use any other spaCy statistical model that you have installed.
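To make the loading step a bit more concrete, here is a minimal sketch of loading a model and running it on a sentence. The helper name load_english and the sample sentence are my own assumptions for illustration, not part of spaCy. The fallback to spacy.blank("en") is only there so the snippet still runs when the model has not been downloaded; in that case you get tokens but no part-of-speech tags or entities.

```python
import spacy


def load_english(name="en_core_web_sm"):
    """Hypothetical helper: load a statistical model, or fall back to a blank English pipeline."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package is not installed.
        # `python -m spacy download en_core_web_sm` fetches it.
        return spacy.blank("en")


st_model = load_english()
doc = st_model("spaCy is written in Cython.")

# Tokens are available even with a blank pipeline.
tokens = [token.text for token in doc]
print(tokens)

# Part-of-speech tags and dependency labels are filled in only by a statistical model.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities found by the model (empty with the blank fallback).
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Swapping "en_core_web_sm" for "en_core_web_md" or "en_core_web_lg" should work the same way, assuming that model is installed.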

What are spaCy Processing Pipelines?

When working with spaCy, the first step for a text string is to pass it to an NLP object. This object is essentially a pipeline of several text pre-processing operations that the input text string passes through, one after another.
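The pipeline described above can be inspected directly. The sketch below prints the component names and calls the NLP object on a string. The variable names and the sample text are assumptions for illustration, and the fallback to a blank pipeline (which has no components besides the tokenizer) is only there so the snippet runs without a downloaded model.

```python
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model not installed: a blank pipeline only tokenizes.
    nlp = spacy.blank("en")

# Names of the pipeline components, in the order they are applied.
names = nlp.pipe_names
print(names)

# nlp.pipeline is a list of (name, component) pairs.
for name, component in nlp.pipeline:
    print(name, type(component).__name__)

# Calling the NLP object runs the tokenizer and then every component in order.
text = "The input string goes through each step of the pipeline."
doc = nlp(text)
print(type(doc).__name__, len(doc))
```

The exact component names depend on the spaCy version and on the model you loaded, so the printed list may differ from one setup to another.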

Figure: spaCy Processing Pipelines
