Named Entity Recognition (NER) with SpaCy [with code example]

Tutorial

Named Entity Recognition (NER) with SpaCy [with code example]

Aditya Singh

May 7, 2024

Apr 21, 2025

By the end of this tutorial, you will be able to write a Named Entity Recognition pipeline using SpaCy: it will detect company acquisitions from news headlines.

Named Entity Recognition (NER) with SpaCy [with code example]

By the end of this tutorial, you will be able to write a Named Entity Recognition pipeline using SpaCy: it will detect company acquisitions from news headlines.

Demo:

Learn on practice how to use named entity recognition to mine insights from news in real-time. We will write a Python script to analyze news articles' headlines to learn about all the latest company acquisitions.

In this tutorial, we will:

build a data pipeline to fetch real-time news headlines
apply pre-trained named entity recognition models provided by Spacy to identify companies that are acquired
build an engine that tracks all companies acquisitions

Intro. NLP & Named Entity Recognition

Natural Language Processing (NLP) is a set of techniques that helps analyze human-generated text.

Examples of applying NLP to real-world business problems:

chatbots
text summarization
recommendation engines
text autogeneration
"trading on news"
speech recognition

We try to solve a problem in this article: find all company acquisitions from news articles published online.

Why Named Entity Recognition?

Let's compare these two articles:

"Apple is planning to acquire a news API provider in the near future"

"Apple for breakfast: pros and cons"

Both articles mention "Apple"; however, the first one talks about the company, while the second one is about the fruit.

Without named entity recognition, we are not able to even qualify the content.

There are not just organizations (ORG) that the NER can identify. Spacy - Python NLP package that we're going to use - has a pre-built NER pipeline that includes:

DATE - absolute or relative dates or periods
PERSON - People, including fictional
GPE - Countries, cities, states
LOC - Non-GPE locations, mountain ranges, bodies of water
MONEY - Monetary values, including unit
TIME - Times smaller than a day
PRODUCT - Objects, vehicles, foods, etc. (not services)
CARDINAL - Numerals that do not fall under another type
ORDINAL - "first", "second", etc.
QUANTITY - Measurements, as of weight or distance
EVENT - Named hurricanes, battles, wars, sports events, etc.
FAC - Buildings, airports, highways, bridges, etc.
LANGUAGE - Any named language
LAW - Named documents made into laws.
NORP - Nationalities or religious or political groups
PERCENT - Percentage, including "%"
WORK_OF_ART - Titles of books, songs, etc.

For example, take a look at the article below after it's processed by the Spacy pipeline:

"On the 15th of September, Tim Cook announced that Apple wants to acquire abC Group from New York for 1 billion dollars"

Part I. Named Entity Recognition using Spacy

For now, let's use a pre-defined list of news article headlines to test named entity recognition in Spacy.

In the case of one company that acquires another one, it is fair to assume that there should be at least two ORG tags per headline: at least one for the acquire and at least one for the acquirer.

To install Spacy, run in your console:

pip install spacy

python -m spacy download en_core_web_lg

In your Python interpreter, load the package and pre-trained model:

First, let's run a script to see what entity types were recognized in each headline using the Spacy NER pipeline.

The very first example is the most obvious: one company acquires another one. Spacy NER identified both companies correctly.

The third article headline talks about an organization and a person.

All headlines talk about some acquisition; however, not all talk about one company acquiring another one.

1. Function to filter only the articles with two ORG entities (and "acquire" in-between)

Also, let's make it a generic function that we can apply to no matter which article.

The function below does two things:

detects if a lemma "acquire" is present in the sentence. We want to use lemma - a canonical form of a word - to make sure we cover cases as "acquired", "acquires", etc.
check that at least 2 ORG are found

Limitations:

can be applied only to one sentence (not on a text)
accuracy depends on a NER that you use

Analyzing the dependency using Spacy

Now that we know that a headline is an excellent candidate for an acquisition example, we want to ensure that one company is acquiring another one. For this, we will use dependency parsing with Spacy.

I'm not going to go much into the detail as it is outside of the scope of this article.

Using display, we can visualize the dependency tree.

2. Function to detect a company(ies) to be bought

The company that is bought should be:

of entity type ORG
with "head" lemma being "acquire"
independency of "attr" or "dobj"

After identifying the token, we:

take its noun chunk
check for "siblings": other companies that can be bought as well

3. Function to detect an acquiring company(ies)

4. Combine functions and test

Part II. Getting Real-Time Headlines with News API

I'll provide a quick example of how you could get the latest articles that contain lemma "acquire" from 2 sources: prnewswire and businesswire.

We'll use NewsCatcher News API Python SDK for that.

To install News API SDK, run in your console:

pip install newscatcherapi

You can grab your API key here.

We'll make a call to find articles that:

contain lemma "acquire" in title
published by prnewswire.com or businesswire.com
published in the past 24 hours
are written in English

Finally, we print out the results.

I leave you here so that you can experiment with actual data yourself.

A few notes on my Spacy NER accuracy with "real world" data

Low accuracy with sentences without a proper casing

Low accuracy overall, even with a large model
You'd need to fine-tune your model if you want to use it in production
Overall, there's no open-source high accuracy NER model that you can use out-of-a-box

Choosing the Right News API Should Be Easy

Get access to the guide that simplifies your decision-making. Enter your email to download now.

Success! Your white paper is on its way. Be sure to check your inbox shortly!

Oops! Something went wrong while submitting the form.

READY FOR
CUSTOM NEWS SOLUTIONS?

Drop your email and find out how our API delivers precisely what your business needs.

Named Entity Recognition (NER) with SpaCy [with code example]

Intro. NLP & Named Entity Recognition

Why Named Entity Recognition?

Part I. Named Entity Recognition using Spacy

1. Function to filter only the articles with two ORG entities (and "acquire" in-between)

Analyzing the dependency using Spacy

2. Function to detect a company(ies) to be bought

3. Function to detect an acquiring company(ies)

4. Combine functions and test

Part II. Getting Real-Time Headlines with News API

A few notes on my Spacy NER accuracy with "real world" data

Choosing the Right News API Should Be Easy

READY FOR CUSTOM NEWS SOLUTIONS?

READY FOR
CUSTOM NEWS SOLUTIONS?