NLP features in News API v3
Familiarize yourself with the NLP features available in News API v3
This guide explains the Natural Language Processing (NLP) features available in News API v3, how to use them, and their practical applications. By leveraging these NLP capabilities, you can extract meaningful insights from news data, enhancing your analysis, research, and application development.
Understanding the NLP layer
When processing news, we summarize the article content, categorize articles by theme, estimate the overall tone of the writing, and identify important names and places mentioned in the text. As a result, we supply each processed article with additional NLP information that you can use when making requests via News API v3.
The NLP layer in News API v3 consists of the following components:
Component | Description | Plan Requirement |
---|---|---|
Theme | General topic or category of the article | v3_nlp |
Summary | Concise overview of the article’s content | v3_nlp |
Sentiment | Separate scores for title and content sentiment | v3_nlp |
Named Entities | Identified persons, organizations, locations, and miscellaneous entities | v3_nlp |
IPTC Tags | Standardized news category tags | v3_nlp_iptc_tags |
IAB Tags | Content categories for digital advertising | v3_nlp_iptc_tags |
Custom Tags | Organization-specific classification system | All v3 NLP plans |
Embeddings | 1024-dimensional vector representation for semantic similarity | v3_nlp_embeddings |
To learn more about plan features and requirements, see Subscription plans.
Including NLP data in API responses
To include the NLP layer in your API responses, use these parameters:
- include_nlp_data (boolean): Set to true to include the NLP layer for each article.
- has_nlp (boolean): Set to true to filter results to include only articles with an NLP layer.
Code example
Here’s how you can make a request to include NLP data in your search results using Python:
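The sketch below builds such a request with the Python standard library. The endpoint URL, the x-api-token header name, and the query text are assumptions made for illustration and are not taken from this guide; substitute the values for your account.

```python
import json
import urllib.request

# Assumptions (not stated in this guide): the endpoint URL and the
# x-api-token header name. Replace them with the values for your account.
SEARCH_URL = "https://v3-api.newscatcherapi.com/api/search"
API_KEY = "YOUR_API_KEY"


def build_search_request(query: str) -> urllib.request.Request:
    """Build a POST search request that asks for the NLP layer."""
    payload = {
        "q": query,
        "include_nlp_data": True,  # attach the NLP layer to each article
        "has_nlp": True,           # keep only articles that have an NLP layer
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-token": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("artificial intelligence")
# To send it, pass `request` to urllib.request.urlopen() and parse the JSON body.
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.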
Here’s a snippet of what you might see in the response, focusing on the NLP data for a single article:
This response shows the rich NLP data available for each article, including theme classification, summary, sentiment analysis, named entity recognition, and content tagging. Let’s examine each of these components.
Theme classification
Theme classification categorizes articles into predefined topics, allowing for efficient filtering and organization of news content.
Available themes
News API v3 supports the following themes:
Business
Economics
Entertainment
Finance
Health
Politics
Science
Sports
Tech
Crime
Financial Crime
Lifestyle
Automotive
Travel
Weather
General
Filtering by theme
Use the theme and not_theme parameters to filter articles based on their classified themes:
- theme (string): Includes articles matching the specified theme(s).
- not_theme (string): Excludes articles matching the specified theme(s).
Example:
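A minimal sketch of a theme-filtered GET query. The endpoint URL is an assumption, and passing several themes as one comma-separated string is assumed from the description of this query.

```python
from urllib.parse import urlencode

# Assumed endpoint URL (not given in this guide); replace with your own.
SEARCH_URL = "https://v3-api.newscatcherapi.com/api/search"

params = {
    "q": "electric vehicles",
    "theme": "Automotive,Tech",    # include articles in either theme
    "not_theme": "Entertainment",  # exclude articles in this theme
}

# urlencode escapes spaces and commas so the values survive in a URL.
url = f"{SEARCH_URL}?{urlencode(params)}"
print(url)
```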
This query returns articles about electric vehicles categorized under Automotive or Tech themes, excluding Entertainment.
Article summarization
Article summarization provides concise overviews of article content, allowing for quick understanding without reading the full text.
Using summaries in searches and clustering
You can use summaries in your searches and clustering:
- In searches, use the search_in parameter to restrict the query to article summaries. Searching for climate change within summaries can yield more relevant results than searching the full content.
- For clustering, use summaries as the clustering variable. This approach can lead to more concise and focused clusters.
For more information on clustering, see Clustering news articles.
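Both uses of summaries can be sketched as request payloads. Only search_in is named in this guide; the value "summary" and the clustering_enabled and clustering_variable parameter names are assumptions, so check them against the clustering documentation before relying on them.

```python
import json

# Search only within article summaries.
# Assumption: "summary" is the accepted value for search_in.
search_payload = {
    "q": "climate change",
    "search_in": "summary",
}

# Cluster results using summaries as the clustering variable.
# Assumption: both clustering parameter names are illustrative.
cluster_payload = {
    "q": "climate change",
    "clustering_enabled": True,
    "clustering_variable": "summary",
}

# Serialize each payload as the JSON body of a POST request.
search_body = json.dumps(search_payload)
cluster_body = json.dumps(cluster_payload)
print(search_body)
print(cluster_body)
```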
Sentiment analysis
Sentiment analysis determines the emotional tone of an article. News API v3 provides sentiment scores for both the title and content, ranging from -1 (negative) to 1 (positive).
Filtering by sentiment
Filter articles based on sentiment scores using these parameters:
- title_sentiment_min and title_sentiment_max (float): Filter by title sentiment.
- content_sentiment_min and content_sentiment_max (float): Filter by content sentiment.
Example:
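A minimal sketch of a sentiment-filtered query. The helper content_sentiment_filter is hypothetical; it only checks that the bounds fall inside the documented range of -1 to 1 before they are added to the query.

```python
from urllib.parse import urlencode


def content_sentiment_filter(minimum: float, maximum: float) -> dict:
    """Return content sentiment bounds after checking the -1 to 1 range."""
    if not -1.0 <= minimum <= maximum <= 1.0:
        raise ValueError("expected -1 <= minimum <= maximum <= 1")
    return {
        "content_sentiment_min": minimum,
        "content_sentiment_max": maximum,
    }


# Merge the validated bounds into the rest of the query parameters.
params = {"q": "climate change", **content_sentiment_filter(0.2, 1.0)}
query_string = urlencode(params)
print(query_string)
```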
This query returns articles about climate change with a positive content sentiment (scores between 0.2 and 1.0).
Named Entity Recognition (NER)
NER identifies and categorizes named entities within the text. News API v3 recognizes four types of entities:
- PER_entity_name (string): Person names.
- ORG_entity_name (string): Organization names.
- LOC_entity_name (string): Location names.
- MISC_entity_name (string): Miscellaneous entities cover named entities outside of the person, organization, or location categories, such as events, nationalities, products, and works of art.
These parameters support boolean operators (AND, OR, NOT), proximity search with NEAR, and count-based filtering.
Example of a NER query:
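A minimal sketch of such a query as a JSON payload. The parameter names and the OR operator come from this guide; wrapping multi-word names in double quotes is an assumption about the query syntax.

```python
import json

payload = {
    "q": "tech industry",
    # Match either organization.
    "ORG_entity_name": "Apple OR Microsoft",
    # Assumption: multi-word names are quoted so each is treated as one phrase.
    "PER_entity_name": '"Tim Cook" OR "Satya Nadella"',
}

# Serialize as the JSON body of a POST request.
body = json.dumps(payload)
print(body)
```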
This query searches for articles about the tech industry that mention Apple or Microsoft as organizations and Tim Cook or Satya Nadella as persons.
To learn more about NER, see How to search by entity.
Tagging
Content tagging provides a standardized categorization of news articles, enhancing searchability and enabling more precise content filtering. IPTC and IAB tags are available in the v3_nlp_iptc_tags plan. Custom tags are developed upon request and are available in all NLP plans.
IPTC tags
IPTC (International Press Telecommunications Council) tags are a standardized set of news categories. They offer a hierarchical classification system for news content.
To filter articles by IPTC tags, use the following parameters:
- iptc_tags (string): Includes articles with specified IPTC tags.
- not_iptc_tags (string): Excludes articles with specified IPTC tags.
Example:
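A minimal sketch of an IPTC-filtered query string. The query text "artificial intelligence" is an assumed wording for an AI-related search.

```python
from urllib.parse import urlencode

params = {
    "q": "artificial intelligence",  # assumed wording for an AI-related search
    "iptc_tags": "20000002",         # IPTC Media Topic code to include
}

# Encode the parameters as the query string of a GET request.
query_string = urlencode(params)
print(query_string)
```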
This query searches for AI-related articles tagged with a specific IPTC category: 20000002, which encodes arts and entertainment.
For a complete IPTC Media Topic NewsCodes list, visit the IPTC website.
IAB tags
IAB (Interactive Advertising Bureau) tags provide a standardized taxonomy for digital advertising content.
To filter articles by IAB tags, use the following parameters:
- iab_tags (string): Includes articles with specified IAB tags.
- not_iab_tags (string): Excludes articles with specified IAB tags.
Example:
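A minimal sketch of an IAB-filtered query string. The query text "finance" and the comma-separated form for multiple tags are assumptions based on the description of this query.

```python
from urllib.parse import urlencode

params = {
    "q": "finance",                      # assumed query text
    "iab_tags": "Business,Investing",    # include either category
    "not_iab_tags": "Personal Finance",  # exclude this category
}

# Encode the parameters as the query string of a GET request.
query_string = urlencode(params)
print(query_string)
```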
This query returns finance-related articles categorized under Business or Investing but not Personal Finance.
For more information on IAB Content Taxonomy, visit the IAB Tech Lab website.
Custom tags
Custom tags help you classify and filter articles based on your organization’s taxonomy. Each taxonomy is organization-specific and protected by your API key, ensuring your custom classification system remains secure and private. We develop and integrate this solution upon your request. Simply provide us with your tags and their descriptions.
To filter articles by your taxonomy tags, use the custom_tags parameter following this pattern: "custom_tags.taxonomy": "Tag1,Tag2,Tag3", where taxonomy is your taxonomy name and Tag1,Tag2,Tag3 are specific tags.
To specify multiple tags:
- For GET requests, use a comma-separated string.
- For POST requests, use a comma-separated string or an array of strings.
Example:
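A minimal sketch of both request styles, using the placeholder names from the pattern above. The names taxonomy, Tag1, Tag2, and Tag3 are placeholders, not real values.

```python
import json
from urllib.parse import urlencode

TAXONOMY = "taxonomy"            # placeholder: your taxonomy name
TAGS = ["Tag1", "Tag2", "Tag3"]  # placeholders: your tags

# The parameter name combines the custom_tags prefix and the taxonomy name.
key = f"custom_tags.{TAXONOMY}"

# GET: multiple tags go into one comma-separated string.
get_query = urlencode({key: ",".join(TAGS)})

# POST: the JSON body can carry the tags as an array of strings.
post_body = json.dumps({key: TAGS})

print(get_query)
print(post_body)
```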
For implementation details and examples, see Custom tags.
Embeddings
Vector embeddings provide a powerful way to represent article content as numerical vectors, enabling advanced semantic analysis and similarity comparisons. Available exclusively with the v3_nlp_embeddings plan, each article is processed through the multilingual-e5-large model to generate its vector representation.
The embedding is available in the new_embedding field as an array of 1024 numbers. Here’s an example of how it appears in the API response:
These high-dimensional vectors capture the semantic meaning of articles, enabling various advanced applications:
- Semantic search: Find articles with similar meanings, not just matching keywords.
- Content recommendation: Suggest related articles based on semantic similarity.
- Topic clustering: Group articles by meaning using vector similarity.
- Machine learning: Train models using these dense numerical representations.
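Semantic search and recommendation both reduce to comparing vectors. The sketch below ranks articles by cosine similarity to a query vector; cosine similarity is one common choice and is not prescribed by this guide. The function names are hypothetical, and the three-dimensional toy vectors stand in for the 1024-number arrays read from the new_embedding field.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, articles):
    """Sort (title, vector) pairs from most to least similar to the query."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in articles
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: each vector would come from an article's new_embedding field.
articles = [
    ("Unrelated story", [0.0, 1.0, 0.0]),
    ("Close match", [0.9, 0.1, 0.0]),
    ("Exact match", [1.0, 0.0, 0.0]),
]
ranking = rank_by_similarity([1.0, 0.0, 0.0], articles)
print([title for title, _ in ranking])
```

For large collections, a vector database or an approximate nearest-neighbor index would replace the linear scan used here.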
Use cases
NLP features in News API v3 enable various applications across industries:
Application | Description | Example use case |
---|---|---|
Brand Monitoring | Track mentions, analyze sentiment and identify influencers. | A tech company monitoring public perception of their latest product launch. |
Competitive Intelligence | Monitor competitors’ activities and public perception. | An automotive manufacturer tracking mentions of competitors’ electric vehicle initiatives. |
Market Research | Analyze trends, consumer sentiment, and emerging topics. | A financial services firm identifying emerging fintech trends. |
Political Analysis | Track political figures and analyze public opinion. | A political campaign monitoring sentiment around key policy issues. |
Financial Analysis | Monitor market sentiment and track company mentions. | An investment firm analyzing sentiment around potential acquisition targets. |
Academic Research | Conduct large-scale analysis of media coverage. | A researcher studying media bias in climate change reporting. |
Content Curation | Automatically filter and categorize news content. | A news aggregator app personalizing content for users based on interests. |
Trend Forecasting | Identify emerging trends across industries. | A consulting firm predicting future technology adoption trends. |
Best practices
To maximize the effectiveness of NLP features in News API v3:
- Start with broader queries and gradually refine using NLP parameters.
- Combine multiple NLP parameters for precise results.
- Use entity recognition with boolean operators to refine searches.
- Experiment with sentiment thresholds to find the right balance for your use case.
- Leverage theme classification and content tags to quickly filter large volumes of news data.
- Regularly review and update your queries to adapt to changing news landscapes.