A hyperlocal news feed consists of all the news about happenings in a city or any other area of similar scale. Topics might cover transit, extreme weather alerts, crime, local events, infrastructure projects, and so on. Besides the obvious use case of creating engaging consumer applications, city news can be very useful to city planners, analysts, social workers, law enforcement agencies, real estate industrialists and investors, local businesses, or corporations looking to expand into hyperlocal markets.
The seemingly simple task of getting a news feed with all the latest happenings in a given city is unexpectedly daunting. Consider getting all the news from New York City into a feed on a daily basis. The likes of the New York Times cover more than just New York, and a big chunk of New York's news is covered by national and international media outlets. Monitoring a single source or a group of sources will not be sufficient to have a comprehensive 'New York News Feed'. Conversely, monitoring a large number of news sources for news about just one city can be very resource-intensive and may not yield a good number of relevant results. Then, there is also the problem of location keywords clashing. Using a location name as a search keyword does not yield satisfactory results, as location names could sometimes be common words or multiple locations could have the same name.
With this in mind, NewsCatcher has made getting an ultra-granular, location-focused news feed very simple: just call our Local News API! We do the heavy lifting of scanning and processing thousands of news articles each day and tagging them with locations. Our method picks up locations down to town-level precision with 84% accuracy. Around 31,000 locations (US only) are covered. We also use advanced NLP techniques and AI to detect location names and to resolve keyword clashes when a location name is also a common word. In this blog, let's look at how NewsCatcher's Hyperlocal News API works and what you can do with it.
A Quick Demonstration
NewsCatcher provides 15 RSS feeds, each covering local news from a US city, as a demonstration. You can see the list and links to each feed at: https://www.newscatcherapi.com/local-news/rss/. A snippet from the New York feed is shown below:
<rss version="2.0">
<channel>
<title>NewsCatcher Local News RSS Feed - New York City, New York</title>
<link>/rss/new-york-city-ny.rss</link>
<description>This is a feed for New York City, New York</description>
<lastBuildDate>Mon, 28 Oct 2024 05:41:28 +0000</lastBuildDate>
<!-- news items here -->
<!-- sample items shown below -->
<item>
<newscatcher_article_id>ac7a417f4626c2a1632faa99d31eef54</newscatcher_article_id>
<title>Anne Hathaway Channels Scary Statue of Liberty for ‘Boo York’ Halloween Costume: See the Look!</title>
<theme>Entertainment</theme>
<link>https://www.aol.com/anne-hathaway-channels-scary-statue-013540114.html</link>
<media>https://s.yimg.com/ny/api/res/1.2/hXoHv3GyYsmcvEHI7tgT.w--/YXBwaWQ9aGlnaGxhbmRlcjt3PTEyMDA7aD04MDA-/https://media.zenfs.com/en/aol_people_articles_471/6e980c6fc3e21bc291461002f2f2fca1</media>
<pubDate>2024-10-26 01:35:40</pubDate>
<author>Ingrid Vasquez</author>
<associated_town>New York City, New York</associated_town>
<story_articles_number>3</story_articles_number>
</item>
<item>
<newscatcher_article_id>27f1a6526f0e47fc7c13e16fb25b1991</newscatcher_article_id>
<title>City Paves Over Bed-Stuy's Hydrant ‘Aquarium’ and Puts Up a Sidewalk</title>
<theme>General</theme>
<link>https://www.nytimes.com/2024/10/25/nyregion/bed-stuy-aquarium-sidewalk-brooklyn.html</link>
<media>https://static01.nyt.com/images/2024/10/25/multimedia/25xp-aquarium-wires-lwhp/25xp-aquarium-wires-lwhp-facebookJumbo.jpg</media>
<pubDate>2024-10-25 23:01:30</pubDate>
<author>Remy Tumin</author>
<associated_town>New York City, New York</associated_town>
<story_articles_number>3</story_articles_number>
</item>
<!-- more news items -->
</channel>
</rss>
In the above feed, you can see articles about events in New York City. Each item tag carries a link to the source article, along with the title, cover image, publication date, and author. We also classify each article into a theme, such as ‘Sports’ or ‘Entertainment’, and provide the associated_town. An article can have multiple associated towns, and we return all the identified towns (or cities).
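If you want to consume one of these demo feeds programmatically, the standard library is enough. The sketch below parses a trimmed copy of the sample feed shown above; `parse_feed` and `SAMPLE_FEED` are names made up for this illustration, and it assumes that multiple towns would appear as repeated associated_town tags, which the sample does not confirm.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed above (one item, media tag omitted).
SAMPLE_FEED = """<rss version="2.0">
<channel>
<title>NewsCatcher Local News RSS Feed - New York City, New York</title>
<item>
<title>City Paves Over Bed-Stuy's Hydrant 'Aquarium' and Puts Up a Sidewalk</title>
<theme>General</theme>
<link>https://www.nytimes.com/2024/10/25/nyregion/bed-stuy-aquarium-sidewalk-brooklyn.html</link>
<pubDate>2024-10-25 23:01:30</pubDate>
<author>Remy Tumin</author>
<associated_town>New York City, New York</associated_town>
</item>
</channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item>, keeping the fields discussed above."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "theme": item.findtext("theme"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            # Assumption: several towns show up as repeated <associated_town> tags.
            "towns": [t.text for t in item.findall("associated_town")],
        })
    return items


for entry in parse_feed(SAMPLE_FEED):
    print(entry["published"], entry["theme"], entry["title"])
```

To read a live feed, you would download the XML from one of the feed links listed on the page above and pass the text to the same function.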
Using the REST API
The RSS feed was only a demonstration. NewsCatcher offers a full-featured REST API that can return the required local news data in JSON format. The API can also return the news for over 31,000 locations in the US, not just the 15 locations in the RSS feeds.
Let's see how you can interact with this REST API. We'll be using Python (3.6+) in the code snippets below for illustrative purposes, but the API can be used with any language over HTTP. To follow along, you'll need the NewsCatcher API Endpoint URL, an API key, and the requests library installed. Let's put these things in the code below:
import requests
import json
NC_API_KEY = '<newscatcher-api-key-goes-here>'
NC_ENDPOINT = 'https://local-news.newscatcherapi.com'
Getting the Latest Local News Headlines
The simplest use case of the Local News API is to get a list of the latest headlines for the location you're interested in. NewsCatcher offers a convenient endpoint to get just this. All you have to do is send a POST request to /api/latest_headlines:
r = requests.post(
    f'{NC_ENDPOINT}/api/latest_headlines',
    # The key is sent in the x-api-token header
    headers={'x-api-token': NC_API_KEY},
    json={
        'associated_towns': [{'name': 'New York'}],
        'page_size': 10,
        'when': '1d',
    }
)
print(json.dumps(r.json(), indent=2))
The above code calls the API and prints the indented JSON response. Let's see what’s in the result:
{
"status": "ok",
"total_hits": 3,
"page": 1,
"total_pages": 1,
"page_size": 10,
"articles": [
{
"id": "5e05185a3499db5817f265fc354f1d52",
"associated_town": [
{
"ai_validated": true,
"name": "Rochester, New York",
"description": [
"HYPERLOCAL_SOURCES_EXCLUDE_QUERY",
"HYPERLOCAL_SOURCES_INCLUDE_QUERY"
]
},
{
"ai_validated": true,
"name": "New York",
"description": [
"LOCAL_SOURCES_EXCLUDE_QUERY"
]
}
],
"ai_associated_town": null,
"score": null,
"title": "Vote: Section V's Girls Sports Athlete of the Week for Oct. 20-26 presented by Faber Builders",
"author": "Marquel Slaughter",
"link": "https://www.democratandchronicle.com/story/sports/high-school/2024/10/28/who-is-section-v-girls-sports-athlete-of-the-week-for-oct-20-26-vote-now/75836114007",
"description": "Your vote will determine who will be the Faber Builders Girls Sports Athlete of the Week for October 20-26.",
"media": "https://www.democratandchronicle.com/gcdn/authoring/authoring-images/2024/09/06/PROC/75108463007-aotw-article-page-hdr-1200-x-628.jpg?crop=1115,627,x58,y0&width=1115&height=627&format=pjpg&auto=webp",
"content": "It's time to take a...(full content truncated)",
"authors": [
"Justin Ritzel",
"James Johnson",
"Marquel Slaughter"
],
"published_date_precision": "full",
"published_date": "2024-10-28 11:03:48",
"updated_date": "2024-10-28 11:03:48",
"updated_date_precision": "full",
"is_opinion": false,
"twitter_account": "@DandC",
"domain_url": "democratandchronicle.com",
"parent_url": "https://www.democratandchronicle.com/sports",
"word_count": 357,
"rank": 5339,
"country": "US",
"rights": "democratandchronicle.com",
"language": "en",
"nlp": {
"theme": [
"Sports"
],
"summary": "Your vote will determine who will...(ai summary truncated)",
"sentiment": {
"title": 0.0,
"content": 0.0
},
"ner_PER": [
{
"entity_name": "Governor",
"count": 1
},
...list truncated for illustration...
],
"ner_ORG": [
{
"entity_name": "Section V",
"count": 2
},
...list truncated for illustration...
],
"ner_MISC": [
{
"entity_name": "Girls Sports Athlete of the Week",
"count": 1
},
...list truncated for illustration...
],
"ner_LOC": [
{
"entity_name": "Silver Hill Tech Park",
"count": 1
},
...list truncated for illustration...
]
},
"paid_content": false
},
...list truncated for illustration...
],
"user_input": "...object showing the input..."
}
In the above output, you can see that the query returned 3 hits. In the article data, there is data from the source, consisting of the article title, link, authors, publication date, rights attribution, and most importantly the full content. NewsCatcher also adds useful enrichments by analyzing the data. The first is of course the list of detected towns, which we use to filter the results and return just the news relevant to your input location. Apart from this, you can also see the theme of the article, sentiment analysis scores, and lists of recognized entities. The recognized entities are classified as persons, organizations, locations, and miscellaneous. You can readily use the enriched data for further analysis.
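As a rough illustration of how the enriched fields might be used downstream, the sketch below flattens one article from the response into a compact summary. `summarize_article` is a hypothetical helper written for this post, not part of the API, and the sample dictionary is an abbreviated copy of the article shown above.

```python
def summarize_article(article):
    """Flatten one article from the response into a small summary dict."""
    nlp = article.get("nlp") or {}
    sentiment = nlp.get("sentiment") or {}
    return {
        "title": article.get("title"),
        "source": article.get("domain_url"),
        # Every detected town, not only the one that was queried
        "towns": [t["name"] for t in article.get("associated_town") or []],
        "themes": nlp.get("theme") or [],
        "title_sentiment": sentiment.get("title"),
        "organizations": [e["entity_name"] for e in nlp.get("ner_ORG") or []],
    }


# Abbreviated copy of the article shown above (title shortened).
sample_article = {
    "title": "Vote: Section V's Girls Sports Athlete of the Week ...",
    "domain_url": "democratandchronicle.com",
    "associated_town": [
        {"ai_validated": True, "name": "Rochester, New York"},
        {"ai_validated": True, "name": "New York"},
    ],
    "nlp": {
        "theme": ["Sports"],
        "sentiment": {"title": 0.0, "content": 0.0},
        "ner_ORG": [{"entity_name": "Section V", "count": 2}],
    },
}

print(summarize_article(sample_article))
```

With a real response, you could apply the same function to every entry in the articles list, for example to load the summaries into a table for analysis.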
The Search Endpoint
While the latest_headlines endpoint is easy to get started with, it is not all that the Local News API offers. You can also use the search endpoint, which offers additional capabilities: passing keyword queries, sorting, and accessing older data with custom time ranges.
Basic Usage
Let's quickly see an example of how to use the search endpoint, with some filters:
r = requests.post(
    f'{NC_ENDPOINT}/api/search',
    headers={'x-api-token': NC_API_KEY},
    json={
        'associated_towns': [{'name': 'New York'}],
        'page_size': 5,
        'q': 'strike',
        'search_in': 'title',
        'from_': '30 days ago'
    },
)
print(json.dumps(r.json(), indent=2))
The above code sends a POST request to the /api/search endpoint to return up to 5 articles satisfying the query criteria. The keyword query is 'strike', and the time filter from_ is set to the last 30 days. Setting the search_in parameter to title ensures that the results contain the word 'strike' in the title of the article. You can also include the content in the search target to return articles that have the word 'strike' in the article content.
Let's see the result:
{
"status": "ok",
"total_hits": 3,
"page": 1,
"total_pages": 1,
"page_size": 5,
"articles": [
{
"id": "7f2ddd19bfc79a2aaa3a2d4173130365",
"associated_town": [
{
"ai_validated": false,
"name": "Batavia, New York",
"description": [
"HYPERLOCAL_SOURCES_EXCLUDE_QUERY"
]
},
{
"ai_validated": true,
"name": "New York",
"description": [
"LOCAL_SOURCES_EXCLUDE_QUERY"
]
}
],
"ai_associated_town": null,
"score": 13.088006,
"title": "US dockworkers agree to suspend strike until Jan. 15",
...response similar to previous section...
Similar to the results from the latest_headlines endpoint, the response consists of a list of articles. You can see that the returned article has the word 'strike' in the title.
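Both responses so far carry page, total_pages, and page_size fields, which suggests that larger result sets are split across pages. The sketch below walks through those pages. It assumes the request body accepts a page parameter and that failed calls report a status other than "ok"; neither is shown in this post, so check the documentation before relying on them. `iter_articles`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fetcher is injected so the example runs offline.

```python
def iter_articles(fetch_page, payload, max_pages=10):
    """Yield articles page by page until total_pages (or max_pages) is reached.

    fetch_page is any callable that takes a JSON body and returns the parsed
    response, for instance a small wrapper around requests.post(...).json().
    """
    page = 1
    while page <= max_pages:
        # Assumption: the request body accepts a 'page' parameter.
        response = fetch_page({**payload, "page": page})
        if response.get("status") != "ok":
            raise RuntimeError(f"API returned an error: {response}")
        yield from response.get("articles", [])
        if page >= response.get("total_pages", 1):
            break
        page += 1


def fake_fetch(body):
    """Offline stand-in for the API so the sketch runs without a key."""
    pages = {1: ["a", "b"], 2: ["c"]}
    return {
        "status": "ok",
        "page": body["page"],
        "total_pages": 2,
        "articles": [{"title": t} for t in pages[body["page"]]],
    }


titles = [a["title"] for a in iter_articles(fake_fetch, {"q": "strike", "page_size": 2})]
print(titles)
```

The max_pages cap is a deliberate safety net so that a broad query cannot silently consume a large number of API calls.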
Getting Older Data
With the search endpoint, you can get older data by specifying from_ and to_ dates. To get articles that are between 2 and 3 weeks old, use the following JSON body in the request:
{
"associated_towns": [{"name": "Chicago"}],
"from_": "21 days ago",
"to_": "14 days ago"
}
The query above will return the articles from Chicago that are more than 2 weeks but less than 3 weeks old. NewsCatcher also offers the convenience of specifying the dates in natural language format (x days ago), instead of exact date strings.
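When backfilling a longer period, it can help to split it into several such windows and send one request per window. The following sketch generates the from_ and to_ pairs in the same "x days ago" format used above; `backfill_windows` is a hypothetical helper, and whether the window boundaries are inclusive is not stated here, so deduplicating the results by article id is a sensible precaution.

```python
def backfill_windows(oldest_days_ago, newest_days_ago, step=7):
    """Split a date range into consecutive windows, newest first."""
    windows = []
    newer = newest_days_ago
    while newer < oldest_days_ago:
        older = min(newer + step, oldest_days_ago)
        windows.append({"from_": f"{older} days ago", "to_": f"{newer} days ago"})
        newer = older
    return windows


# One request body per week, covering 7 to 28 days ago for Chicago.
bodies = [
    {"associated_towns": [{"name": "Chicago"}], **window}
    for window in backfill_windows(oldest_days_ago=28, newest_days_ago=7)
]

for body in bodies:
    print(body["from_"], "->", body["to_"])
```

The second body generated here is the same query as the example above.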
Querying Multiple Locations
If you need to query multiple locations, simply pass the list of locations using the associated_towns parameter. You need not make separate HTTP requests for each location. The JSON body in this case will look like this:
{
"associated_towns": [
{"name": "California"},
{"name": "Texas"}
],
"q": "layoff",
"from_": "30 days ago"
}
The above query will return articles from the last 30 days that mention 'layoff' and have 'California' OR 'Texas' as an associated location. When multiple towns are given, they are combined with the OR operator.
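Because the locations are combined with OR, a single response mixes articles for all of them. One way to split the articles back out is to bucket them by the names in their associated_town lists, as sketched below. `group_by_location` is a hypothetical helper, the sample articles are placeholders rather than real API output, and the case-insensitive exact match on the town name is a design choice made for this example.

```python
from collections import defaultdict


def group_by_location(articles, queried):
    """Map each queried location to the titles of the articles tagged with it."""
    wanted = {name.lower(): name for name in queried}
    groups = defaultdict(list)
    for article in articles:
        for town in article.get("associated_town") or []:
            key = town.get("name", "").lower()
            if key in wanted:
                groups[wanted[key]].append(article["title"])
    return dict(groups)


# Placeholder articles, reduced to the two fields the helper needs.
articles = [
    {"title": "Article A", "associated_town": [{"name": "California"}]},
    {"title": "Article B", "associated_town": [{"name": "Texas"}, {"name": "California"}]},
    {"title": "Article C", "associated_town": [{"name": "Austin, Texas"}]},
]

print(group_by_location(articles, ["California", "Texas"]))
```

An article tagged with both locations lands in both buckets, while one tagged only with a more specific place such as 'Austin, Texas' is left out by the exact match.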
Clustering Articles
With local news, an event is often covered by multiple media outlets. NewsCatcher clusters together similar articles, and provides an option to retrieve article lists as clusters. Let's see how you can use this option:
{
"associated_towns": [{"name": "New York"}],
"from_": "2 days ago",
"clustering": true
}
In the above query, the clustering parameter is set to true. Let's see what this returns:
{
"status": "ok",
"total_hits": 10,
"page": 1,
"total_pages": 1,
"page_size": 100,
"clusters_count": 9,
"agg_clusters": [],
"clusters": {
"cluster_id_1": {
"articles": [...articles in this cluster...],
"agg_cluster": false,
"original_cluster_size": 2,
"cluster_size": 1
},
"cluster_id_2": {....}
...more clusters...
},
...more data...
From the result, you can see that there are 10 hits, grouped into 9 clusters. The articles are organized inside the clusters, and each article has data fields similar to those shown in earlier outputs.
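A common next step is to reduce each cluster to a single representative headline. The sketch below does this for the response shape shown above; `summarize_clusters` is a hypothetical helper, the sample response uses placeholder titles, and taking the first article of a cluster as its headline is an arbitrary choice for this example.

```python
def summarize_clusters(response):
    """Return one row per cluster, largest clusters first."""
    rows = []
    for cluster_id, cluster in response.get("clusters", {}).items():
        articles = cluster.get("articles") or []
        rows.append({
            "cluster_id": cluster_id,
            "cluster_size": cluster.get("cluster_size", len(articles)),
            # Arbitrary choice: use the first article as the representative
            "headline": articles[0]["title"] if articles else None,
        })
    rows.sort(key=lambda row: row["cluster_size"], reverse=True)
    return rows


# Placeholder response with the same structure as the output above.
sample_response = {
    "status": "ok",
    "clusters_count": 2,
    "clusters": {
        "cluster_id_1": {
            "articles": [{"title": "Story A"}],
            "cluster_size": 1,
        },
        "cluster_id_2": {
            "articles": [{"title": "Story B"}, {"title": "Story B (other outlet)"}],
            "cluster_size": 2,
        },
    },
}

for row in summarize_clusters(sample_response):
    print(row["cluster_size"], row["headline"])
```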
Filtering by Sources
With both the search and latest_headlines endpoints, you can filter the articles by sources. First, to see a list of available sources, use the sources endpoint:
r = requests.post(
    f'{NC_ENDPOINT}/api/sources',
    headers={'x-api-token': NC_API_KEY},
    json={'lang': 'en'},
)
print(json.dumps(r.json(), indent=2))
The above code returns a list of sources matching the filter lang: en in the following format:
{
"message": "Maximum sources displayed according to your plan is set to 2000",
"sources": [
"yahoo.com",
"wn.com",
"headtopics.com",
...more sources...
],
"user_input": "the filters passed as input"
}
In the output, there is a list of sources and a message about the maximum number of sources that can be retrieved using this endpoint. You can also see the input parameters that were sent in the request. In addition to the lang filter, you can use the countries and theme filters to get sources by country and article theme respectively. These sources can be used as filters in the queries sent to the search or latest_headlines endpoints:
{
"associated_towns": [{"name": "New York"}],
"sources": "yahoo.com",
"not_sources": "iheart.com",
"clustering": true
}
The above query specifies a source, 'yahoo.com', using the sources parameter. The parameter can also take an array, allowing you to specify multiple sources. You can also specify sources to be excluded using the not_sources parameter, which likewise accepts a list instead of a string.
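Since both parameters accept either a string or a list, a small helper can normalize the input and catch contradictory filters before a request is sent. This is only a sketch: `with_source_filters` is a hypothetical name, and always sending lists is a choice made here for consistency.

```python
def with_source_filters(payload, include=None, exclude=None):
    """Return a copy of payload with sources / not_sources set as lists."""
    def as_list(value):
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    include, exclude = as_list(include), as_list(exclude)
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"Sources both included and excluded: {sorted(overlap)}")

    body = dict(payload)  # leave the caller's dict untouched
    if include:
        body["sources"] = include
    if exclude:
        body["not_sources"] = exclude
    return body


base = {"associated_towns": [{"name": "New York"}], "clustering": True}
body = with_source_filters(base, include="yahoo.com", exclude=["iheart.com"])
print(body)
```

The resulting body can be passed as the json argument of the requests.post calls shown earlier.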
Conclusion
In this blog, we looked at the freshly launched NewsCatcher Local News API and how to use it. The API greatly reduces the effort needed to get localized news for building consumer apps or for ingesting into data analysis pipelines. You no longer need to build and maintain complex scrapers to gather local news data. NewsCatcher also extracts location names from the article text and verifies them using AI, sparing you from applying NLP or other text analysis techniques yourself.
We discussed some of the highlight features of this API in this blog. This only scratches the surface of what the API is capable of. Detailed specifications of the API, including comprehensive lists of the filters that can be used, are available in the documentation.