THE TECH DRIVING OUR API

Discover how our API processes data to deliver unmatched insights.

DISCOVER: HOW IT WORKS

Intelligent Scheduling Algorithm

Our process begins with a proprietary scheduling algorithm that tracks how often each source publishes over the course of a week. This data determines how often our crawlers check each source, so we can gather new article links promptly without overwhelming system resources. The result is a balance between timeliness and resource usage.
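As a simplified illustration of frequency-based scheduling, the Python sketch below derives a re-crawl interval from the number of articles a source published in the past week. The function name `crawl_interval`, the 5-minute and 24-hour bounds, and the rule of crawling twice per average publishing gap are all illustrative assumptions, not the proprietary algorithm itself.

```python
from datetime import datetime, timedelta

# Hypothetical bounds; the real values are not documented.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(hours=24)
WINDOW = timedelta(days=7)  # observation window: one week


def crawl_interval(publish_times: list[datetime], now: datetime) -> timedelta:
    """Return how often to re-crawl a source, based on how many
    articles it published during the last week (illustrative rule)."""
    recent = [t for t in publish_times if now - WINDOW <= t <= now]
    if not recent:
        # No activity observed this week: check as rarely as allowed.
        return MAX_INTERVAL
    # Average gap between articles over the observation window.
    avg_gap = WINDOW / len(recent)
    # Assumed rule: crawl twice per average gap so new links appear promptly.
    interval = avg_gap / 2
    # Clamp so very busy sources don't overload crawlers and quiet ones still get checked.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))
```

Under these assumptions, a source that published 168 articles last week (one per hour on average) would be checked every 30 minutes, while a source with no recent articles would be checked once a day.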

Data Acquisition

We fetch and store the raw webpage for each article link. Keeping this archive lets us re-run improved extraction methods on previously collected pages as new techniques become available, so data quality keeps improving over time.
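The sketch below illustrates why archiving raw pages pays off: once the HTML is stored, a newer extractor can be applied to every page collected so far without re-fetching anything. `RawPageArchive` is a hypothetical in-memory stand-in for durable storage, and the fetching step itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RawPageArchive:
    """Hypothetical in-memory stand-in for durable raw-HTML storage."""
    pages: dict[str, str] = field(default_factory=dict)

    def store(self, url: str, html: str) -> None:
        # Keep the untouched HTML; extracted fields are never written back here.
        self.pages[url] = html

    def reextract(self, extractor: Callable[[str], dict]) -> dict[str, dict]:
        """Apply a (possibly newer) extractor to every archived page."""
        return {url: extractor(html) for url, html in self.pages.items()}
```

For example, an archive populated last month can be re-processed today with an improved title extractor, and the original pages remain available for the next improvement.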

Extraction Techniques

We use five distinct extraction methods to retrieve article data: two enhanced adaptations of open-source technologies and three proprietary techniques developed in-house. This diverse toolkit lets us handle a wide range of article formats and data types.
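How the five methods are combined is not described here, so the following is only one plausible arrangement, assumed for illustration: extractors are tried in priority order, and each field is taken from the first method that returns a non-empty value. The function name `extract_with_fallback` and the field names are hypothetical; the fields mirror the article text, publish dates, and author details this page highlights as priorities.

```python
from typing import Any, Callable

Extractor = Callable[[str], dict]

# Hypothetical field names for the priority fields.
FIELDS = ("text", "published_date", "authors")


def extract_with_fallback(html: str, extractors: list[Extractor]) -> dict[str, Any]:
    """Fill each field from the first extractor (in priority order) that
    returns a non-empty value; an extractor that raises is skipped."""
    merged: dict[str, Any] = dict.fromkeys(FIELDS)  # every field starts as None
    for extract in extractors:
        try:
            result = extract(html)
        except Exception:
            continue  # one failing method should not sink the whole article
        for f in FIELDS:
            if merged[f] is None and result.get(f):
                merged[f] = result[f]
        if all(v is not None for v in merged.values()):
            break  # every field found; skip the remaining methods
    return merged
```

Trying cheaper or more reliable methods first and stopping once every field is filled is a common design for this kind of chain; a voting scheme across all methods would be an equally valid alternative.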

Data Integration & Deduplication

After extraction, data from different sources is merged into a unified article format. We then deduplicate articles using a combination of the URL and an internally generated ID derived from several data points, so that each article is unique and consistently formatted. Throughout extraction, we focus particularly on the accuracy of the full article text, publish dates, and author details.
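As a rough sketch of URL-plus-ID deduplication, the code below normalizes each URL and derives a hash-based ID from a few article fields, then keeps only the first article seen for either key. Which data points feed the real internal ID is not specified here; the choice of title, publish date, and domain, and the names `normalize_url`, `article_id`, and `deduplicate`, are assumptions.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and utm_* tracking
    parameters, and trim a trailing slash from the path."""
    parts = urlsplit(url.strip())
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if not k.startswith("utm_")])
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def article_id(title: str, published_date: str, domain: str) -> str:
    """Hypothetical internal ID: hash of normalized title, date, and domain."""
    norm_title = " ".join(title.lower().split())  # case- and whitespace-insensitive
    key = f"{norm_title}|{published_date}|{domain}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each normalized URL or internal ID."""
    seen_urls: set[str] = set()
    seen_ids: set[str] = set()
    unique = []
    for art in articles:
        url = normalize_url(art["url"])
        aid = article_id(art["title"], art["published_date"], urlsplit(url).netloc)
        if url in seen_urls or aid in seen_ids:
            continue  # duplicate by URL or by content-derived ID
        seen_urls.add(url)
        seen_ids.add(aid)
        unique.append({**art, "url": url, "id": aid})
    return unique
```

Checking both keys catches two kinds of duplicates: the same page reached through different URL variants, and the same story republished under a different URL on the same site.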

Data Cleaning

Next, articles go through a comprehensive data cleaning process. We use a detailed catalog of patterns to identify and remove irrelevant content, which significantly improves the quality of the resulting data.
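A minimal sketch of pattern-based cleaning follows. The three regular expressions in `NOISE_PATTERNS` are hypothetical examples of boilerplate lines (ads, newsletter prompts, "Read more" links); the actual pattern catalog is not published, and `clean_text` is an illustrative name.

```python
import re

# Hypothetical examples; the real pattern catalog is not published.
NOISE_PATTERNS = [
    re.compile(r"^\s*advertisement\s*$", re.IGNORECASE),
    re.compile(r"^\s*(subscribe|sign up) (to|for) our newsletter.*$", re.IGNORECASE),
    re.compile(r"^\s*(read more|related):.*$", re.IGNORECASE),
]


def clean_text(text: str) -> str:
    """Drop lines that match any noise pattern, then collapse the blank
    lines left behind into single paragraph breaks."""
    kept = [line for line in text.splitlines()
            if not any(p.match(line) for p in NOISE_PATTERNS)]
    cleaned = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()
```

Matching whole lines rather than substrings keeps the patterns from cutting into genuine sentences that happen to mention, say, a newsletter.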

NLP Pipeline

Cleaned articles are processed through an advanced Natural Language Processing (NLP) pipeline. This stage includes summarizing the content, classifying articles into broad news topics, detecting named entities, and assessing sentiment. This enriches the articles, making them more actionable and insightful for users.
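To show the shape of such a pipeline, the sketch below passes an article through a sequence of enrichment stages, each adding one field. The stages are deliberately toy stand-ins (a first-sentence summary, keyword-based topic labels, and a tiny sentiment lexicon) rather than production-grade NLP models, and named entity detection is omitted for brevity. All function names and word lists are assumptions.

```python
import re
from typing import Callable

Stage = Callable[[dict], dict]

# Toy lexicons standing in for real models (hypothetical).
TOPIC_KEYWORDS = {"business": {"market", "shares", "earnings"},
                  "sport": {"match", "league", "goal"}}
POSITIVE = {"gain", "gains", "rose", "growth"}
NEGATIVE = {"loss", "fell", "decline"}


def summarize(article: dict) -> dict:
    # Lead baseline: use the first sentence as the summary.
    first = re.split(r"(?<=[.!?])\s+", article["text"].strip())[0]
    return {**article, "summary": first}


def classify_topic(article: dict) -> dict:
    # Pick the topic whose keywords overlap most with the article's words.
    words = set(re.findall(r"[a-z]+", article["text"].lower()))
    scores = {t: len(words & kw) for t, kw in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return {**article, "topic": best if scores[best] else "other"}


def score_sentiment(article: dict) -> dict:
    # Positive word count minus negative word count.
    words = re.findall(r"[a-z]+", article["text"].lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {**article, "sentiment": score}


def run_pipeline(article: dict, stages: list[Stage]) -> dict:
    """Pass the article through each enrichment stage in order."""
    for stage in stages:
        article = stage(article)
    return article
```

Because each stage takes and returns an article, stages can be swapped for stronger models or reordered without touching the rest of the pipeline.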

Indexing & Distribution

Processed articles are indexed in our main production Elasticsearch (ES) clusters for querying. We also distribute specific datasets to dedicated client clusters and to shared cloud storage to ensure high availability and performance.
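The sketch below illustrates the indexing and distribution step under stated assumptions: every article is sent to the production index, hypothetical per-client rules decide which client clusters also receive it, and documents are serialized in the newline-delimited format expected by the Elasticsearch `_bulk` API. The cluster names, the rules in `CLIENT_RULES`, and the function names are invented for illustration, and sending the request is left out.

```python
import json
from typing import Callable

# Hypothetical routing rules: client cluster name -> predicate on the article.
CLIENT_RULES: dict[str, Callable[[dict], bool]] = {
    "client-a": lambda a: a.get("topic") == "business",
    "client-b": lambda a: a.get("country") == "DE",
}


def targets_for(article: dict) -> list[str]:
    """Every article goes to production; some also go to client clusters."""
    return ["production"] + [name for name, wants in CLIENT_RULES.items() if wants(article)]


def bulk_body(articles: list[dict], index: str) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON): one action line
    followed by one document line per article, newline-terminated."""
    lines = []
    for art in articles:
        lines.append(json.dumps({"index": {"_index": index, "_id": art["id"]}}))
        lines.append(json.dumps(art))
    return "\n".join(lines) + "\n"
```

Using the article's internal ID as the document `_id` means re-indexing the same article overwrites it instead of creating a second copy.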

Query Processing

Our system dynamically filters and groups articles based on each user query, using clustering algorithms to group similar articles so that relevant results are returned quickly.
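One simple way to filter and group results is sketched below: keep articles whose titles contain every query term, then cluster them greedily by Jaccard similarity of title words. This technique, the 0.5 threshold, and the function names are assumptions chosen for illustration; the algorithms behind the API may differ.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words."""
    return len(a & b) / len(a | b) if a | b else 0.0


def search_and_cluster(articles: list[dict], query: str,
                       threshold: float = 0.5) -> list[list[dict]]:
    """Keep articles whose title contains every query term, then group them:
    each article joins the first cluster whose seed title is similar enough."""
    terms = tokens(query)
    hits = [a for a in articles if terms <= tokens(a["title"])]
    clusters: list[list[dict]] = []
    for art in hits:
        t = tokens(art["title"])
        for cluster in clusters:
            if jaccard(t, tokens(cluster[0]["title"])) >= threshold:
                cluster.append(art)
                break
        else:
            clusters.append([art])  # no similar cluster: start a new one
    return clusters
```

With these settings, "Tesla shares rise after earnings" and "Tesla shares rise on strong earnings" share 4 of 7 distinct words (a similarity of about 0.57) and land in one cluster, while "Tesla recalls vehicles" starts its own.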

Custom Solutions

We continuously develop custom solutions tailored to our clients' specific needs. This bespoke service is part of our commitment to delivering exceptional value and adapting to the challenges our users face. Here are some of the solutions we have already built.

CUSTOM SOLUTIONS FOR EVERY NEED

NewsCatcher goes beyond its standard offerings to provide customized solutions for diverse enterprise requirements.


Entity Disambiguation

Cut through the clutter with precision: make sure every article you receive refers to the exact company or individual you’re tracking.

Insights Engine

Unearth hidden gems and nurture their growth: our market intelligence shines a spotlight on emerging opportunities awaiting your touch.

Events Intelligence

Leverage our global event data stream to stay ahead in the market and turn insights into actionable business strategies.

Localised News

Keep your finger on the pulse of any town or region: our localized news coverage brings you the latest happenings right where they matter.

Trusted by the Top Leaders

We use NewsCatcher to capture the real-time impact of news stories on corporate credit spreads. We love the rapidly evolving reach and expanding API functionality.

Rajiv Bhat

We compute the sum of scores of all retrieved articles for each API option. As a result, NewsCatcher and Google News achieve the highest scores of 35 and 39, respectively. The other three APIs, Newsdata.io, Aylien, and NewsAPI.org, score 16.5, 30.5, and 23.5.

University of California, Berkeley

It’s almost like we were a farm-to-table restaurant growing our own vegetables. Then the NewsCatcher guys came in and said, ‘You don’t have to worry about that. Just focus on the kitchen.’

Mishaal

Jumping on a call with NewsCatcher proved to be incredibly valuable. The personalized service and tailored solutions made a significant difference, proving that the initial effort to engage with them was well worth it.

Carlos Toruno

After analyzing ratios such as false positives, we found that NewsCatcher had, by far, the best results in terms of availability, quality, and regional focus, making it the clear winner based on our defined KPIs.

Michael

We found NewsCatcher to be a really nice global solution serving our purpose. The integration of NewsCatcher was seamless. It took us less than four days for both the integration and the testing part of it.

Vedant Lohia