PRODUCT UPDATES

Welcome to the NewsCatcher Product Updates page. Here, you can find the latest news about recent changes.

October 25, 2024

New Documentation Launched: Available at newscatcherapi.com/docs/.

  • Interactive API Reference: Allows direct API testing and exploration without needing code or external tools like Postman.
  • “Talk with the Docs” Feature: Type questions into the search bar for answers and links to relevant documentation.

October 4, 2024

Users can now search for events and related news using a new theme: Financial Crime.

September 27, 2024

Domains are now classified as “news” or “non-news”, and news sources are further broken down into specific types.

Users can now filter by:

  • is_news_domain: True/False
  • news_domain_type: Aggregator, Original Content, Republisher, Press Releases, Others
  • news_type: Tech News and Updates, Sports News and Blogs, News and Blogs, E-commerce and Product Information, Educational News, Press Releases, Corporate News, Gaming News and Blogs, Entertainment and Media News, Health and Medical News, Government and Municipal News, Real Estate News, Automotive News and Blogs, News Aggregators, Fashion and Lifestyle, Local News and Community Events, Music and Radio, Reviews, Blogs and Magazines, Political News, Non-Profit and Organization News, Event News, General News Outlets, Gambling News, Travel and Lifestyle, Finance and Investment, Specific News Type, Pure News Outlet, Corporate News Section, Other.
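
As a minimal sketch of how these filters might be combined in a request, the Python helper below builds a parameter dictionary. The three parameter names and the news_domain_type values come from the list above. The helper name build_domain_filters and the comma-joined encoding of multiple values are assumptions made for illustration, so check the API reference for the exact request format.

```python
# Allowed values for news_domain_type, as listed in this update.
NEWS_DOMAIN_TYPES = {
    "Aggregator",
    "Original Content",
    "Republisher",
    "Press Releases",
    "Others",
}


def build_domain_filters(is_news_domain=None, news_domain_type=None, news_type=None):
    """Build a dict of domain-classification filters (hypothetical helper).

    Unset filters are omitted. Multiple values are comma-joined, which is an
    assumption about the encoding, not documented behaviour.
    """
    params = {}
    if is_news_domain is not None:
        params["is_news_domain"] = bool(is_news_domain)
    if news_domain_type:
        unknown = set(news_domain_type) - NEWS_DOMAIN_TYPES
        if unknown:
            raise ValueError(f"Unknown news_domain_type values: {sorted(unknown)}")
        params["news_domain_type"] = ",".join(news_domain_type)
    if news_type:
        params["news_type"] = ",".join(news_type)
    return params


filters = build_domain_filters(
    is_news_domain=True,
    news_domain_type=["Original Content"],
    news_type=["Political News", "Finance and Investment"],
)
print(filters)
```

Validating news_domain_type locally catches a misspelled value before any request is sent.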

August 16, 2024

Analyzed all domains using large language models to categorize them as “news” domain or “non-news” domain and further classified them by content type.

Human verification is still required to ensure accuracy.

August 02, 2024

Expanded Sources List: 

  • We’ve broadened our source coverage by increasing the number of sources from 80,000 to 91,000. 
  • This expansion was driven by feedback from our clients, ensuring a more comprehensive and diverse news feed.

Enhanced Proxy Logic: 

  • We’ve optimized our proxy mechanisms, cutting by 80% the number of instances where our data extractors are blocked. 
  • This improvement ensures more consistent and reliable data extraction across various sources.

July 19, 2024

Preventing Historic Data Breakdowns:

  • To safeguard against data loss and service disruptions, we’ve implemented regular snapshots of our data, stored on AWS Glacier. This allows for quick recovery during downtimes.
  • Additionally, each year of historical data is now duplicated across two servers, ensuring data remains accessible and secure even in the event of a server failure.

Improved System Performance: 

  • We’ve added four new servers to our v3 historic clusters, enhancing data management and overall system performance.

June 14, 2024

New Clustering Algorithm on v3 API: 

  • We’ve introduced a more efficient clustering method for our v3 API and benchmarked it against our existing approach. 
  • The new method is approximately 1.75x faster, offering significant performance improvements without requiring any changes to your existing code or API calls.

June 07, 2024

is_opinion Flag Now Available:

  • The is_opinion attribute, already present in v3, is now available as a filter parameter in the API. This allows for more precise filtering of opinion articles in your data queries.
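
For illustration, the sketch below shows the client-side equivalent of this filter: separating articles by their is_opinion attribute. Only the is_opinion field name comes from this update. The sample records and the helper name split_by_opinion are invented for the example. With the new filter parameter, the same selection can be made in the API request instead.

```python
def split_by_opinion(articles):
    """Split article dicts into (opinion, non_opinion) lists by is_opinion.

    Articles without the attribute are treated as non-opinion here, which is
    an assumption made for this sketch.
    """
    opinion, non_opinion = [], []
    for article in articles:
        if article.get("is_opinion", False):
            opinion.append(article)
        else:
            non_opinion.append(article)
    return opinion, non_opinion


# Invented sample records for demonstration only.
sample = [
    {"title": "Why rates should fall", "is_opinion": True},
    {"title": "Central bank holds rates", "is_opinion": False},
    {"title": "Markets close higher"},
]
opinion, non_opinion = split_by_opinion(sample)
print([a["title"] for a in opinion])
print([a["title"] for a in non_opinion])
```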

Improved Source Country Identification:

  • We’ve enhanced our logic for determining the country of origin for news sources. 
  • This update has reduced the number of sources marked as ‘unknown’ by over 5,500, improving the accuracy of geographical data.

May 31, 2024

Translated Articles on v3 API:

  • The v3 API now includes English translations for non-English articles. 
  • We’ve achieved a 90% translation rate for non-English content, providing broader access to global news in English.

May 24, 2024

New English Sentiment Model:

  • We’ve fine-tuned our sentiment analysis model using a synthetic dataset of over a million articles labeled with ChatGPT. 
  • The new model operates 10x faster and delivers improved accuracy, with F1 scores of 0.89 for non-finance articles and 0.87 for finance-related content.

May 17, 2024

Improved Language Detection:

  • We’ve fixed a bug that caused incorrect language identification due to certain text transformations. This fix enhances the accuracy of our language detection across articles.

Article Deduplication:

  • We’ve implemented a deduplication feature to identify and filter out republished or syndicated articles, ensuring that your data stream focuses on original content.
  • Comprehensive documentation is available for this feature.
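
One common way to spot republished copies is to compare a fingerprint of the normalized article text. The sketch below uses that approach purely as an illustration. It is not a description of NewsCatcher's deduplication logic, and the field names content and published_date, along with the helper names, are assumed.

```python
import hashlib
import re


def fingerprint(text):
    """Hash text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles):
    """Keep the earliest-published article for each content fingerprint."""
    seen = {}
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for article in sorted(articles, key=lambda a: a["published_date"]):
        seen.setdefault(fingerprint(article["content"]), article)
    return list(seen.values())


# Invented sample records for demonstration only.
articles = [
    {"title": "Wire copy", "content": "Storm hits the  coast.", "published_date": "2024-05-02T09:00:00"},
    {"title": "Original", "content": "Storm hits the coast.", "published_date": "2024-05-01T08:00:00"},
    {"title": "Other story", "content": "Election results are in.", "published_date": "2024-05-03T10:00:00"},
]
unique = deduplicate(articles)
print([a["title"] for a in unique])
```

An exact fingerprint only catches near-verbatim copies. Lightly edited republications would need a similarity measure rather than a hash comparison.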

May 10, 2024

New English Sentiment Model:

  • We’ve fine-tuned our sentiment analysis model using a synthetic dataset of over a million articles labeled with ChatGPT. 
  • The new model operates 10x faster and delivers improved accuracy, with F1 scores of 0.89 for non-finance articles and 0.87 for finance-related content.

May 03, 2024

Article Update Monitoring:

  • We’ve introduced a feature that checks whether an article has been updated after its initial publication. 
  • If changes are detected, we ensure the extracted version reflects the latest content, keeping your data current.
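
A simple way to picture update detection is to compare a hash of the stored text with a hash of the freshly fetched text. The sketch below assumes that approach for illustration only. It is not NewsCatcher's implementation, and the names content_hash and refresh, as well as the record fields, are hypothetical.

```python
import hashlib


def content_hash(text):
    """Return a stable SHA-256 digest of the article text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(stored, fetched_text):
    """Return (record, updated): replace stored content if the text changed."""
    new_hash = content_hash(fetched_text)
    if new_hash == stored["content_hash"]:
        return stored, False
    # Copy the record so the caller's original dict is left untouched.
    updated = dict(stored, content=fetched_text, content_hash=new_hash)
    return updated, True


# Invented sample record for demonstration only.
stored = {
    "url": "https://example.com/a",
    "content": "First version.",
    "content_hash": content_hash("First version."),
}
record, updated = refresh(stored, "First version, now corrected.")
print(updated, record["content"])
```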

April 19, 2024

Enhanced Parent URL Logic:

  • We’ve refined the logic for the parent_url attribute, which previously defaulted to the homepage of the news source where the article was first found. 
  • The new logic now prioritizes section-specific URLs over homepage links, improving the contextual relevance of the parent URL data.
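
To illustrate the idea of preferring section pages, the sketch below picks a parent URL from a list of candidate pages where an article was found. The candidate list, the function name pick_parent_url, and the rule that a non-root path means a section page are assumptions for this example, not the actual selection logic.

```python
from urllib.parse import urlparse


def is_homepage(url):
    """Treat a URL with an empty or root path as a homepage."""
    return urlparse(url).path in ("", "/")


def pick_parent_url(candidates):
    """Return the first section-specific URL, falling back to a homepage."""
    if not candidates:
        return None
    for url in candidates:
        if not is_homepage(url):
            return url
    return candidates[0]


# Invented candidate URLs for demonstration only.
candidates = ["https://example.com/", "https://example.com/business/"]
print(pick_parent_url(candidates))
```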

April 12, 2024

Text Formatting Preservation:

  • We’ve improved our text extraction process to preserve formatting, ensuring that more than 90% of articles maintain clear paragraph splits. 
  • This enhancement provides cleaner and more readable data.

V3 API SDKs Launched:

  • We’ve launched SDKs for the v3 API in multiple programming languages, including Python, C#, Java, Go, and TypeScript, making it easier to integrate our API into various development environments.

March 29, 2024

Additional Historical Data: 

  • Our v3 API now includes NLP-enriched articles dating back to the beginning of July 2023. 

Improved Latency:

  • We’ve deployed a dedicated processing pipeline for priority sources, ensuring that these articles are indexed in under 5 minutes, down from the usual 15-60 minute delay.

March 25, 2024

Author Extraction Enhancement:

  • We’ve improved our extraction methods to better identify author names within the article content, including in-text endings like “…written by John Smith.” 
  • This ensures more accurate attribution of articles to their authors.
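
As a rough illustration of catching in-text endings, the sketch below uses a regular expression to find a trailing "written by" credit. The pattern and the function name extract_trailing_author are assumptions for this example. Real extraction has to handle far more variation than this pattern covers.

```python
import re

# Matches e.g. "written by John Smith." at the very end of the text.
# Names are assumed to be capitalized words; hyphens and apostrophes are allowed.
_TRAILING_BYLINE = re.compile(
    r"(?:[Ww]ritten|[Rr]eported)\s+by\s+"
    r"([A-Z][\w'\-]+(?:\s+[A-Z][\w'\-]+)*)"
    r"\s*[.!]?\s*$"
)


def extract_trailing_author(text):
    """Return the author named in a trailing byline, or None if absent."""
    match = _TRAILING_BYLINE.search(text)
    return match.group(1) if match else None


print(extract_trailing_author("The vote is due on Friday. This piece was written by John Smith."))
```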