We think we've built the best news parser on the market. Here's why.
We've been building news parsers for over a year. We've tried many open-source solutions, and we know they are not as good as they should be. The biggest problems are:
- published time parsing
- author name parsing
- clear text separation
We continually improve the internal parsers that power the backend of our own News API. The result is a parser that consistently returns clean results wherever the source page makes that possible.
The code is written in Python and JS. Our API can serve hundreds of concurrent requests.
For each URL of a news article published online, we can extract:
- non-opinion [English articles only]
- Twitter account of publisher
- full text of the article
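
As a rough illustration only, a client might model these fields as shown below. The field names (`non_opinion`, `publisher_twitter`, `full_text`), the `ParsedArticle` class, and the JSON shape are assumptions made for this example, not the actual response format of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedArticle:
    # Hypothetical field names that mirror the list above.
    non_opinion: Optional[bool]        # assumed to be set for English articles only
    publisher_twitter: Optional[str]   # Twitter account of the publisher, if found
    full_text: str                     # full text of the article


def parse_response(body: str) -> ParsedArticle:
    """Build a ParsedArticle from a JSON string, tolerating missing fields."""
    data = json.loads(body)
    return ParsedArticle(
        non_opinion=data.get("non_opinion"),
        publisher_twitter=data.get("publisher_twitter"),
        full_text=data.get("full_text", ""),
    )


# A made-up response body, used only to exercise the sketch.
sample = json.dumps({
    "non_opinion": True,
    "publisher_twitter": "@example_news",
    "full_text": "Example article body.",
})
article = parse_response(sample)
```

Keeping `non_opinion` and `publisher_twitter` optional reflects that not every article will yield every field, for example when the article is not in English.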
Go ahead and try it yourself.