Find the dataset on:
We're the NewsCatcher team: we collect and index news articles, and we provide a News API for finding relevant news data.
We contribute a lot to the open-source community by sharing our work (find other links at the bottom of the description).
Dataset
We collected over 100k articles across 8 news topics:
Topic | Articles
BUSINESS | 15000
ENTERTAINMENT | 15000
HEALTH | 15000
NATION | 15000
SCIENCE | 3774
SPORTS | 15000
TECHNOLOGY | 15000
WORLD | 15000
The articles were published during the first half of August 2020.
Every topic has 15,000 articles except SCIENCE, which has 3,774. The articles come from thousands of different news websites.
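As a quick sanity check after downloading, the per-topic counts listed above can be compared against the file itself. The snippet below is a minimal sketch, not an official loader: the column name `topic`, the `;` delimiter, and the `count_topics` helper are assumptions made for illustration, so adjust them to match the actual file. The sample rows are made up and only stand in for the real data.

```python
import csv
import io
from collections import Counter

# Per-topic article counts as listed in the description above.
EXPECTED_COUNTS = {
    "BUSINESS": 15000,
    "ENTERTAINMENT": 15000,
    "HEALTH": 15000,
    "NATION": 15000,
    "SCIENCE": 3774,
    "SPORTS": 15000,
    "TECHNOLOGY": 15000,
    "WORLD": 15000,
}

# Sum of the listed counts (the "over 100k" figure).
TOTAL_ARTICLES = sum(EXPECTED_COUNTS.values())


def count_topics(csv_file, topic_column="topic", delimiter=";"):
    """Count rows per topic in an open CSV file.

    `topic_column` and `delimiter` are assumptions, not confirmed
    properties of the dataset; change them to match the real file.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[topic_column] for row in reader)


# Tiny in-memory stand-in for the real file (made-up rows, illustration only).
sample = io.StringIO(
    "topic;title\n"
    "SCIENCE;Example headline A\n"
    "WORLD;Example headline B\n"
    "SCIENCE;Example headline C\n"
)
sample_counts = count_topics(sample)

print(TOTAL_ARTICLES)
print(dict(sample_counts))
```

With the real file, pass an opened file handle instead of `sample` and compare the result to `EXPECTED_COUNTS`. Because SCIENCE has roughly a quarter as many rows as the other topics, it is worth accounting for that imbalance when training a classifier on this data.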
Other Useful Links
newscatcher Python package - Programmatically collect normalized news from (almost) any website.
pygooglenews - If Google News had a Python library
Support Us
The best thing you can do for us is to let people know about our News API.
Need a bigger dataset?
Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com
100k+ Rows Topic Labeled News Dataset - NewsCatcher