100k+ Rows Topic Labeled News Dataset - NewsCatcher

All `topics` have 15k articles except SCIENCE, which has 3774. The articles were published by thousands of different news websites.

Find the dataset on:

We're the NewsCatcher team: we collect and index news articles, and we provide a News API for finding relevant news data.

We contribute a lot to the open-source community by sharing our work (see the other links at the bottom of this post).

Dataset

We collected over 100k articles (108774 in total) across 8 news topics:

TOPIC          | ARTICLES
BUSINESS       |    15000
ENTERTAINMENT  |    15000
HEALTH         |    15000
NATION         |    15000
SCIENCE        |     3774
SPORTS         |    15000
TECHNOLOGY     |    15000
WORLD          |    15000
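Once the file is downloaded, a quick sanity check is to recount the labels and compare them with the table. The sketch below is not NewsCatcher code; it assumes the dataset is a delimited text file with a header row containing a `topic` column, and the `;` delimiter default is a guess you may need to change. It also shows how small the SCIENCE class is relative to the whole set.

```python
import csv
import io
from collections import Counter

# Per-topic counts as listed in the table above.
EXPECTED = {
    "BUSINESS": 15000,
    "ENTERTAINMENT": 15000,
    "HEALTH": 15000,
    "NATION": 15000,
    "SCIENCE": 3774,
    "SPORTS": 15000,
    "TECHNOLOGY": 15000,
    "WORLD": 15000,
}


def topic_counts(csv_file, delimiter=";", topic_column="topic"):
    """Count rows per topic label.

    ASSUMPTION: header row with a `topic` column; the delimiter is a guess.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[topic_column] for row in reader)


total = sum(EXPECTED.values())
science_share = EXPECTED["SCIENCE"] / total
print(f"total rows: {total}")                 # 108774
print(f"SCIENCE share: {science_share:.1%}")  # 3.5%

# Demo on a tiny in-memory sample instead of the real (hypothetical) file.
sample = io.StringIO("topic;title\nSCIENCE;a\nWORLD;b\nSCIENCE;c\n")
print(topic_counts(sample))
```

To check the real file, open it with `open(path, newline="", encoding="utf-8")` and compare `topic_counts(f)` against `Counter(EXPECTED)`; `path` is whatever name the downloaded file has.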

The articles were published during the first half of August 2020.
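If you want to confirm that every row falls inside that collection window, a check like the following can be run over the publication timestamps. This is a minimal sketch under two assumptions: "first half of August 2020" is read as 2020-08-01 up to (but not including) 2020-08-16, and timestamps look like `2020-08-08 14:30:00`. Neither the window boundaries nor the format comes from the dataset documentation.

```python
from datetime import datetime

# ASSUMPTION: "first half of August 2020" = [2020-08-01, 2020-08-16).
WINDOW_START = datetime(2020, 8, 1)
WINDOW_END = datetime(2020, 8, 16)


def in_window(published_date: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> bool:
    """Return True if a timestamp string falls inside the collection window.

    ASSUMPTION: timestamps use the "YYYY-MM-DD HH:MM:SS" layout; pass a
    different `fmt` if the actual column is formatted otherwise.
    """
    ts = datetime.strptime(published_date, fmt)
    return WINDOW_START <= ts < WINDOW_END


print(in_window("2020-08-08 14:30:00"))  # True
print(in_window("2020-08-20 09:00:00"))  # False
```

Using a half-open interval avoids any ambiguity about articles published late on August 15.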

Other Useful Links

newscatcher Python package - programmatically collect normalized news from (almost) any website.

pygooglenews - If Google News had a Python library

Support Us

The best way to support us is to tell people about our News API.

Need a bigger dataset?

Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com.

100k+ Rows Topic Labeled News Dataset - NewsCatcher