100k+ Rows Topic Labeled News Dataset

Find the dataset on:

We're the NewsCatcher team: we collect and index news articles, and we provide a News API for finding relevant news data.

We contribute to the open-source community by sharing our work (see the other links at the bottom of this description).

Dataset

We collected over 100k articles across 8 news topics:

| Topic | Articles |
| --- | --- |
| BUSINESS | 15000 |
| ENTERTAINMENT | 15000 |
| HEALTH | 15000 |
| NATION | 15000 |
| SCIENCE | 3774 |
| SPORTS | 15000 |
| TECHNOLOGY | 15000 |
| WORLD | 15000 |

The articles were published during the first half of August 2020.

Every topic has 15,000 articles except SCIENCE, which has 3,774, for a total of 108,774 rows. The articles come from thousands of different news websites.
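Since SCIENCE is much smaller than the other classes, it can be worth checking the label distribution before training anything. Below is a minimal sketch using only the Python standard library. It assumes the data is a delimited text file with a header row and a label column named `topic`; the column name, the delimiter, and the `count_topics` helper are illustrative assumptions, so adjust them to match the actual file. The sample rows are made-up placeholders, not real records from the dataset.

```python
import csv
import io
from collections import Counter


def count_topics(csv_file, topic_column="topic", delimiter=","):
    """Return a Counter mapping each topic label to its number of rows.

    `topic_column` and `delimiter` are assumptions about the file layout;
    change them to match the real dataset file.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[topic_column] for row in reader)


# Tiny in-memory stand-in for the real file (placeholder rows only).
sample = io.StringIO(
    "topic,title\n"
    "SCIENCE,Example headline A\n"
    "BUSINESS,Example headline B\n"
    "BUSINESS,Example headline C\n"
)

counts = count_topics(sample)
for topic, n in sorted(counts.items()):
    print(topic, n)
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the dataset.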

Other Useful Links

newscatcher Python package - Programmatically collect normalized news from (almost) any website.

pygooglenews - If Google News had a Python library

Support Us

The best thing you can do for us is to let people know about our News API.

Need a bigger dataset?

Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com