How to retrieve more than 10,000 articles
Learn how to use time-chunking methods in the Python SDK to retrieve large volumes of articles
The Newscatcher API limits results to 10,000 articles per search query. The Python SDK provides special methods that automatically split your search across multiple time periods, working around this limit so you can retrieve all articles that match your query.
These advanced retrieval methods are available only in the Python SDK.
Understanding the article limit
When your query matches more than 10,000 articles, the API returns "total_hits": 10000 as a hard limit, and standard pagination cannot retrieve anything beyond it.
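A quick way to tell whether a single query has hit the cap is to compare total_hits with 10,000. The sketch below assumes the search response is available as a plain dict with a "total_hits" key; is_capped is a hypothetical helper, not part of the SDK.

```python
# Assumption: the search response is a dict containing "total_hits";
# all other response fields are ignored here.
RESULT_CAP = 10_000  # per-query limit described above


def is_capped(response: dict) -> bool:
    """Return True when total_hits sits at the cap, meaning the real match
    count may be higher and pagination alone cannot reach every article."""
    return response.get("total_hits", 0) >= RESULT_CAP


print(is_capped({"total_hits": 10000}))  # True: consider time-chunking
print(is_capped({"total_hits": 4200}))   # False: standard pagination is enough
```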
Using time-chunking methods
The SDK provides two special methods to retrieve large volumes of articles:
get_all_articles
get_all_headlines
Both methods are available for synchronous and asynchronous clients.
How time-chunking works
Time-chunking divides your date range into smaller intervals, runs a separate search for each interval, and combines the results. Each interval can return up to 10,000 articles.
For example, with time_chunk_size="1d" over a 5-day range, the method runs 5 separate searches, one per day, and paginates through each one automatically, so it can retrieve up to 50,000 articles (5 × 10,000).
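The chunking arithmetic above can be modeled in plain Python. This is a simplified sketch, not the SDK's implementation: parse_chunk_size, time_chunks, fetch_all, and fake_search are hypothetical names, only hour and day units are handled, and fake_search stands in for a real API call by pretending every day matches 12,000 articles.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for the per-query limit: one search returns at most this many articles.
PER_QUERY_CAP = 10_000


def parse_chunk_size(size: str) -> timedelta:
    """Convert strings such as "1h", "6h", "1d", "7d" into a timedelta.

    This sketch only understands hour ("h") and day ("d") units.
    """
    amount, unit = int(size[:-1]), size[-1]
    if unit == "h":
        return timedelta(hours=amount)
    if unit == "d":
        return timedelta(days=amount)
    raise ValueError(f"unsupported chunk size: {size!r}")


def time_chunks(start: datetime, end: datetime, size: str):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    step = parse_chunk_size(size)
    current = start
    while current < end:
        chunk_end = min(current + step, end)  # the last chunk may be shorter
        yield current, chunk_end
        current = chunk_end


def fetch_all(search, start: datetime, end: datetime, size: str) -> list:
    """Run one capped search per chunk and concatenate the results."""
    results = []
    for chunk_start, chunk_end in time_chunks(start, end, size):
        # Each chunk is an independent query with its own cap;
        # the slice simulates the cap the API would apply.
        results.extend(search(chunk_start, chunk_end)[:PER_QUERY_CAP])
    return results


def fake_search(chunk_start: datetime, chunk_end: datetime) -> list:
    """Hypothetical search that matches 12,000 articles in every chunk."""
    return [{"published_date": chunk_start.isoformat(), "n": i} for i in range(12_000)]


end = datetime(2024, 1, 6, tzinfo=timezone.utc)
start = end - timedelta(days=5)
chunks = list(time_chunks(start, end, "1d"))
articles = fetch_all(fake_search, start, end, "1d")
print(len(chunks))    # 5
print(len(articles))  # 50000
```

A single query over the same five days would stop at 10,000 articles, while daily chunks let each day contribute its own 10,000. Matches beyond the cap inside one chunk are still lost, which is why the chunk size matters.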
Choosing the right chunk size
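The optimal chunk size depends on how many articles your query matches per unit of time. Choose a chunk small enough that no single interval exceeds 10,000 articles, but no smaller than needed, since every extra chunk adds API calls: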
| Query type | Article volume | Recommended chunk size |
|---|---|---|
| Extremely broad | 10,000+ per hour | "1h" |
| Very broad | 10,000+ per day | "6h" |
| Broad | 3,000-10,000 per day | "1d" |
| Moderate | 1,000-3,000 per day | "3d" |
| Specific | 100-1,000 per day | "7d" |
| Very specific | Fewer than 100 per day | "30d" |
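If you can estimate your query's daily volume, the table translates into a simple lookup. recommend_chunk_size is a hypothetical helper, not an SDK function. The table leaves two points open, so the sketch assumes the hourly row is compared against the average hourly rate (articles_per_day / 24) and that a value on a shared boundary, such as exactly 10,000 per day, gets the smaller chunk.

```python
def recommend_chunk_size(articles_per_day: float) -> str:
    """Map an estimated daily article volume to a chunk size from the table.

    Assumptions: the "per hour" row uses the average hourly rate, and
    boundary values (10,000, 3,000, 1,000, 100) go to the smaller chunk.
    """
    if articles_per_day / 24 >= 10_000:  # extremely broad: 10,000+ per hour
        return "1h"
    if articles_per_day >= 10_000:       # very broad
        return "6h"
    if articles_per_day >= 3_000:        # broad
        return "1d"
    if articles_per_day >= 1_000:        # moderate
        return "3d"
    if articles_per_day >= 100:          # specific
        return "7d"
    return "30d"                         # very specific


print(recommend_chunk_size(50))      # 30d
print(recommend_chunk_size(5_000))   # 1d
print(recommend_chunk_size(20_000))  # 6h
```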
Method parameters
Your search query. Supports AND, OR, NOT operators and advanced syntax.
Starting date for get_all_articles (e.g., "10d" or "2023-03-15").
Ending date for get_all_articles; defaults to the current time.
Time range for get_all_headlines (e.g., "1d" or "2023-03-15").
Chunk size: "1h", "6h", "1d", "7d", or "1m".
Maximum number of articles to retrieve.
Whether to display a progress bar.
Whether to remove duplicate articles.
For async methods only: number of concurrent requests.
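For the async methods, the concurrency setting bounds how many chunk requests run at once. The asyncio sketch below shows one common way to enforce such a bound with a semaphore, together with deduplication and a maximum-article cutoff. It illustrates the idea under assumptions and is not the SDK's code: gather_chunks and fake_fetch are hypothetical, chunks are plain integers instead of time windows, duplicates are assumed to share a "link" field, and the cutoff is applied only after all chunks are fetched.

```python
import asyncio


async def gather_chunks(fetch_chunk, chunks, concurrency=3, max_articles=None, deduplicate=True):
    """Fetch every chunk with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run(chunk):
        async with semaphore:  # waits while `concurrency` requests are running
            return await fetch_chunk(chunk)

    # gather returns results in input order, so chunk order is preserved.
    batches = await asyncio.gather(*(run(chunk) for chunk in chunks))

    seen = set()
    articles = []
    for batch in batches:
        for article in batch:
            if deduplicate:
                key = article["link"]  # assumption: the link identifies an article
                if key in seen:
                    continue
                seen.add(key)
            articles.append(article)
    return articles if max_articles is None else articles[:max_articles]


in_flight = 0
peak = 0


async def fake_fetch(chunk: int) -> list:
    """Hypothetical request; neighboring chunks share one article."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return [{"link": f"https://example.com/{chunk}"},
            {"link": f"https://example.com/{chunk + 1}"}]


articles = asyncio.run(gather_chunks(fake_fetch, range(10), concurrency=3))
print(len(articles))  # 11 unique articles out of 20 results
print(peak)           # 3
```

A semaphore keeps the task list simple: every chunk is scheduled up front, and the bound is enforced only around the request itself.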