The Newscatcher API limits results to 10,000 articles per search query. The Python SDK provides special methods that automatically split your search across multiple time periods to bypass the limit and retrieve all articles relevant to your query.

These advanced retrieval methods are available only in the Python SDK.

Understanding the article limit

When your query matches more than 10,000 articles, the API returns "total_hits": 10000 as a hard limit, and you cannot retrieve more through standard pagination.

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

response = client.search.post(
    q="technology",
    from_="7d",
    to="now"
)

print(f"Total hits: {response.total_hits}")
print(f"Is result capped: {response.total_hits == 10000}")  # True if limit reached

Using time-chunking methods

The SDK provides two special methods to retrieve large volumes of articles:

  • get_all_articles
  • get_all_headlines

Both methods are available for synchronous and asynchronous clients.

Get all articles

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

articles = client.get_all_articles(
    q="renewable energy",
    from_="30d",
    to="now",
    time_chunk_size="1d",
    max_articles=50000,
    show_progress=True,
)

print(f"Retrieved {len(articles)} articles")

Get all headlines

# Reuses the `client` created in the previous example
headlines = client.get_all_headlines(
    when="30d",
    time_chunk_size="1d",
    max_articles=20000,
    show_progress=True
)

print(f"Retrieved {len(headlines)} headlines")

How time-chunking works

Time-chunking divides your date range into smaller intervals, making separate API calls for each period and combining the results. Each interval can return up to 10,000 articles.

For example, with time_chunk_size="1d" over 5 days, the method runs 5 searches, one for each day, and paginates each one automatically. Because each search can return up to 10,000 articles, the method can retrieve up to 50,000 articles in total (5 × 10,000).
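The splitting step described above can be illustrated with a minimal sketch. This is not the SDK's internal code: the helper name split_into_chunks and the constant API_LIMIT are hypothetical, and the sketch only shows how a date range breaks into consecutive intervals and what the resulting upper bound is.

```python
from datetime import datetime, timedelta

API_LIMIT = 10_000  # per-query cap described in this guide


def split_into_chunks(start, end, chunk):
    """Yield consecutive (chunk_start, chunk_end) pairs covering start..end."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        # The last chunk is clipped so it never runs past the end date.
        chunk_end = min(cursor + chunk, end)
        yield cursor, chunk_end
        cursor = chunk_end


start = datetime(2024, 1, 1)
end = datetime(2024, 1, 6)  # 5 days after start
chunks = list(split_into_chunks(start, end, timedelta(days=1)))

print(f"Chunks: {len(chunks)}")
print(f"Upper bound on articles: {len(chunks) * API_LIMIT}")
```

Each pair would correspond to one search over that interval. If a single interval still matches more than 10,000 articles, a smaller chunk size is needed for the query.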

Choosing the right chunk size

The optimal chunk size depends on how many articles your query returns:

Query type        Article volume            Recommended chunk size
Extremely broad   10,000+ per hour          "1h"
Very broad        10,000+ per day           "6h"
Broad             3,000-10,000 per day      "1d"
Moderate          1,000-3,000 per day       "3d"
Specific          100-1,000 per day         "7d"
Very specific     < 100 per day             "30d"
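The table can be turned into a small lookup helper. The sketch below is illustrative only: recommend_chunk_size is a hypothetical function, not part of the SDK, and it assumes you already have a rough estimate of daily article volume for your query. Values that fall exactly on a boundary are assigned to the smaller chunk size, which is an assumption on top of the table.

```python
def recommend_chunk_size(articles_per_day: float) -> str:
    """Map an estimated daily article volume to a chunk size from the table."""
    if articles_per_day >= 10_000 * 24:  # 10,000+ per hour
        return "1h"
    if articles_per_day >= 10_000:
        return "6h"
    if articles_per_day >= 3_000:
        return "1d"
    if articles_per_day >= 1_000:
        return "3d"
    if articles_per_day >= 100:
        return "7d"
    return "30d"


for volume in (50, 500, 2_000, 5_000, 20_000, 300_000):
    print(f"{volume:>7} articles/day: {recommend_chunk_size(volume)}")
```

Leaning toward the smaller chunk size costs more requests but lowers the chance that any single chunk hits the 10,000-article cap.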

Method parameters

q
string
required

Your search query. Supports AND, OR, NOT operators and advanced syntax.

from_
string
default:"30d"

Starting date for get_all_articles (e.g., "10d" or "2023-03-15").

to
string
default:"now"

Ending date for get_all_articles. Defaults to the current time.

when
string
default:"7d"

Time range for get_all_headlines (e.g., "1d" or "2023-03-15").

time_chunk_size
string
default:"1h"

Size of each time chunk, for example "1h", "6h", "1d", "7d", or "1m".

max_articles
integer
default:"100000"

Maximum number of articles to retrieve.

show_progress
boolean
default:"false"

Whether to display a progress bar.

deduplicate
boolean
default:"true"

Whether to remove duplicate articles.

concurrency
integer
default:"3"

For async methods only: number of concurrent requests.
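To show what a concurrency limit means in practice, here is a minimal sketch using asyncio.Semaphore from the standard library. It does not call the Newscatcher API and is not how the SDK is necessarily implemented: fetch_chunk is a hypothetical stand-in for one chunk's request, and fetch_all is a hypothetical helper that caps the number of requests in flight.

```python
import asyncio


async def fetch_chunk(chunk_id: int) -> list:
    """Hypothetical stand-in for the request that retrieves one time chunk."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"article-{chunk_id}-{i}" for i in range(2)]


async def fetch_all(chunk_ids, concurrency: int = 3):
    """Fetch every chunk while keeping at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous requests observed

    async def limited(chunk_id):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_chunk(chunk_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(limited(c) for c in chunk_ids))
    # Flatten the per-chunk lists into one list of articles.
    articles = [article for chunk in results for article in chunk]
    return articles, peak


articles, peak = asyncio.run(fetch_all(range(10), concurrency=3))
print(f"Fetched {len(articles)} articles, peak concurrency {peak}")
```

A higher limit finishes sooner but sends more requests at once, so it is worth checking your plan's rate limits before raising it.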

Common issues and solutions

See also