# Get aggregation count by interval

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/aggregation/get-aggregation-count-by-interval-get

news-api-v3 get /api/aggregation_count

Retrieves the count of articles aggregated by day or hour based on various search criteria, such as keyword, language, country, and source.

# Get aggregation count by interval

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/aggregation/get-aggregation-count-by-interval-post

news-api-v3 post /api/aggregation_count

Retrieves the count of articles aggregated by day or hour based on various search criteria, such as keyword, language, country, and source.

# Search articles by author

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/authors/search-articles-by-author-get

news-api-v3 get /api/authors

Searches for articles written by a specified author. You can filter results by language, country, source, and more.

# Search articles by author

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/authors/search-articles-by-author-post

news-api-v3 post /api/authors

Searches for articles by author. You can filter results by language, country, source, and more.

# Get breaking news

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/breaking-news/retrieve-breaking-news-get

news-api-v3 get /api/breaking_news

Retrieves breaking news articles and sorts them based on specified criteria.

# Get breaking news

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/breaking-news/retrieve-breaking-news-post

news-api-v3 post /api/breaking_news

Retrieves breaking news articles and sorts them based on specified criteria.
# Retrieve latest headlines

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/latest-headlines/retrieve-latest-headlines-get

news-api-v3 get /api/latest_headlines

Retrieves the latest headlines for the specified time period. You can filter results by language, country, source, and more.

# Retrieve latest headlines

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/latest-headlines/retrieve-latest-headlines-post

news-api-v3 post /api/latest_headlines

Retrieves the latest headlines for the specified time period. You can filter results by language, country, source, and more.

# Search articles by links or IDs

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search-by-link/search-articles-by-links-or-ids-get

news-api-v3 get /api/search_by_link

Searches for articles based on specified links or IDs. You can filter results by date range.

# Search articles by links or IDs

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search-by-link/search-articles-by-links-or-ids-post

news-api-v3 post /api/search_by_link

Searches for articles using their ID(s) or link(s).

# Search similar articles

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search-similar/search-similar-articles-get

news-api-v3 get /api/search_similar

Searches for articles similar to a specified query.

# Search similar articles

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search-similar/search-similar-articles-post

news-api-v3 post /api/search_similar

Searches for articles similar to the specified query. You can filter results by language, country, source, and more.

# Search articles

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search/search-articles-get

news-api-v3 get /api/search

Searches for articles based on specified criteria such as keyword, language, country, source, and more.
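The GET endpoints above accept their search criteria as query parameters appended to the endpoint path. A minimal sketch of composing such a request URL with the Python standard library (`build_search_url` is an illustrative helper, not part of any SDK; the `x-api-token` header name and the comma-joining of list values are assumptions, not taken from this page):

```python
from urllib.parse import urlencode

def build_search_url(base_url: str, **criteria) -> str:
    """Compose a GET /api/search URL from keyword arguments.

    List values (e.g. lang=["en", "de"]) are joined with commas here,
    an assumed convention for multi-value query parameters.
    """
    params = {
        key: ",".join(value) if isinstance(value, list) else value
        for key, value in criteria.items()
    }
    return f"{base_url}/api/search?{urlencode(params)}"

url = build_search_url(
    "https://v3-api.newscatcherapi.com",  # host used elsewhere in these docs
    q="renewable energy",
    lang=["en"],
)
print(url)
# The request itself would then carry your API key in a header, e.g.
# requests.get(url, headers={"x-api-token": "YOUR_API_KEY"})  # header name assumed
```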
# Search articles

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/search/search-articles-post

news-api-v3 post /api/search

Searches for articles based on specified criteria such as keyword, language, country, source, and more.

# Retrieve sources

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/sources/retrieve-sources-get

news-api-v3 get /api/sources

Retrieves a list of sources based on specified criteria such as language, country, rank, and more.

# Retrieve sources

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/sources/retrieve-sources-post

news-api-v3 post /api/sources

Retrieves the list of sources available in the database. You can filter the sources by language, country, and more.

# Retrieve subscription plan information

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/subscription/retrieve-subscription-plan-information-get

news-api-v3 get /api/subscription

Retrieves information about your subscription plan.

# Retrieve subscription plan information

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/endpoints/subscription/retrieve-subscription-plan-information-post

news-api-v3 post /api/subscription

Retrieves information about your subscription plan.

# C# SDK

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/csharp

C# client library for News API v3

The C# SDK provides access to the News API v3 from C# applications.
## Installation

```bash
# Using the .NET CLI
dotnet add package NewscatcherApi

# Using NuGet Package Manager Console
Install-Package NewscatcherApi
```

## Basic usage

```csharp
using NewscatcherApi;

var client = new NewscatcherApiClient("YOUR_API_KEY");
await client.Search.PostAsync(
    new SearchPostRequest
    {
        Q = "renewable energy",
        PredefinedSources = new List<string>() { "top 50 US" },
        Lang = new List<string>() { "en" },
        From = new DateTime(2024, 01, 01, 00, 00, 00, 000),
        To = new DateTime(2024, 06, 30, 00, 00, 00, 000),
        AdditionalDomainInfo = true,
        IsNewsDomain = true,
    }
);
```

## Error handling

```csharp
using NewscatcherApi;

try
{
    var response = await client.Search.PostAsync(...);
}
catch (NewscatcherApiApiException e)
{
    Console.WriteLine(e.Body);
    Console.WriteLine(e.StatusCode);
}
```

For complete documentation, including retry configuration and timeouts, see the [GitHub repository](https://github.com/Newscatcher/newscatcher-csharp).

If you use our legacy C# SDK (Konfig-based), see our [Legacy SDKs](/v3/api-reference/libraries/legacy) documentation. We recommend migrating to this newer SDK for improved features and ongoing support.

## Resources

* [GitHub repository](https://github.com/Newscatcher/newscatcher-csharp)
* [NuGet package](https://nuget.org/packages/NewscatcherApi)

# Go SDK

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/go

Go client library for News API v3

The Go SDK provides access to the News API v3 from Go applications.
## Installation

```bash
go get github.com/Newscatcher/newscatcher-go
```

## Basic usage

```go
import (
    "context"

    newscatcher "github.com/Newscatcher/newscatcher-go"
    newscatcherclient "github.com/Newscatcher/newscatcher-go/client"
    "github.com/Newscatcher/newscatcher-go/option"
)

client := newscatcherclient.NewClient(
    option.WithApiKey("YOUR_API_KEY"),
)
response, err := client.Search.Post(
    context.TODO(),
    &newscatcher.SearchPostRequest{
        Q:                 "renewable energy",
        PredefinedSources: []string{"top 50 US"},
        Lang:              []string{"en"},
        From:              "2024-01-01",
        To:                "2024-06-30",
    },
)
```

## Error handling

```go
import (
    "fmt"

    "github.com/Newscatcher/newscatcher-go/core"
)

response, err := client.Search.Post(...)
if err != nil {
    if apiErr, ok := err.(*core.APIError); ok {
        fmt.Println(apiErr.Error())
        fmt.Println(apiErr.StatusCode)
    }
    return err
}
```

For complete documentation, including specific error types, retry configuration, and timeouts, see the [GitHub repository](https://github.com/Newscatcher/newscatcher-go).

If you use our legacy Go SDK (Konfig-based), see our [Legacy SDKs](/v3/api-reference/libraries/legacy) documentation. We recommend migrating to this newer SDK for improved features and ongoing support.

## Resources

* [GitHub repository](https://github.com/Newscatcher/newscatcher-go)
* [Go package](https://pkg.go.dev/github.com/Newscatcher/newscatcher-go)

# Java SDK

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/java

Java client library for News API v3

The Java SDK provides access to the News API v3 from Java or Kotlin applications.
## Installation

### Gradle

```groovy
dependencies {
    implementation 'com.newscatcherapi:newscatcher-sdk:1.1.0'
}
```

### Maven

```xml
<dependency>
    <groupId>com.newscatcherapi</groupId>
    <artifactId>newscatcher-sdk</artifactId>
    <version>1.1.0</version>
</dependency>
```

## Basic usage

```java
import com.newscatcher.api.NewscatcherApiClient;
import com.newscatcher.api.resources.search.requests.SearchPostRequest;
import java.util.Arrays;

NewscatcherApiClient client = NewscatcherApiClient.builder()
    .apiKey("YOUR_API_KEY")
    .build();

client.search().post(SearchPostRequest.builder()
    .q("renewable energy")
    .lang(Arrays.asList("en"))
    .build());
```

## Error handling

```java
import com.newscatcher.api.core.NewscatcherApiApiException;

try {
    client.search().post(...);
} catch (NewscatcherApiApiException e) {
    System.out.println(e.getMessage());
    System.out.println(e.statusCode());
    System.out.println(e.body());
}
```

For complete documentation, including specific error types and configuration options, see the [GitHub repository](https://github.com/Newscatcher/newscatcher-java).

If you use our legacy Java SDK (Konfig-based), see our [Legacy SDKs](/v3/api-reference/libraries/legacy) documentation. We recommend migrating to this newer SDK for improved features and ongoing support.

## Resources

* [GitHub repository](https://github.com/Newscatcher/newscatcher-java)
* [Maven Central](https://central.sonatype.com/artifact/com.newscatcherapi/newscatcher-sdk)

# Legacy SDKs for News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy

Documentation for deprecated NewsCatcher API client libraries

**Deprecation notice**: The SDKs listed here are deprecated and no longer actively maintained or supported. We strongly recommend using our [new SDKs](/v3/documentation/get-started/libraries) for all new projects and migrating existing implementations when possible.

## About

This page provides access to the documentation for our legacy SDKs produced by Konfig.
We have since transitioned to using Fern for SDK generation, which provides an improved developer experience, better type safety, and additional features. You can find the source code for the legacy Konfig SDKs in [this GitHub repository](https://github.com/Newscatcher/newscatcher-sdks).

## Legacy SDK documentation

* [Legacy Python client library for News API v3](/v3/api-reference/libraries/legacy/python)
* [Legacy TypeScript client library for News API v3](/v3/api-reference/libraries/legacy/typescript)
* [Legacy Java client library for News API v3](/v3/api-reference/libraries/legacy/java)
* [Legacy Go client library for News API v3](/v3/api-reference/libraries/legacy/go)
* [Legacy C# client library for News API v3](/v3/api-reference/libraries/legacy/csharp)

## Migration tips

When migrating from a legacy SDK to our new SDKs:

1. **Installation:** Install the new SDK through the appropriate package manager.
2. **Method changes:** The new SDKs maintain similar method names but may have slightly different parameter naming.
3. **Error handling:** Update your error handling to use the new error types.
4. **Imports:** Update your import statements to reference the new package names.

# C# SDK for NewsCatcher News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy/csharp

.NET client library for the NewsCatcher News API

A C# SDK for the NewsCatcher News API v3, offering full .NET integration, async/await support, and comprehensive exception handling with platform-specific optimizations.

## Requirements

* .NET Core >=1.0
* .NET Framework >=4.6
* Mono/Xamarin >=vNext

## Installation

### Using the .NET CLI

```bash
dotnet add package Newscatcherapi.Net
```

### Using NuGet Package Manager Console

```powershell
Install-Package Newscatcherapi.Net
```

### Using Package Manager UI in Visual Studio

1. Right-click on your project in Solution Explorer.
2. Select "Manage NuGet Packages".
3. Search for "Newscatcherapi.Net".
4. Click Install.
## Core features

### Initialize client

```csharp
using Newscatcherapi.Net.Client;

var client = new NewscatcherClient();
client.SetApiKey("YOUR_API_KEY");
```

### Search articles

```csharp
// Regular search
try
{
    var result = await client.Search.Get(
        q: "technology",
        lang: "en",
        includeNlpData: true
    );
    Console.WriteLine($"Found {result.TotalHits} articles");
}
catch (ApiException e)
{
    Console.WriteLine($"Error: {e.ErrorCode} - {e.Message}");
}

// Clustered search
var clusterResults = await client.Search.Get(
    q: "AI technology",
    lang: "en",
    clusteringEnabled: true,
    clusteringThreshold: 0.6,
    includeNlpData: true
);
```

### Latest headlines

```csharp
var headlines = await client.LatestHeadlines.Get(
    lang: "en",
    countries: "US",
    clusteringEnabled: true,
    includeNlpData: true
);
```

### Author search

```csharp
var authorArticles = await client.Authors.Get(
    authorName: "Sam Altman",
    includeNlpData: true
);
```

### Similar articles

```csharp
var similar = await client.SearchSimilar.Get(
    q: "SpaceX launch",
    includeNlpData: true
);
```

### Get sources

```csharp
var sources = await client.Sources.Get(
    lang: "en"
);
```

### Check subscription

```csharp
var subscription = await client.Subscription.Get();
```

## Advanced features

### Error handling

The SDK provides detailed error information through `ApiException`:

```csharp
try
{
    var result = await client.Search.Get(q: "tech news");
}
catch (ApiException e)
{
    Console.WriteLine($"Error code: {e.ErrorCode}");
    Console.WriteLine($"Error content: {e.ErrorContent}");
    Console.WriteLine($"Headers: {e.Headers}");
    Console.WriteLine($"Response details: {e.Message}");
}
catch (Exception e)
{
    Console.WriteLine($"General error: {e.Message}");
}
```

### Cancellation support

All async operations support cancellation tokens:

```csharp
using var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromSeconds(5)); // 5-second timeout

try
{
    var result = await client.Search.GetAsync(
        q: "tech",
        cancellationToken: cts.Token
    );
}
catch
(OperationCanceledException)
{
    Console.WriteLine("Request timed out");
}
```

## Utilities

### Rate limit handler

```csharp
public static class RateLimitHandler
{
    public static async Task<T> WithRetry<T>(
        Func<Task<T>> operation,
        int maxRetries = 3,
        int baseDelay = 1000)
    {
        for (int i = 0; i < maxRetries; i++)
        {
            try
            {
                return await operation();
            }
            catch (ApiException e) when (e.ErrorCode == 429)
            {
                if (i == maxRetries - 1) throw;
                var delay = baseDelay * Math.Pow(2, i);
                await Task.Delay((int)delay);
            }
        }
        throw new Exception("Max retries exceeded");
    }
}

// Usage
var result = await RateLimitHandler.WithRetry(async () =>
    await client.Search.Get(q: "tech")
);
```

### Pagination handler

```csharp
public static class PaginationHandler
{
    public static async Task<List<Article>> GetAllResults(
        NewscatcherClient client,
        string query,
        int maxPages = 5)
    {
        var results = new List<Article>();
        for (int page = 1; page <= maxPages; page++)
        {
            var response = await client.Search.Get(
                q: query,
                page: page,
                pageSize: 100
            );
            results.AddRange(response.Articles);
            if (page >= response.TotalPages) break;
        }
        return results;
    }
}

// Usage
var allArticles = await PaginationHandler.GetAllResults(
    client,
    "AI technology",
    maxPages: 2
);
```

### HTTP client configuration

```csharp
var config = new Configuration
{
    UserAgent = "CustomUserAgent/1.0",
    Timeout = 30000, // 30 seconds
    BasePath = "https://your-proxy.com/v3-api.newscatcherapi.com"
};
var client = new NewscatcherClient(config);
```

## Additional resources

* [API Reference](/v3/api-reference/endpoints/search/search-articles-get)
* [NuGet Package](https://www.nuget.org/packages/Newscatcherapi.Net)

# Go SDK for NewsCatcher News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy/go

Idiomatic Go client for the NewsCatcher News API

A Go SDK for the NewsCatcher News API v3, featuring idiomatic Go patterns, context support, and efficient error handling with strong type safety.
## Requirements

* Go 1.18 or higher

## Installation

Add the SDK to your project using Go modules:

```bash
go get github.com/konfig-dev/newscatcher-go-sdk
```

## Core features

### Initialize client

```go
import (
    "fmt"

    newscatcherapi "github.com/konfig-dev/newscatcher-go-sdk"
)

configuration := newscatcherapi.NewConfiguration()
configuration.SetApiKey("YOUR_API_KEY")
client := newscatcherapi.NewAPIClient(configuration)
```

### Search articles

```go
// Regular search
request := client.SearchApi.Get(context.Background())
request.Q("technology")
request.Lang("en")
request.IncludeNlpData(true)

result, response, err := request.Execute()
if err != nil {
    fmt.Printf("Error: %v\n", err)
    return
}

// Clustered search
clusterRequest := client.SearchApi.Get(context.Background())
clusterRequest.Q("AI technology")
clusterRequest.Lang("en")
clusterRequest.ClusteringEnabled(true)
clusterRequest.ClusteringThreshold(0.6)
clusterRequest.IncludeNlpData(true)

clusterResult, response, err := clusterRequest.Execute()
```

### Latest headlines

```go
request := client.LatestHeadlinesApi.Get(context.Background())
request.Lang("en")
request.Countries("US")
request.ClusteringEnabled(true)
request.IncludeNlpData(true)

headlines, response, err := request.Execute()
```

### Author search

```go
request := client.AuthorsApi.Get(context.Background(), "Sam Altman")
request.IncludeNlpData(true)

authorArticles, response, err := request.Execute()
```

### Similar articles

```go
request := client.SearchSimilarApi.Get(context.Background())
request.Q("SpaceX launch")
request.IncludeNlpData(true)

similar, response, err := request.Execute()
```

### Get sources

```go
request := client.SourcesApi.Get(context.Background())
request.Lang("en")

sources, response, err := request.Execute()
```

### Check subscription

```go
subscription, response, err := client.SubscriptionApi.Get(context.Background()).Execute()
```

## Advanced features

### HTTP response handling

The SDK provides detailed HTTP response information:

```go
request := client.SearchApi.Get(context.Background())
request.Q("tech")

result, response, err := request.Execute()
if err != nil {
    fmt.Printf("Error: %v\n", err)
    fmt.Printf("Response: %v\n", response)
    fmt.Printf("Status code: %v\n", response.StatusCode)
    fmt.Printf("Headers: %v\n", response.Header)
    return
}
```

### Context support

All API operations support context for timeout and cancellation:

```go
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

request := client.SearchApi.Get(ctx)
result, response, err := request.Execute()
```

## Error handling

The SDK uses `GenericOpenAPIError` for comprehensive error handling:

```go
request := client.SearchApi.Get(context.Background())
request.Q("tech news")

result, response, err := request.Execute()
if err != nil {
    if apiError, ok := err.(*newscatcherapi.GenericOpenAPIError); ok {
        fmt.Printf("Error body: %v\n", string(apiError.Body()))
        fmt.Printf("Error model: %v\n", apiError.Model())
    }
    fmt.Printf("Full error: %v\n", err)
    return
}
```

## Utilities

### Rate limit handler

```go
func withRetry(operation func() error, maxRetries int, delay time.Duration) error {
    var lastErr error
    for i := 0; i < maxRetries; i++ {
        err := operation()
        if err == nil {
            return nil
        }
        if apiErr, ok := err.(*newscatcherapi.GenericOpenAPIError); ok {
            // Check if it's a rate limit error
            if strings.Contains(string(apiErr.Body()), "429") {
                time.Sleep(delay * time.Duration(math.Pow(2, float64(i))))
                lastErr = err
                continue
            }
        }
        return err
    }
    return fmt.Errorf("max retries exceeded: %v", lastErr)
}

// Usage
request := client.SearchApi.Get(context.Background())
request.Q("tech")

err := withRetry(func() error {
    _, _, err := request.Execute()
    return err
}, 3, time.Second)
```

### Pagination handler

```go
func getAllResults(client *newscatcherapi.APIClient, query string, maxPages int) ([]Article, error) {
    var results []Article
    for page := 1; page <= maxPages; page++ {
        request := client.SearchApi.Get(context.Background())
        request.Q(query)
        request.Page(page)
        request.PageSize(100)

        response, _, err := request.Execute()
        if err != nil {
            return nil, err
        }

        results = append(results, response.Articles...)
        if page >= response.TotalPages {
            break
        }
    }
    return results, nil
}
```

## Additional resources

* [API Reference](/v3/api-reference/endpoints/search/search-articles-get)
* [Go Package](https://pkg.go.dev/github.com/konfig-dev/newscatcher-go-sdk/v6)

# Java SDK for NewsCatcher News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy/java

Java client library for the NewsCatcher News API with Android support

A Java SDK for the NewsCatcher News API v3, featuring robust error handling, comprehensive Android support, and flexible configuration options for enterprise applications.

## Requirements

* Java 1.8+
* Maven (3.8.3+)/Gradle (7.2+)

### Android requirements

If using this library in an Android application:

* Android 8.0+ (API Level 26+)

## Installation

### Maven users

Add this dependency to your project's POM:

```xml
<dependency>
    <groupId>com.konfigthis.newscatcherapi</groupId>
    <artifactId>newscatcherapi-java-sdk</artifactId>
    <version>6.0.13</version>
    <scope>compile</scope>
</dependency>
```

### Gradle users

Add this dependency to your `build.gradle`:

```groovy
// build.gradle
repositories {
    mavenCentral()
}

dependencies {
    implementation "com.konfigthis.newscatcherapi:newscatcherapi-java-sdk:6.0.13"
}
```

### Android configuration

1. Set minimum SDK version in your `build.gradle`:

```groovy
android {
    defaultConfig {
        minSdk 26
    }
}
```

2.
Add internet permissions to your `AndroidManifest.xml`:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```

### Manual installation

Generate the JAR by executing:

```bash
mvn clean package
```

Then manually install the following JARs:

* `target/newscatcherapi-java-sdk-6.0.13.jar`
* `target/lib/*.jar`

## Core features

### Initialize client

```java
import com.konfigthis.newscatcherapi.client.Newscatcher;
import com.konfigthis.newscatcherapi.client.Configuration;

Configuration configuration = new Configuration();
configuration.host = "https://v3-api.newscatcherapi.com";
configuration.apiKey = "YOUR_API_KEY";
Newscatcher client = new Newscatcher(configuration);
```

### Search articles

```java
// Regular search
SearchResponse result = client.search.get("technology")
    .lang("en")
    .includeNlpData(true)
    .execute();

// Clustered search
SearchResponse clusterResults = client.search.get("AI technology")
    .lang("en")
    .clusteringEnabled(true)
    .clusteringThreshold(0.6)
    .includeNlpData(true)
    .execute();
```

### Latest headlines

```java
LatestHeadlinesResponse headlines = client.latestHeadlines.get()
    .lang("en")
    .countries("US")
    .clusteringEnabled(true)
    .includeNlpData(true)
    .execute();
```

### Author search

```java
AuthorResponse authorArticles = client.authors.get("Sam Altman")
    .includeNlpData(true)
    .execute();
```

### Similar articles

```java
SimilarResponse similar = client.searchSimilar.get("SpaceX launch")
    .includeNlpData(true)
    .execute();
```

### Get sources

```java
SourceResponse sources = client.sources.get()
    .lang("en")
    .execute();
```

### Check subscription

```java
SubscriptionResponse subscription = client.subscription.get().execute();
```

## Advanced features

### HTTP response access

Access detailed HTTP response information:

```java
ApiResponse<SearchResponse> response = client.search.get("tech")
    .executeWithHttpInfo();

System.out.println("Status code: " + response.getStatusCode());
System.out.println("Headers: " + response.getHeaders());
System.out.println("Round trip time: " + response.getRoundTripTime());
```

### SSL configuration

```java
Configuration config = new Configuration();
config.setVerifyingSsl(true);
config.setSslCaCert(yourCertificateInputStream);
Newscatcher client = new Newscatcher(config);
```

## Error handling

The SDK uses `ApiException` for comprehensive error handling:

```java
try {
    SearchResponse result = client.search.get("tech news").execute();
} catch (ApiException e) {
    System.err.println("Status code: " + e.getStatusCode());
    System.err.println("Response body: " + e.getResponseBody());
    System.err.println("Headers: " + e.getResponseHeaders());
    System.err.println("Round trip time: " + e.getRoundTripTime());
}
```

## Utilities

### Rate limit handler

```java
public class RateLimitHandler {
    public static <T> T withRetry(Callable<T> operation, int maxRetries, long delay) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return operation.call();
            } catch (ApiException e) {
                if (e.getStatusCode() == 429 && retries < maxRetries) {
                    retries++;
                    Thread.sleep(delay * (long) Math.pow(2, retries));
                    continue;
                }
                throw e;
            }
        }
    }
}

// Usage
SearchResponse result = RateLimitHandler.withRetry(
    () -> client.search.get("tech").execute(),
    3, 1000
);
```

### Pagination handler

```java
public class PaginationHandler {
    public static List<Article> getAllResults(Newscatcher client, String query, int maxPages)
            throws ApiException {
        List<Article>
        results = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            SearchResponse response = client.search.get(query)
                .page(page)
                .pageSize(100)
                .execute();
            results.addAll(response.getArticles());
            if (page >= response.getTotalPages()) break;
        }
        return results;
    }
}
```

## Android-specific considerations

### Threading

When making API calls on Android, ensure you're not on the main thread:

```java
// Using Kotlin coroutines
private suspend fun searchArticles() {
    withContext(Dispatchers.IO) {
        try {
            val result = client.search.get("tech news").execute()
            // Handle result
        } catch (e: ApiException) {
            // Handle error
        }
    }
}

// Using AsyncTask (legacy)
new AsyncTask<Void, Void, SearchResponse>() {
    @Override
    protected SearchResponse doInBackground(Void... params) {
        try {
            return client.search.get("tech news").execute();
        } catch (ApiException e) {
            // Handle error
            return null;
        }
    }
}.execute();
```

### Memory management

For Android applications, consider implementing a caching strategy:

```java
public class ResponseCache {
    private static final long CACHE_DURATION = TimeUnit.MINUTES.toMillis(5);
    private final Map<String, CachedResponse> cache = new HashMap<>();

    public synchronized SearchResponse getOrFetch(
        Newscatcher client,
        String query,
        long maxAge
    ) throws ApiException {
        CachedResponse cached = cache.get(query);
        if (cached != null && !cached.isExpired(maxAge)) {
            return cached.response;
        }
        SearchResponse fresh = client.search.get(query).execute();
        cache.put(query, new CachedResponse(fresh));
        return fresh;
    }

    private static class CachedResponse {
        final SearchResponse response;
        final long timestamp;

        CachedResponse(SearchResponse response) {
            this.response = response;
            this.timestamp = System.currentTimeMillis();
        }

        boolean isExpired(long maxAge) {
            return System.currentTimeMillis() - timestamp > maxAge;
        }
    }
}
```

## Additional resources

* [API Reference](/v3/api-reference/endpoints/search/search-articles-get)
* [Maven Central](https://central.sonatype.com/artifact/com.konfigthis.newscatcherapi/newscatcherapi-java-sdk)

# Python SDK for NewsCatcher News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy/python

Python client library for the NewsCatcher News API with async support

A Python SDK for the NewsCatcher News API v3, providing both synchronous and asynchronous APIs, intuitive error handling, and native raw HTTP response access.

## Requirements

* Python 3.7 or higher

## Installation

```bash
pip install newscatcherapi-python-sdk
```

## Core features

### Initialize client

```python
from newscatcherapi_client import Newscatcher

newscatcher = Newscatcher(
    api_key="YOUR_API_KEY"
)
```

### Search articles

```python
# Regular search
results = newscatcher.search.get(
    q="technology",
    lang="en",
    include_nlp_data=True  # optional, adds NLP analysis layer
)

# Clustered search
cluster_results = newscatcher.search.get(
    q="AI technology",
    lang="en",
    clustering_enabled=True,
    clustering_threshold=0.6,
    include_nlp_data=True
)
```

### Latest headlines

```python
headlines = newscatcher.latest_headlines.get(
    lang="en",
    countries="US",
    clustering_enabled=True,
    include_nlp_data=True
)
```

### Author search

```python
author_articles = newscatcher.authors.get(
    author_name="Sam Altman",
    include_nlp_data=True
)
```

### Similar articles

```python
similar = newscatcher.search_similar.get(
    q="SpaceX launch",
    include_nlp_data=True
)
```

### Sources

```python
sources = newscatcher.sources.get(
    lang="en"
)
```

### Subscription

```python
subscription = newscatcher.subscription.get()
```

## Advanced features

### Raw HTTP response

Access raw HTTP response data using the `.raw` namespace:

```python
raw_response = newscatcher.search.raw.get(
    q="tech",
)

print(raw_response.body)
print(raw_response.headers)
print(raw_response.status)
print(raw_response.round_trip_time)
```

### Async support

The SDK supports asynchronous operations.
Prepend `a` to any method name to use async:

```python
import asyncio

async def main():
    results = await newscatcher.search.aget(
        q="tech",
        lang="en"
    )
    print(results)

asyncio.run(main())
```

## Error handling

The SDK uses `ApiException` for error handling:

```python
try:
    response = newscatcher.search.get(q="tech news")
except ApiException as e:
    print(f"Status: {e.status}")
    print(f"Reason: {e.reason}")
    print(f"Headers: {e.headers}")
    print(f"Response time: {e.round_trip_time}")
```

## Utilities

### Rate limit handler

```python
import asyncio

async def with_retry(operation, max_retries=3, delay=1):
    for i in range(max_retries):
        try:
            return await operation()
        except ApiException as e:
            if e.status == 429 and i < max_retries - 1:
                await asyncio.sleep(delay * (2 ** i))
                continue
            raise
    raise Exception("Max retries exceeded")

# Usage
results = await with_retry(
    lambda: newscatcher.search.aget(q="tech")
)
```

### Pagination handler

```python
async def get_all_results(newscatcher, query, max_pages=5):
    results = []
    for page in range(1, max_pages + 1):
        response = await newscatcher.search.aget(
            q=query,
            page=page,
            page_size=100
        )
        data = response
        if hasattr(data, 'clusters'):
            articles = [
                article
                for cluster in data.clusters
                for article in cluster.articles
            ]
        else:
            articles = data.articles
        results.extend(articles)
        if page >= data.total_pages:
            break
    return results
```

## Additional resources

* [API Reference](/v3/api-reference/endpoints/search/search-articles-get)
* [PyPI Package](https://pypi.org/project/newscatcherapi-python-sdk/)

# TypeScript SDK for NewsCatcher News API v3

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/legacy/typescript

Type-safe TypeScript/JavaScript client for the NewsCatcher News API

A TypeScript SDK for the NewsCatcher News API v3, offering full type safety, modern async/await patterns, and seamless integration with Node.js and browser environments.
## Requirements

* Node.js 14 or higher
* TypeScript 4.5 or higher

## Installation

```bash
# npm
npm install newscatcherapi-typescript-sdk

# pnpm
pnpm i newscatcherapi-typescript-sdk

# yarn
yarn add newscatcherapi-typescript-sdk
```

## Core features

### Initialize client

```typescript
import { Newscatcher } from "newscatcherapi-typescript-sdk";

const newscatcher = new Newscatcher({
  apiKey: "YOUR_API_KEY",
});
```

### Search articles

```typescript
// Regular search
const searchResults = await newscatcher.search.get({
  q: "technology",
  lang: "en",
  includeNlpData: true, // optional, adds NLP analysis layer
});

// Clustered search
const clusterResults = await newscatcher.search.get({
  q: "AI technology",
  lang: "en",
  clusteringEnabled: true,
  clusteringThreshold: 0.6,
  includeNlpData: true,
});
```

### Latest headlines

```typescript
const headlines = await newscatcher.latestHeadlines.get({
  lang: "en",
  countries: "US",
  clusteringEnabled: true,
  includeNlpData: true,
});
```

### Author search

```typescript
const authorArticles = await newscatcher.authors.get({
  authorName: "Sam Altman",
  includeNlpData: true,
});
```

### Similar articles

```typescript
const similar = await newscatcher.searchSimilar.get({
  q: "SpaceX launch",
  includeNlpData: true,
});
```

### Sources

```typescript
const sources = await newscatcher.sources.get({
  lang: "en",
});
```

### Subscription

```typescript
const subscription = await newscatcher.subscription.get();
```

## Error handling

The SDK provides typed error handling using the `NewscatcherError` class:

```typescript
try {
  const response = await newscatcher.search.get({
    q: "tech news",
  });
} catch (error) {
  if (error instanceof NewscatcherError) {
    console.error(`Status: ${error.status}`);
    console.error(`Message: ${error.message}`);
    console.error(`Response Body:`, error.responseBody);
  }
}
```

## Utilities

### Rate limit handler

```typescript
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  delay = 1000
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await operation();
    } catch (error) {
      if (
        error instanceof NewscatcherError &&
        error.status === 429 &&
        i < maxRetries - 1
      ) {
        await new Promise((resolve) =>
          setTimeout(resolve, delay * Math.pow(2, i))
        );
        continue;
      }
      throw error;
    }
  }
  throw new Error("Max retries exceeded");
}
```

### Pagination handler

```typescript
async function getAllResults(
  newscatcher: Newscatcher,
  query: string,
  maxPages = 5
) {
  const results = [];
  for (let page = 1; page <= maxPages; page++) {
    const response = await newscatcher.search.get({
      q: query,
      page,
      pageSize: 100,
    });
    const data = response.data;
    if ("clusters" in data && Array.isArray(data.clusters)) {
      data.clusters.forEach((cluster) => {
        if (cluster.articles) {
          results.push(...cluster.articles);
        }
      });
    } else if ("articles" in data) {
      results.push(...data.articles);
    }
    if (page >= data.total_pages) break;
  }
  return results;
}
```

## Additional resources

* [API Reference](/v3/api-reference/endpoints/search/search-articles-get)
* [NPM Package](https://www.npmjs.com/package/newscatcherapi-typescript-sdk)

# Python SDK

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/python

Python client library for News API v3

The Python SDK provides access to the News API v3 from Python applications with support for both synchronous and asynchronous operations.
## Installation

```bash
pip install newscatcher-sdk
```

## Basic usage

```python
import datetime

from newscatcher import NewscatcherApi

client = NewscatcherApi(
    api_key="YOUR_API_KEY",
)

# Search for articles
client.search.post(
    q="renewable energy",
    predefined_sources=["top 50 US"],
    lang=["en"],
    from_=datetime.datetime.fromisoformat("2024-01-01 00:00:00+00:00"),
    to=datetime.datetime.fromisoformat("2024-06-30 00:00:00+00:00"),
    additional_domain_info=True,
    is_news_domain=True,
)
```

## Async usage

```python
import asyncio
import datetime

from newscatcher import AsyncNewscatcherApi

client = AsyncNewscatcherApi(
    api_key="YOUR_API_KEY",
)

async def main() -> None:
    await client.search.post(
        q="renewable energy",
        predefined_sources=["top 50 US"],
        lang=["en"],
        from_=datetime.datetime.fromisoformat("2024-01-01 00:00:00+00:00"),
        to=datetime.datetime.fromisoformat("2024-06-30 00:00:00+00:00"),
        additional_domain_info=True,
        is_news_domain=True,
    )

asyncio.run(main())
```

## Retrieving more than 10,000 articles

The SDK provides methods to automatically retrieve more than the standard 10,000 article limit:

```python
# Get articles about renewable energy from the past 10 days
articles = client.get_all_articles(
    q="renewable energy",
    from_="10d",           # Last 10 days
    time_chunk_size="1d",  # Split into 1-day chunks
    max_articles=50000,    # Limit to 50,000 articles
    show_progress=True     # Show progress indicator
)
```

To learn more about the custom methods that bypass the API limits, see [How to retrieve more than 10,000 articles](/v3/documentation/how-to/retrieve-more-than-10k-articles).

## Error handling

```python
from newscatcher.core.api_error import ApiError

try:
    client.search.post(...)
except ApiError as e:
    print(e.status_code)
    print(e.body)
```

If you use our legacy Python SDK (Konfig-based), see our [Legacy SDKs](/v3/api-reference/libraries/legacy) documentation. We recommend migrating to this newer SDK for improved features and ongoing support.
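The `ApiError` check above can be combined with a small retry helper when requests hit your plan's rate limit. This is a sketch, not part of the SDK: the `with_retry` helper and its parameters are illustrative, and the rate-limit check is injected so you can adapt it to `ApiError.status_code == 429` as shown in the usage comment.

```python
import time


def with_retry(call, is_rate_limited, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run `call` (a zero-argument function wrapping an SDK request) and
    retry with exponential backoff while `is_rate_limited(error)` is true."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as error:
            # Retry only rate-limit errors, and only while retries remain.
            if is_rate_limited(error) and attempt < max_retries - 1:
                sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...
                continue
            raise


# Usage with the SDK (assumes the `client` from the examples above):
# from newscatcher.core.api_error import ApiError
# result = with_retry(
#     lambda: client.search.post(q="renewable energy"),
#     is_rate_limited=lambda e: isinstance(e, ApiError) and e.status_code == 429,
# )
```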
## Resources * [GitHub Repository](https://github.com/Newscatcher/newscatcher-python) * [PyPI Package](https://pypi.org/project/newscatcher-sdk/) # TypeScript SDK Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/libraries/typescript TypeScript client library for News API v3 TypeScript SDK provides access to the News API v3 from TypeScript or JavaScript applications. ## Installation ```bash # npm npm install newscatcher-sdk # yarn yarn add newscatcher-sdk # pnpm pnpm add newscatcher-sdk ``` ## Basic usage ```typescript import { NewscatcherApiClient } from "newscatcher-sdk"; const client = new NewscatcherApiClient({ apiKey: "YOUR_API_KEY" }); // Search for articles await client.search.post({ q: "renewable energy", predefinedSources: ["top 50 US"], lang: ["en"], from: new Date("2024-01-01T00:00:00.000Z"), to: new Date("2024-06-30T00:00:00.000Z"), additionalDomainInfo: true, isNewsDomain: true, }); ``` ## Error handling ```typescript import { NewscatcherApiError } from "newscatcher-sdk"; try { await client.search.post({ q: "renewable energy", }); } catch (err) { if (err instanceof NewscatcherApiError) { console.log(err.statusCode); console.log(err.message); console.log(err.body); } } ``` For complete documentation, including request/response types, retry configuration, and timeouts, see the [GitHub repository](https://github.com/Newscatcher/newscatcher-typescript). If you use our legacy TypeScript SDK (Konfig-based), see our [Legacy SDKs](/v3/api-reference/libraries/legacy) documentation. We recommend migrating to this newer SDK for improved features and ongoing support. ## Resources * [GitHub repository](https://github.com/Newscatcher/newscatcher-typescript) * [npm package](https://www.npmjs.com/package/newscatcher-sdk) # Authentication Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/authentication Learn how to authenticate your API request To use News API v3, you must authenticate your requests using an API key. 
This guide explains how to include your API key in your requests.

## API key

Your API key is a unique identifier that authenticates your requests to the NewsCatcher News API. Include this key in the `x-api-token` HTTP header for each request you make to the API.

## How to authenticate

To authenticate your requests, follow these steps:

1. Obtain your API key from your account manager.
2. Include your API key in the `x-api-token` header of the request.

Here are examples of how to use the API key in your requests in different programming languages:

```bash cURL
curl -X GET "https://v3-api.newscatcherapi.com/api/search?q=artificial%20intelligence" \
  -H "x-api-token: YOUR_API_KEY_HERE"
```

```python Python
import requests
import json

API_KEY = "YOUR_API_KEY_HERE"
URL = "https://v3-api.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY}
PAYLOAD = {"q": "artificial intelligence"}

try:
    response = requests.get(URL, headers=HEADERS, params=PAYLOAD)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

```javascript Node.js
const axios = require("axios");

const API_KEY = "YOUR_API_KEY_HERE";
const URL = "https://v3-api.newscatcherapi.com/api/search";

const params = {
  q: "artificial intelligence",
};

axios
  .get(URL, {
    headers: { "x-api-token": API_KEY },
    params: params,
  })
  .then((response) => {
    console.log(JSON.stringify(response.data, null, 2));
  })
  .catch((error) => {
    console.error(`Failed to fetch articles: ${error.message}`);
  });
```

```typescript TypeScript
import axios, { AxiosResponse } from "axios";

const API_KEY: string = "YOUR_API_KEY_HERE";
const URL: string = "https://v3-api.newscatcherapi.com/api/search";

const params = {
  q: "artificial intelligence",
};

axios
  .get(URL, {
    headers: { "x-api-token": API_KEY },
    params: params,
  })
  .then((response: AxiosResponse) => {
    console.log(JSON.stringify(response.data, null, 2));
  })
  .catch((error) => {
    console.error(`Failed to fetch articles: ${error.message}`);
  });
```

```go Go
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	apiKey := "YOUR_API_KEY_HERE"
	url := "https://v3-api.newscatcherapi.com/api/search?q=artificial%20intelligence"

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		log.Fatalf("Failed to create request: %v", err)
	}
	req.Header.Set("x-api-token", apiKey)

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatalf("Failed to fetch articles: %v", err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("Failed to read response body: %v", err)
	}
	fmt.Println(string(body))
}
```

```php PHP
<?php
$apiKey = "YOUR_API_KEY_HERE";
$url = "https://v3-api.newscatcherapi.com/api/search?q=artificial%20intelligence";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["x-api-token: $apiKey"]);

$response = curl_exec($ch);
if ($response === false) {
    echo "Failed to fetch articles: " . curl_error($ch);
} else {
    echo json_encode(json_decode($response), JSON_PRETTY_PRINT);
}
curl_close($ch);
?>
```

```ruby Ruby
require 'net/http'
require 'uri'
require 'json'

api_key = 'YOUR_API_KEY_HERE'
url = URI('https://v3-api.newscatcherapi.com/api/search?q=artificial%20intelligence')

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Get.new(url)
request['x-api-token'] = api_key

begin
  response = http.request(request)
  puts JSON.pretty_generate(JSON.parse(response.body))
rescue StandardError => e
  puts "Failed to fetch articles: #{e.message}"
end
```

Replace `YOUR_API_KEY_HERE` with your actual API key in these examples.

## Security best practices

Don't share your API key publicly or include it in client-side code. Store your API key in environment variables or secure key management systems. Rotate your API key periodically for enhanced security.

If you suspect your API key has been compromised, contact our support team immediately to have it revoked and replaced.

Remember, your API key is tied to your specific plan and usage limits. Protect it to prevent unauthorized use and potential overage charges.
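The environment-variable advice above can be sketched in a few lines. This is an illustrative pattern, not part of the API: the `NEWSCATCHER_API_KEY` variable name and the helper functions are arbitrary choices, and a dedicated secrets manager works equally well.

```python
import os


def get_api_key():
    """Read the key from the environment so it never appears in source code.

    NEWSCATCHER_API_KEY is an illustrative name; use whatever convention
    fits your deployment.
    """
    key = os.environ.get("NEWSCATCHER_API_KEY")
    if not key:
        raise RuntimeError("NEWSCATCHER_API_KEY is not set")
    return key


def auth_headers():
    """Build the authentication header without hard-coding the key."""
    return {"x-api-token": get_api_key()}


# Pass auth_headers() to your HTTP client, for example:
# requests.get("https://v3-api.newscatcherapi.com/api/search",
#              headers=auth_headers(), params={"q": "bitcoin"})
```

Reading the key at request time (rather than at import time) also makes key rotation a restart-free operation in many deployments.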
# Enumerated parameters Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/enumerated-parameters Allowed values for API parameters with a fixed set of options ## Language (`lang` and `not_lang`) The `lang` and `not_lang` parameters accept two-letter ISO 639-1 language codes. You can use these parameters to include or exclude articles in specific languages. We distinguish Chinese (China) and Chinese (Taiwan) languages, using `cn` and `tw`, respectively. This is the only difference between our codes and the ISO 639-1 standard. ### Commonly used language codes | Code | Language | | ---- | ---------------- | | ar | Arabic | | cn | Chinese (China) | | de | German | | en | English | | es | Spanish | | fr | French | | it | Italian | | ja | Japanese | | pt | Portuguese | | tw | Chinese (Taiwan) | | uk | Ukrainian | ### Supported language codes ``` af,ar,bg,bn,ca,cs,cy,cn,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,ja,kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,th,tl,tr,tw,uk,ur,vi ``` For a description of supported language codes, see the [ISO 639-1 code table](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). ### Usage examples GET requests: * To search for articles in English: `lang=en` * To search for articles in English and Spanish: `lang=en,es` * To exclude articles in French: `not_lang=fr` POST requests: * To search for articles in English and Spanish: ```json { "lang": ["en", "es"] } ``` * To exclude articles in French and German: ```json { "not_lang": ["fr", "de"] } ``` ## Country (`country` and `not_country`) The `country` and `not_country` parameters accept two-letter ISO 3166-1 alpha-2 country codes. You can use these parameters to include or exclude articles from specific countries. 
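The GET and POST conventions above differ only in how list values are serialized, so a single set of filters can serve both request styles. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def to_get_params(filters):
    """Join list values into the comma-separated strings GET requests expect;
    POST requests can send the lists unchanged in the JSON body."""
    return {
        key: ",".join(value) if isinstance(value, list) else value
        for key, value in filters.items()
    }


filters = {"q": "climate", "lang": ["en", "es"], "not_lang": ["fr"]}
get_params = to_get_params(filters)  # {"q": "climate", "lang": "en,es", "not_lang": "fr"}
post_body = filters                  # lists stay arrays in the JSON body
```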
### Commonly used country codes | Code | Country | | ---- | -------------- | | AU | Australia | | BR | Brazil | | CA | Canada | | DE | Germany | | FR | France | | GB | United Kingdom | | IN | India | | JP | Japan | | UA | Ukraine | | US | United States | For a complete list of supported country codes, refer to the [ISO 3166-1 alpha-2 code table](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Officially_assigned_code_elements). ### Usage examples GET requests: * To search for articles from the United States: `country=US` * To search for articles from the United States and United Kingdom: `country=US,GB` * To exclude articles from Canada: `not_country=CA` POST requests: * To search for articles from the United States and United Kingdom: ```json { "country": ["US", "GB"] } ``` * To exclude articles from Canada and Australia: ```json { "not_country": ["CA", "AU"] } ``` ## News Type (`news_type`) The `news_type` parameter filters results based on the type of news source. It accepts one or more of the following values: * Tech News and Updates * Sports News and Blogs * News and Blogs * E-commerce and Product Information * Educational News * Press Releases * Corporate News * Gaming News and Blogs * Entertainment and Media News * Health and Medical News * Government and Municipal News * Real Estate News * Automotive News and Blogs * News Aggregators * Fashion and Lifestyle * Local News and Community Events * Music and Radio * Reviews * Blogs and Magazines * Political News * Non-Profit and Organization News * Event News * General News Outlets * Gambling News * Travel and Lifestyle * Finance and Investment * Specific News Type * Pure News Outlet * Corporate News Section * Other ### Usage examples GET requests: * To search for tech news: `news_type=Tech News and Updates` * To search for both general news and tech news: `news_type=General News Outlets,Tech News and Updates` POST requests: * To search for tech news and sports news: ```json { "news_type": ["Tech News and 
Updates", "Sports News and Blogs"] }
```

## Additional notes

* For GET requests, when specifying multiple values for a parameter, use a comma-separated string.
* For POST requests, you can specify multiple values using a comma-separated string or an array of strings.
* Parameter values are case-sensitive. Always use the exact spelling and capitalization as listed in this document.
* For parameters not listed in this document, refer to the specific endpoint documentation in the API Reference.
* The maximum number of values that can be specified for each parameter may be limited. Refer to the API documentation for specific limits.

# Errors
Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/errors
Quick reference guide for common errors in News API v3

News API v3 uses standard HTTP codes to indicate request success or failure. 2xx codes mean success, 4xx codes indicate user-related failures, and 5xx codes indicate infrastructure issues.

## Common error codes

The following table provides a quick reference for the most common errors you might encounter while using the News API v3:

| Status code | Status | Description | Quick solution |
| ----------- | ------ | ----------- | -------------- |
| 401 | Unauthorized | Authentication failed. Typically due to an invalid or missing API key. | Verify the API key is correct and included in the `x-api-token` header. Check your API key status with `/subscription`. |
| 403 | Forbidden | Request is valid, but the server refuses action. Could be due to permission issues or plan limitations. | Check your plan permissions and parameter usage. Ensure date ranges are within allowed limits. Contact support if needed. |
| 408 | Request timeout | The server did not receive a complete request message within the default timeout of 30 seconds. | Check network connection, reduce request payload size if possible, narrow search query, implement retry logic. |
| 422 | Validation error | Server understands the request but cannot process it due to invalid input. | Ensure request data is correctly formatted and includes all required fields. Check date formats and parameter values. |
| 429 | Too many requests | Exceeded the allowed rate limit for API requests. | Implement request throttling and retry with exponential backoff. Consider upgrading your plan for higher limits. |
| 499 | Unknown status code | Client-side errors that do not fit standard HTTP status codes. | Check for missing fields or incorrect parameters. Follow the [API Reference](/v3/api-reference/endpoints/search/search-articles-get). |
| 500 | Internal server error | Unexpected server-side issue. Could be due to a malformed payload or temporary server issues. | Retry after a few minutes. Check the [status page](https://status.newscatcherapi.com) for known issues. Validate payload. |

### Quick tips

* Always log errors for easy debugging.
* Use exponential backoff for retries on rate-limited errors (`429`).
* Validate request data against API documentation to avoid validation errors (`422`).

For detailed troubleshooting steps and best practices, refer to the [Error handling](/v3/documentation/troubleshooting/error-handling) guide.

# HTTP headers
Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/http-headers
Understanding request and response headers in the NewsCatcher APIs

This guide explains the HTTP headers used in the NewsCatcher APIs, including required request headers, useful response headers, and best practices for working with them.
## Request headers These are the headers you should include when making requests: | Header | Required | Description | Example | | -------------- | -------- | --------------------------------------------------------------- | -------------------------------- | | `x-api-token` | Yes | Your API key for authentication | `x-api-token: abcd1234...` | | `Content-Type` | Yes\* | Content type of the request body (\*required for POST requests) | `Content-Type: application/json` | | `Accept` | No | Preferred response format | `Accept: application/json` | ### Authentication header The `x-api-token` header is required for all API requests and contains your API key: ```http POST /api/search HTTP/1.1 Host: v3-api.newscatcherapi.com x-api-token: YOUR_API_KEY ``` Never share your API key publicly or commit it to source control. Consider using environment variables or a secure secrets manager to store it. ### Content-Type header For POST requests, always set the `Content-Type` header to `application/json`: ```http POST /api/search HTTP/1.1 Host: v3-api.newscatcherapi.com x-api-token: YOUR_API_KEY Content-Type: application/json Accept: application/json { "q": "bitcoin", "from_": "30d" } ``` ## Response headers These are the headers returned in API responses that provide useful information: | Header | Description | Example | | ------------------- | ------------------------------------------------------------ | ------------------------------------------------------ | | `Date` | When the response was generated | `Date: Sat, 22 Mar 2025 13:49:07 GMT` | | `Content-Type` | Format of the response body | `Content-Type: application/json` | | `Transfer-Encoding` | How the response is encoded for transfer | `Transfer-Encoding: chunked` | | `Connection` | Connection status between client and server | `Connection: keep-alive` | | `x-process-time` | Time taken to process the request (in seconds) | `x-process-time: 0.7334954738616943` | | `correlation-id` | Unique identifier for tracing the request 
through our system | `correlation-id: a702576c-2007-4b23-9ba4-cad305c84275` | | `cf-cache-status` | Cloudflare cache status | `cf-cache-status: DYNAMIC` | | `Server` | Server software handling the request | `Server: cloudflare` | | `CF-RAY` | Cloudflare ray ID for request tracing | `CF-RAY: 924626834b0fbfb4-WAW` | | `Content-Encoding` | Compression method used for the response body | `Content-Encoding: br` | ### Correlation ID header The `correlation-id` header contains a unique identifier for your request: ```http correlation-id: a702576c-2007-4b23-9ba4-cad305c84275 ``` Always include this ID when contacting support about an issue with an API request. It helps us quickly locate your specific request in our logs. ### Process time header The `x-process-time` header shows how long it took our system to process your request (in seconds): ```http x-process-time: 0.7334954738616943 ``` This can be useful for performance monitoring and optimization. If you consistently see high process times, consider optimizing your queries or implementing caching strategies. 
## Code examples

Here are examples of how to work with headers in your requests in different programming languages:

```bash cURL
curl -X POST "https://v3-api.newscatcherapi.com/api/search" \
  -H "x-api-token: YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"q": "bitcoin", "from_": "30d"}' \
  -i

# The -i flag displays response headers in the output
# Look for the correlation-id in the response headers
```

```python Python
import requests
import json

url = "https://v3-api.newscatcherapi.com/api/search"
headers = {
    "x-api-token": "YOUR_API_KEY_HERE",
    "Content-Type": "application/json",
    "Accept": "application/json"
}
payload = {
    "q": "bitcoin",
    "from_": "30d"
}

try:
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()

    # Access response headers
    correlation_id = response.headers.get('correlation-id')
    process_time = response.headers.get('x-process-time')
    print(f"Correlation ID: {correlation_id}")
    print(f"Process Time: {process_time}")

    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```

```javascript Node.js
const axios = require("axios");

const url = "https://v3-api.newscatcherapi.com/api/search";
const headers = {
  "x-api-token": "YOUR_API_KEY_HERE",
  "Content-Type": "application/json",
  Accept: "application/json",
};
const payload = {
  q: "bitcoin",
  from_: "30d",
};

axios
  .post(url, payload, { headers })
  .then((response) => {
    // Access response headers
    const correlationId = response.headers["correlation-id"];
    const processTime = response.headers["x-process-time"];
    console.log(`Correlation ID: ${correlationId}`);
    console.log(`Process Time: ${processTime}`);
    console.log(JSON.stringify(response.data, null, 2));
  })
  .catch((error) => {
    console.error(`Request failed: ${error.message}`);
  });
```

```typescript TypeScript
import axios, { AxiosResponse, AxiosError } from "axios";

const url: string = "https://v3-api.newscatcherapi.com/api/search";
const headers: Record<string, string> = {
  "x-api-token": "YOUR_API_KEY_HERE",
  "Content-Type": "application/json",
  Accept: "application/json",
};
const payload = {
  q: "bitcoin",
  from_: "30d",
};

axios
  .post(url, payload, { headers })
  .then((response: AxiosResponse) => {
    // Access response headers
    const correlationId: string = response.headers["correlation-id"];
    const processTime: string = response.headers["x-process-time"];
    console.log(`Correlation ID: ${correlationId}`);
    console.log(`Process Time: ${processTime}`);
    console.log(JSON.stringify(response.data, null, 2));
  })
  .catch((error: AxiosError) => {
    console.error(`Request failed: ${error.message}`);
  });
```

```go Go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	url := "https://v3-api.newscatcherapi.com/api/search"
	apiKey := "YOUR_API_KEY_HERE"

	// Create request payload
	payload := map[string]string{
		"q":     "bitcoin",
		"from_": "30d",
	}
	payloadBytes, err := json.Marshal(payload)
	if err != nil {
		log.Fatalf("Failed to marshal JSON: %v", err)
	}

	// Create request
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(payloadBytes))
	if err != nil {
		log.Fatalf("Failed to create request: %v", err)
	}

	// Add headers
	req.Header.Set("x-api-token", apiKey)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")

	// Send request
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatalf("Request failed: %v", err)
	}
	defer resp.Body.Close()

	// Access response headers
	correlationID := resp.Header.Get("correlation-id")
	processTime := resp.Header.Get("x-process-time")
	fmt.Printf("Correlation ID: %s\n", correlationID)
	fmt.Printf("Process Time: %s\n", processTime)

	// Read and print response body
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("Failed to read response body: %v", err)
	}
	var prettyJSON bytes.Buffer
	if err := json.Indent(&prettyJSON, body, "", "  "); err != nil {
		log.Fatalf("Failed to format JSON: %v", err)
	}
	fmt.Println(prettyJSON.String())
}
```

```php PHP
<?php
$url = "https://v3-api.newscatcherapi.com/api/search";
$apiKey = "YOUR_API_KEY_HERE";

$payload = json_encode([
    "q" => "bitcoin",
    "from_" => "30d"
]);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HEADER, true); // Include headers in the response
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    "x-api-token: $apiKey",
    "Content-Type: application/json",
    "Accept: application/json"
]);

$response = curl_exec($ch);

if ($response === false) {
    echo "Request failed: " . curl_error($ch);
} else {
    // Split headers and body
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $headerStr = substr($response, 0, $headerSize);
    $body = substr($response, $headerSize);

    // Parse headers
    $headers = [];
    $headerLines = explode("\n", $headerStr);
    foreach ($headerLines as $line) {
        $parts = explode(':', $line, 2);
        if (isset($parts[1])) {
            $headers[trim($parts[0])] = trim($parts[1]);
        }
    }

    // Access specific headers
    $correlationId = isset($headers['correlation-id']) ? $headers['correlation-id'] : 'Not found';
    $processTime = isset($headers['x-process-time']) ?
$headers['x-process-time'] : 'Not found'; echo "Correlation ID: $correlationId\n"; echo "Process Time: $processTime\n"; echo json_encode(json_decode($body), JSON_PRETTY_PRINT); } curl_close($ch); ?> ``` ```ruby Ruby require 'net/http' require 'uri' require 'json' url = URI.parse('https://v3-api.newscatcherapi.com/api/search') api_key = 'YOUR_API_KEY_HERE' payload = { q: 'bitcoin', from_: '30d' } http = Net::HTTP.new(url.host, url.port) http.use_ssl = (url.scheme == 'https') request = Net::HTTP::Post.new(url.path) request['x-api-token'] = api_key request['Content-Type'] = 'application/json' request['Accept'] = 'application/json' request.body = payload.to_json begin response = http.request(request) # Access response headers correlation_id = response['correlation-id'] process_time = response['x-process-time'] puts "Correlation ID: #{correlation_id}" puts "Process Time: #{process_time}" puts JSON.pretty_generate(JSON.parse(response.body)) rescue StandardError => e puts "Request failed: #{e.message}" end ``` ## Best practices 1. **Always use the correct authentication header**: Use `x-api-token` instead of `x-api-key` for authentication. 2. **Always log correlation IDs**: Store correlation IDs alongside your application logs for easier troubleshooting. 3. **Monitor process times**: Keep track of `x-process-time` values to identify performance trends or issues. 4. **Handle headers consistently**: Implement consistent header handling in your error handling and logging code. 5. **Be aware of case sensitivity**: While HTTP headers are case-insensitive in the protocol, many libraries preserve the original casing when accessing them. 6. **Add proper Content-Type headers**: Always include the `Content-Type: application/json` header for POST requests. 
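Best practices 2 and 3 above can be folded into one small helper that records both diagnostic headers after each call. A sketch (the function and logger names are illustrative, not part of the API):

```python
import logging

logger = logging.getLogger("newscatcher.requests")


def log_diagnostics(headers):
    """Record the correlation ID and server-side process time from a response.

    Plain dicts are case-sensitive, so keys are normalized first; HTTP client
    libraries such as requests already expose case-insensitive header maps.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    correlation_id = normalized.get("correlation-id", "unknown")
    process_time = normalized.get("x-process-time", "unknown")
    logger.info("correlation-id=%s x-process-time=%s", correlation_id, process_time)
    return correlation_id, process_time
```

Call it with `response.headers` after each request, and include the logged correlation ID in any support ticket about that request.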
## Related resources

* [Request tracing with correlation IDs](/v3/documentation/troubleshooting/request-tracing-correlation-ids)
* [Error handling](/v3/documentation/troubleshooting/error-handling)
* [Authentication](/v3/api-reference/overview/authentication)

# Welcome to News API v3
Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/introduction
News API v3 features and capabilities

# Introduction

News API v3 provides a flexible way to access a comprehensive database of news articles from around the world. With v3, you get more control over your queries and richer data in your responses. Whether you're building news apps, training language models, or analyzing media trends, News API v3 helps you work efficiently with large-scale news data.

## Key features

* Advanced querying with boolean operators and proximity search
* NLP-enriched content
* Article clustering and deduplication
* Multi-language support
* High-volume data retrieval (up to 1000 articles per request)

## Base URL

All API requests must be sent to the following base URL:

```bash
https://v3-api.newscatcherapi.com/api
```

## Endpoints

NewsCatcher News API v3 provides the following key endpoints:

1. `/search`: Find relevant articles using keywords, publish date, language, country, and more.
2. `/latest_headlines`: Retrieve news articles for a specific period, filtered by country, language, topic, or sources.
3. `/breaking_news`: Retrieve breaking news articles sorted by specified criteria.
4. `/authors`: Locate articles written by specific authors or journalists.
5. `/search_by_link`: Find articles that mention specific URLs or domains.
6. `/search_similar`: Discover articles similar to a given article.
7. `/sources`: Check available media outlets with the source language, country, name, and URL.
8. `/aggregation_count`: Get aggregated article counts based on language, country, source, publish date, and more.
9. `/subscription`: Access details about your API subscription, including plan limits and usage statistics.

All endpoints support `GET` and `POST` methods for flexible integration into your applications.

## Request format

Include your API key in the `x-api-token` header for each request. To ensure security, all API requests must be made over HTTPS.

### Example request

```bash cURL
curl -X GET "https://v3-api.newscatcherapi.com/api/search?q=artificial%20intelligence" \
  -H "x-api-token: YOUR_API_KEY_HERE"
```

```python Python
import requests
import json

API_KEY = "YOUR_API_KEY_HERE"
URL = "https://v3-api.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY}
PAYLOAD = {"q": "artificial intelligence"}

try:
    response = requests.get(URL, headers=HEADERS, params=PAYLOAD)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

## Response format

All API responses are in JSON format and typically include the following fields:

| Field | Description |
| ----- | ----------- |
| `status` | The status of the request. |
| `total_hits` | The total number of articles matching the query. |
| `page` | The current page number. |
| `total_pages` | The total number of available pages. |
| `page_size` | The number of articles per page. |
| `articles` | An array of article objects (when applicable), each with article content, title, links, and other metadata. Depending on the subscription plan, it can include an NLP layer with sentiments, topics, tags, named entities, and more. |
| `user_input` | An object reflecting the parameters and filters you used in your request, which helps confirm how the API processed your input and debug any issues if the results are not as expected. |

Each article object contains the following fields:

| Field | Description |
| ----- | ----------- |
| `title` | The title of the article. |
| `author` | The primary author of the article. |
| `authors` | A list of authors or contributors associated with the article. |
| `journalists` | A list of journalists who wrote the article, if applicable. |
| `published_date` | The date and time when the article was published, in ISO 8601 format. |
| `published_date_precision` | The precision of the published date (e.g., "full", "date"). |
| `updated_date` | The date and time when the article was last updated, in ISO 8601 format. |
| `updated_date_precision` | The precision of the updated date (e.g., "full", "date"). |
| `link` | The direct URL link to the article. |
| `domain_url` | The domain of the source where the article is published. |
| `full_domain_url` | The full domain URL of the source where the article is published. |
| `name_source` | The name of the source where the article is published. |
| `is_headline` | Indicates if the article is a headline. |
| `paid_content` | Indicates if the article is paid content. |
| `parent_url` | The parent URL of the article's domain. |
| `country` | The country code of the article's source. |
| `rights` | The rights information associated with the article. |
| `rank` | The SEO rank of the article's source. |
| `media` | The URL of the media (e.g., image) associated with the article. |
| `language` | The language code of the article. |
| `description` | A brief description or summary of the article. |
| `content` | The full content of the article. |
| `word_count` | The total number of words in the article. |
| `is_opinion` | Indicates if the article is an opinion piece. |
| `twitter_account` | The Twitter account associated with the article's source or author. |
| `all_links` | A list of all hyperlinks found in the article content. |
| `all_domain_links` | A list of all domains linked in the article content. |
| `id` | A unique identifier for the article. |
| `score` | A relevance score assigned to the article. |
### Example response ```json JSON { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100, "articles": [ { "title": "Artificial Intelligence (AI) Consulting Market Size, Regional Status and Outlook 2024–2032", "author": "Precision Reports", "authors": ["Precision Reports", "Information Technology"], "journalists": [], "published_date": "2024-09-05 03:39:18", "published_date_precision": "full", "updated_date": "2024-09-05 03:39:26", "updated_date_precision": "full", "link": "https://medium.com/@precisionreports_jaguar01/artificial-intelligence-ai-consulting-market-size-regional-status-and-outlook-2024-2032-bfba1d710ec7", "domain_url": "medium.com", "full_domain_url": "medium.com", "name_source": "Medium", "is_headline": false, "paid_content": false, "parent_url": "https://medium.com", "country": "US", "rights": "medium.com", "rank": 58, "media": "https://miro.medium.com/v2/resize:fit:1200/1*6qh18sq9x_XkoCBOTWZvOw.png", "language": "en", "description": "Artificial Intelligence (AI) Consulting Market size was valued at USD 7065.95 Million in 2023 and is expected to reach USD 57862.17 Million", "content": "Artificial intelligence consulting is where AI engineers and experts assist business organizations from various industries to adopt AI applications to achieve their goals.\nGlobal Artificial Intelligence (AI) Consulting Market size was valued at USD 7065.95 Million in 2023 and is expected to reach USD 57862.17 Million in 2032, growing at a CAGR of 26.32% from 2023 to 2032...", "word_count": 1387, "is_opinion": false, "twitter_account": "@Medium", "all_links": [ "https://www.precisionreports.co/purchase/25803898", "https://www.precisionreports.co/enquiry/request-sample/25803898?utm_source=Medium_Jaguar", "https://speechify.com/medium?source=post_page-----bfba1d710ec7--------------------------------" ], "all_domain_links": [ "precisionreports.co", "statuspage.io", "speechify.com" ], "id": "7d96f27e4f0b407b034a5b41d2890792", "score": 
19.85728 } // ... other articles ], "user_input": { "q": "artificial intelligence", "search_in": ["title_content"], "from_": "2024-08-29T00:00:00", "to_": "2024-09-05T10:55:55.313383", "page": 1, "page_size": 100 // ... other inputs } } ``` ## Getting started To start using News API v3: 1. [Contact our sales team](https://www.newscatcherapi.com/pricing) to discuss your enterprise needs and obtain an API key. 2. Once you have your API key, check out our [Quickstart guide](/v3/documentation/get-started/quickstart) to make your first API call. 3. Explore our [Documentation](/v3/documentation/get-started/overview) to unlock the full potential of the API and integrate it into your systems. For detailed information on request parameters and response formats, refer to the specific endpoint documentation in this API reference. Remember, News API v3 is more than just a tool - it's your partner in navigating the complex world of news data. Whether you're building the next big thing in tech or seeking deep insights from global media, we're here to empower your journey. # Rate limits and quotas Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/api-reference/overview/rate-limits Rate limits and quota management for News API v3 usage News API v3 implements rate limits and quotas to ensure fair usage and maintain service quality for all users. This guide explains how these limits work and how to monitor your usage. ## Understanding your limits Your subscription plan defines the following usage limits: * Concurrent calls: The number of simultaneous API requests you can make. * Plan calls: The total number of API calls allowed per month. * Historical days: The number of days in the past you can retrieve articles. ## Checking your subscription You can view your current subscription details and remaining quota by making a request to the `/subscription` endpoint. 
### Example request

```bash cURL
curl -X GET "https://v3-api.newscatcherapi.com/api/subscription" \
  -H "x-api-token: YOUR_API_KEY_HERE"
```

```python Python
import requests
import json

API_KEY = "YOUR_API_KEY_HERE"
URL = "https://v3-api.newscatcherapi.com/api/subscription"
HEADERS = {"x-api-token": API_KEY}

try:
    response = requests.get(URL, headers=HEADERS)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch subscription details: {e}")
```

Replace `YOUR_API_KEY_HERE` with your actual API key. Use cURL, Postman, or any other tool to make HTTP requests.

### Example response

The response will look similar to this:

```json
{
  "active": true,
  "concurrent_calls": 10,
  "plan": "v3_nlp_iptc_tags",
  "plan_calls": 250000,
  "remaining_calls": 99986,
  "historical_days": 400
}
```

### Response format

* `active`: Indicates if your subscription is currently active.
* `concurrent_calls`: The maximum number of simultaneous API requests you can make.
* `plan`: Your subscription plan name.
* `plan_calls`: The total number of API calls allowed in your subscription period.
* `remaining_calls`: The number of API calls you have left in your current period.
* `historical_days`: The number of days in the past you can retrieve articles for.

## Rate limit behavior

If you exceed your rate limit for concurrent calls, you receive a `429 Too Many Requests` error. When this happens, wait a short time before retrying your request. In production, implement robust error handling and retry mechanisms, such as exponential backoff or adaptive throttling, to dynamically adjust the request frequency based on the server response. This approach helps prevent hitting rate limits repeatedly and ensures smoother interactions with the API, improving both reliability and user experience.

## Quota reset

Your quota of total calls (`plan_calls`) typically resets monthly at the beginning of each billing cycle. The exact reset time depends on your specific subscription terms.
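The retry-with-backoff approach suggested for handling `429` errors can be sketched transport-agnostically. This is an illustrative helper, not part of the API or its SDKs; the function name `with_backoff` and the injectable `send`/`sleep` callables are invented for the example:

```python
import time

def with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response, doubling
    the wait between attempts (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        # Exponential backoff before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```

With `requests`, you might call `with_backoff(lambda: requests.post(URL, headers=HEADERS, json=PAYLOAD))`. Injecting `send` and `sleep` keeps the retry policy easy to unit-test without hitting the API.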
Any remaining calls from the previous month do not carry over to the next month. ## Best practices To make the most of your subscription and avoid hitting limits: 1. Monitor your usage regularly using the `/subscription` endpoint. 2. Implement efficient error handling in your code, especially for `429` errors. 3. Optimize your queries to retrieve only the data you need. Use filters and parameters effectively to reduce unnecessary API calls. 4. For time-sensitive applications, consider implementing a polling strategy with appropriate intervals rather than constant real-time requests. 5. If you're building a user-facing application, consider implementing pagination or "load more" functionality to fetch additional results only when needed. 6. If you consistently reach your limits, consider upgrading your plan to better suit your needs. Remember, news data is highly dynamic, with millions of new articles added daily. Avoid caching article data for extended periods, as this could lead to outdated information. Instead, focus on efficient real-time data retrieval strategies. If you have any questions about your rate limits or need to increase your quota, [contact our support team](https://support-sign-in.newscatcherapi.com/). # Libraries Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/get-started/libraries NewsCatcher News API v3 SDKs # NewsCatcher API client libraries NewsCatcher provides official SDKs for multiple programming languages to help you integrate with our API quickly and efficiently. These SDKs are generated using Fern and are available as open-source projects on GitHub. ## Available SDKs A comprehensive SDK for developing Python applications. A robust SDK for TypeScript development, with full type support. A powerful SDK for building Java applications and services. An efficient SDK for developing applications in Go. A feature-rich SDK for C# and .NET development. 
Documentation for deprecated SDKs (Konfig-generated) ## SDK documentation Each SDK has its own documentation, which includes: * Installation instructions * Authentication setup * Basic and advanced usage examples * Error handling techniques * Advanced configuration options Visit the GitHub repository for each SDK to view the comprehensive documentation. ## Common features All our SDKs provide: * **Authentication**: Simple API key authentication * **Error handling**: Structured error types and handling mechanisms * **Retry logic**: Automatic retries with exponential backoff * **Timeout configuration**: Customizable request timeouts * **Type safety**: Strong typing for request parameters and responses ## Getting started 1. Choose the SDK for your preferred programming language. 2. Install the SDK via the appropriate package manager. 3. Create a client instance with your API key. 4. Start making API requests. For more information on the API endpoints and parameters, refer to the [API reference](/v3/api-reference/endpoints). # News API v3 subscription plans Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/get-started/news-api-v3-subscription-plans Select a News API v3 plan that matches your needs - from basic news monitoring to advanced AI-powered content analysis. Each plan includes support for multiple languages, advanced querying capabilities, and comprehensive metadata. Plans differ in their specialized features and processing capabilities. 
## Available plans

**Plan ID**: `v3_basic`

Entry-level plan with access to all core functionality:

* Access all API endpoints with comprehensive metadata
* Retrieve up to 1,000 articles per request
* Categorize content across 17 themes (Business, Tech, Politics, and more)
* Filter by article type (headline, opinion, paid content)
* Use URL and domain link filtering
* Access predefined top sources by country
* Filter by news domain type and source metadata
* Access historical data since July 2023

**Plan ID**: `v3_nlp`

Standard NLP plan adds comprehensive content analysis capabilities:

* Access all Basic plan features
* Detect named entities (PER, ORG, LOC, MISC)
* Analyze sentiment in titles and content on a `-1.0` to `1.0` scale
* Get AI-generated article summaries
* Enable article clustering with customizable thresholds
* Use content deduplication

**Plan ID**: `v3_nlp_iptc_tags`

IPTC Tags plan adds standardized content categorization:

* Access all NLP plan features
* Use IPTC media topic tags for standardized news categorization
* Access IAB content categories for digital advertising
* Enable hierarchical topic structures
* Implement industry-standard content taxonomy
* Enhance content targeting capabilities

**Plan ID**: `v3_nlp_embeddings`

Embeddings plan adds vector embeddings for advanced analysis:

* Access all IPTC Tags plan features
* Get 1024-dimensional vector embeddings via `nlp.new_embeddings`
* Use the multilingual-e5-large model for embeddings
* Enable semantic similarity comparison
* Build advanced content clustering
* Implement recommendation systems

## Technical specifications

| Feature                | Basic | NLP | IPTC Tags | Embeddings |
| ---------------------- | ----- | --- | --------- | ---------- |
| **Core Features**      |       |     |           |            |
| All API Endpoints      | ✓     | ✓   | ✓         | ✓          |
| 1000 Articles/Request  | ✓     | ✓   | ✓         | ✓          |
| Source Classification  | ✓     | ✓   | ✓         | ✓          |
| URL Filtering          | ✓     | ✓   | ✓         | ✓          |
| Historical Data Access | ✓     | ✓   | ✓         | ✓          |
| **NLP Features**       |       |     |           |            |
| Theme Classification   | ✓     | ✓   | ✓         | ✓          |
| Entity Recognition     | -     | ✓   | ✓         | ✓          |
| Sentiment Analysis     | -     | ✓   | ✓         | ✓          |
| Article Clustering     | -     | ✓   | ✓         | ✓          |
| Content Deduplication  | -     | ✓   | ✓         | ✓          |
| **Advanced Features**  |       |     |           |            |
| IPTC Media Topics      | -     | -   | ✓         | ✓          |
| IAB Categories         | -     | -   | ✓         | ✓          |
| Vector Embeddings      | -     | -   | -         | ✓          |

## Custom solutions

Beyond our standard plans, we offer specialized enterprise solutions:

* **Custom Tags**: Implement organization-specific taxonomy for content classification.
* **Entity Resolution**: Cut through multiple mentions of similarly named companies and individuals by resolving unique identifiers and providing precise entity matching.
* **Events Intelligence**: Track, deduplicate, and analyze events across multiple sources while extracting standardized information about participants, locations, and timelines.
* **Insights Engine**: Identify market opportunities through tracking product releases and industry trends.

To ensure uninterrupted service, implement appropriate rate limiting in your application based on your plan's specifications.

## Next steps

1. Review feature documentation:
   * [NLP features](/v3/documentation/guides-and-concepts/nlp-features)
   * [Clustering](/v3/documentation/guides-and-concepts/clustering-news-articles)
   * [Custom tagging](/v3/documentation/guides-and-concepts/custom-tagging)
2. Explore the [API Reference](/v3/api-reference)
3. Visit the [pricing page](https://www.newscatcherapi.com/pricing) to start your subscription

## Support

Need help selecting a plan? Contact our team:

* For pricing and sales: [sales@newscatcherapi.com](mailto:sales@newscatcherapi.com)
* For technical questions: [support@newscatcherapi.com](mailto:support@newscatcherapi.com)

All plans include a free trial period. Contact our sales team to discuss your specific needs and try out the features.
# News API v3 overview

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/get-started/overview

Discover NewsCatcher News API v3, an enterprise-level solution for large-scale news data retrieval and analysis.

In today's fast-paced digital world, staying on top of global news can feel like drinking from a firehose. Whether you're analyzing market trends or developing the next innovative news application, you need a strong and trustworthy method to access and interpret the extensive news data available. This is where the NewsCatcher News API v3 comes into play.

## About News API v3

The NewsCatcher News API v3 is an enterprise-level tool for retrieving and analyzing news data. It's a robust, flexible solution tailored for large businesses, data science teams, and content curation departments that need to effortlessly navigate the complex world of global news at scale. Think of it as a highly intelligent news librarian that can:

* Search through millions of articles in seconds
* Understand the context and sentiment of news pieces
* Group similar stories together
* Deliver clean, organized data ready for your analysis or application

But it's more than just a search engine. News API v3 is a comprehensive solution that brings the power of advanced natural language processing right to your enterprise applications.

## Why enterprises choose News API v3

Imagine you're building a financial analysis platform for institutional investors. Your users need real-time updates on company news, but they're drowning in irrelevant information. With the NewsCatcher News API v3, your team can:

* Instantly access news from thousands of sources worldwide.
* Filter results with precision using advanced search capabilities.
* Quickly analyze news data with provided NLP information like article topic, summary, sentiment, tags, and named entities.
* Identify trends through automated article clustering.
* Scale your application effortlessly with high-volume data retrieval.
The NewsCatcher News API v3 is designed for large-scale enterprise needs, providing the foundation for innovative news-based applications and in-depth media analysis at a corporate level.

## How does it work?

At its core, the NewsCatcher News API v3 works by:

1. **Collecting**: Continuously gathering news from over 70,000 web sources globally, including major outlets and small local publishers.
2. **Processing**: Applying advanced NLP techniques to enrich the content with metadata.
3. **Indexing**: Organizing the processed data for lightning-fast retrieval.
4. **Serving**: Providing the data through a set of powerful, easy-to-use endpoints.

Here's a quick look at the key features of the v3 API:

* **Search capabilities**: Use Boolean operators, proximity search, and wildcards to find exactly what you need.
* **NLP enrichment**: Each article comes with pre-processed NLP data, including:
  * Article summary
  * Sentiment analysis scores for title and content
  * Named entity recognition (PER, ORG, MISC, LOC)
  * Theme categorization
  * IPTC and IAB content tags
* **Data management**: Benefit from automatic clustering and deduplication for clean, relevant results.
* **Flexible endpoints**: Access data through various endpoints tailored to different use cases, from broad searches to author-specific queries.

## Use cases

The NewsCatcher News API v3 shines in various scenarios:

* **News aggregation**: Build the next big news app that curates content from thousands of sources.
* **Market intelligence**: Keep your finger on the pulse of industry trends and competitor movements.
* **Content marketing**: Automate content curation for the company blog or social media channels.
* **Crisis monitoring**: Track real-time news coverage during critical events.
* **Academic research**: Conduct large-scale studies on media coverage and public discourse.

## How is it different?
While there are other news APIs out there, the NewsCatcher News API v3 stands out in the enterprise space:

| Feature                   | NewsCatcher News API v3     | Typical Enterprise News APIs |
| ------------------------- | --------------------------- | ---------------------------- |
| Global Coverage           | Over 70,000 News Sources    | Often Limited                |
| NLP Data Enrichment       | Comprehensive               | Basic or None                |
| Search Flexibility        | High                        | Moderate                     |
| Data Retrieval Efficiency | Up to 1000 articles/request | Usually less                 |
| Enterprise Customization  | Extensive                   | Limited                      |

## Ready to get started?

Getting started with the NewsCatcher News API v3 for your organization involves a two-step process:

### For decision makers

1. [Contact our sales team](https://www.newscatcherapi.com/pricing) to discuss your enterprise needs and obtain an API key to access News API v3.
2. Our team will work with you to set up a custom plan that fits your organization's scale and requirements.
3. Once the contract is finalized, we'll initiate the onboarding process and provide full support to ensure your team can effectively leverage the API.

### For development teams

Once you have received your API key:

1. Check out our [Quickstart guide](/v3/documentation/get-started/quickstart) to make your first API call and understand the basics.
2. Explore our [Documentation](/v3/documentation) to unlock the full potential of the API and integrate it into your systems.
3. Refer to our [API Reference](/v3/api-reference) for detailed information on endpoints, request parameters, and response formats.

Remember, the NewsCatcher News API v3 is more than just a tool - it's your partner in navigating the complex world of news data. Whether you're building the next big thing in tech or seeking deep insights from global media, we're here to empower your journey. Have questions or need a custom solution? Our team is always here to help you make the most of the NewsCatcher News API v3.
# News API v3 quickstart guide

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/get-started/quickstart

This guide will help you make your first API call to NewsCatcher News API v3 and start retrieving news data in just a few minutes.

## Before you start

Before you begin, make sure you meet these prerequisites:

* An API key for NewsCatcher News API v3 (obtained from your account manager)
* Python 3.6+ installed on your system
* The `requests` library for Python

## Get started

### Step 1 - Set up your environment

First, make sure you have Python and the `requests` library installed. You can install `requests` using pip:

```bash
pip install requests
```

### Step 2 - Create your first script

Create a new file named `newscatcher_quickstart.py` and add the following code:

```python
import requests
import json

# Configuration
API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual API key
URL = "https://v3-api.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"}
PAYLOAD = {
    "q": "renewable energy",
    "lang": "en",
    "page_size": 1
}

try:
    # Fetch articles using the POST method
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD)
    response.raise_for_status()  # Check if the request was successful

    # Print the raw JSON response
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

Remember to replace `YOUR_API_KEY_HERE` with your actual API key.
### Step 3 - Run the script and review the results Run the script from your terminal: ```bash python newscatcher_quickstart.py ``` You should see a JSON response similar to this (shortened for readability): ```json { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 10000, "page_size": 1, "articles": [ { "title": "Energix Renewables and Google Sign a Strategic Renewable Energy Agreement", "author": "PR Newswire", "authors": ["PR Newswire"], "journalists": [], "published_date": "2024-08-20 16:28:00", "published_date_precision": "full", "updated_date": "2024-08-20 16:28:00", "updated_date_precision": "full", "link": "https://finance.yahoo.com/news/energix-renewables-google-sign-strategic-162800318.html", "domain_url": "yahoo.com", "full_domain_url": "finance.yahoo.com", "name_source": "Yahoo", "is_headline": false, "paid_content": false, "parent_url": "https://finance.yahoo.com/calendar", "country": "US", "rights": "yahoo.com", "rank": 37, "media": "https://media.zenfs.com/en/prnewswire.com/e28cbff44206d28b1afd175f5527ddff", "language": "en", "description": "Energix Renewables, a leader in the U.S. renewable energy sector and part of the Energix Group, a global leader in renewable energy, is proud to announce the signing of a strategic long-term agreement…", "content": "Pioneering Agreement to Transform the Renewable Energy Landscape in the U.S.\nARLINGTON, Va., Aug. 20, 2024 /PRNewswire/ -- Energix Renewables, a leader in the U.S. renewable energy sector and part of the Energix Group, a global leader in renewable energy, is proud to announce the signing of a strategic long-term agreement with Google...", "word_count": 664, "is_opinion": false, "twitter_account": "@YahooFinance", "all_links": [ "https://twitter.com/YahooFinance", "https://facebook.com/yahoofinance" // ... other links ], "all_domain_links": [ "rivals.com", "prnewswire.com" // ... 
other domain links ], "id": "a3b8452db164f7cb06c33ae305372115", "score": 14.821823 } ], "user_input": { "q": "renewable energy", "search_in": ["title_content"], "lang": ["en"], "from_": "2024-08-19T00:00:00", "to_": "2024-08-26T10:27:51.752362", "sort_by": "relevancy", "page": 1, "page_size": 1 // ... } } ```

This response shows you the rich data available for each article, including detailed metadata such as title, author, publication date, source information, and content.

### Step 4 - Modify the query for more specific results

Now that you've seen the basic API response, let's modify our script to filter the results and include some advanced features. Update your script as follows:

```python
import requests
import json

# Configuration
API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual API key
URL = "https://v3-api.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"}
PAYLOAD = {
    "q": "electric vehicles",
    "lang": "en",
    "countries": "US,GB",
    "from_": "7 days ago",
    "page_size": 10,
    "include_nlp_data": True
}

try:
    # Fetch articles using the POST method
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD)
    response.raise_for_status()  # Check if the request was successful

    # Parse and display the articles
    data = response.json()
    articles = data["articles"]

    print(f"Total hits: {data['total_hits']}")
    print(f"Page {data['page']} of {data['total_pages']}")
    print("---")

    for article in articles:
        title = article.get("title", "No Title")
        published_date = article.get("published_date", "No Date")
        sentiment = article["nlp"]["sentiment"]["content"]
        theme = article["nlp"]["theme"]

        print(f"Title: {title}")
        print(f"Published Date: {published_date}")
        print(f"Sentiment: {sentiment}")
        print(f"Theme: {theme}")
        print("---")
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

This modified script: 1. Searches for "electric vehicles". 2.
Limits results to articles from the United States and Great Britain. 3. Retrieves articles from the last 7 days. 4. Returns ten articles per page. 5. Includes NLP data in the results. 6. Displays the title, publish date, content sentiment score, and theme for each article. Run the script again to see the filtered results with NLP data. ## What's next Now that you've made your first calls to the NewsCatcher News API v3 and explored some of its features, here are some next steps to enhance your usage: 1. Explore other parameters like `predefined_sources`, `sort_by`, `iptc_tags`, or `iab_tags` to refine your searches. 2. Check out our [API Reference](/v3/api-reference) to learn about all available endpoints and parameters. 3. Learn how to [Implement pagination](/v3/documentation/how-to/paginate-large-datasets) to handle large datasets. 4. Dive deeper into the [NLP features](/v3/documentation/guides-and-concepts/nlp-features) to extract insights from the news articles. If you have any questions or need assistance, don't hesitate to contact our support team. Happy news hunting! # Advanced querying techniques Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/advanced-querying Master advanced querying techniques to enhance the precision and relevance of your searches ## Overview Advanced querying techniques allow users to construct sophisticated search queries, enabling efficient retrieval of precise and relevant news content. These techniques are crucial for productive information extraction from the vast database of articles available through the API. ## Key querying techniques In the context of the NewsCatcher News API v3, advanced querying techniques refer to using specific syntax and operators within the `q` parameter to create complex, highly targeted search queries. These techniques include exact matching, boolean operators, wildcards, and proximity-based searching. 
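As a rough illustration of combining these techniques in a `GET` request, the snippet below builds an encoded query string; the query itself is invented for the example, and `urlencode` handles the URL-encoding that special characters such as quotes and parentheses require:

```python
from urllib.parse import urlencode

# A query combining exact match, boolean operators, and a wildcard.
query = '"artificial intelligence" AND (healthcare OR "medical research") NOT chatbot*'

# Encode the q parameter so quotes, spaces, and parentheses survive the URL.
params = urlencode({"q": query, "lang": "en", "page_size": 10})
url = f"https://v3-api.newscatcherapi.com/api/search?{params}"
```

HTTP libraries such as `requests` perform this encoding automatically when you pass a `params` dictionary, but building the string explicitly makes it clear what the server actually receives.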
### Exact match ("double quotes")

Use double quotes to search for an exact phrase or name.

* Syntax: `"phrase or name"`
* Example: `q="Tim Cook"`
* Use case: Always use double quotes for company names, person names, and specific phrases.

Without quotes, the query is treated as individual terms combined with the `AND` operator. For example, `q=Tim Cook` is equivalent to `q=Tim AND Cook`.

#### Escaping double quotes in JSON queries

To use exact match syntax in JSON queries, you must escape double quotes by adding a backslash `\` before each double quote. This ensures the JSON is valid and the query is interpreted correctly.

**Example:**

```json
{
  "q": "presidential election",
  "PER_entity_name": "\"Kamala Harris\" AND \"Donald Trump\"",
  "include_nlp_data": true
}
```

Always use backslashes `\` before double quotes within query strings to maintain exact match syntax in JSON.

### Boolean operators

#### `AND`

* Ensures all specified terms are present in the text.
* It's the default operator when multiple words are used without quotes.
* Syntax: `term1 AND term2` or simply `term1 term2`
* Example: `q=Microsoft AND Tesla` or `q=Microsoft Tesla`

#### `OR`

* Matches articles containing either of the specified terms.
* Syntax: `term1 OR term2`
* Example: `q=(Apple AND Cook) OR (Microsoft AND Gates)`

#### `NOT`

* Excludes articles containing the specified term.
* Syntax: `term1 NOT term2`
* Example: `q=Tesla NOT "Elon Musk"`

### Wildcards (\* and ?)

* `*`: Matches any string of any length.
* `?`: Matches any single character.
* Example: `q=Microsoft AND C?O` (matches CEO, CFO, CTO, etc.)

### Proximity-based search (`NEAR`)

The `NEAR` operator finds articles where specified terms appear close to each other.

* Syntax: `NEAR("phrase_A", "phrase_B", distance, in_order)`
* Example: `q=NEAR("browser", "Edge", 15)`
* This finds articles where "browser" and "Edge" appear within 15 words of each other.
* The `in_order` parameter (default: `false`), when set to `true`, ensures that phrase\_B appears after phrase\_A.

Limitations:

* Maximum 4 words per phrase
* Maximum 3 phrases per NEAR operation
* Maximum distance of 100 words

## Best practices

1. Start with broader searches and refine as needed.
2. Use double quotes for company names, person names, and specific phrases.
3. Use parentheses to group related terms in complex queries with boolean operators.
4. Utilize `NEAR` to find related terms within a specific context.
5. URL-encode the `q` parameter in `GET` requests to prevent issues with special characters.
6. Check the `user_input` field in the JSON response to confirm the correct interpretation of your keywords.
7. Combine multiple techniques for more precise results.

## Examples of complex queries

* Market Research:

  ```bash
  q="artificial intelligence" AND (healthcare OR "medical research") AND NEAR("market growth", "emerging trends", 20)
  ```

* Competitive Analysis:

  ```bash
  q=("Apple" OR "Google") AND "smartphone market" AND NOT ("Samsung" OR "Huawei")
  ```

* Event Monitoring:

  ```bash
  q=("climate change" OR "global warming") AND (conference* OR summit) AND NEAR("Paris agreement", implementation, 15)
  ```

## Comparison of techniques

| Technique                     | Strengths                             | Limitations                   | Best For                  |
| ----------------------------- | ------------------------------------- | ----------------------------- | ------------------------- |
| Exact Match                   | Precise phrase matching               | May miss relevant variations  | Specific names or phrases |
| Boolean Operators             | Versatile, combines multiple concepts | Can become complex            | Comprehensive searches    |
| Wildcards                     | Broadens search to include variations | Can return irrelevant results | Exploring related terms   |
| Proximity-Based Search (NEAR) | Finds related terms in context        | Limited to 100-word distance  | Concept relationships     |

## Related resources

* [How to use boolean operators](/v3/documentation/how-to/use-boolean-operators)
* [Proximity search
with NEAR](/v3/documentation/how-to/search-with-near) * [API Reference](/v3/api-reference/overview/introduction) * [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search) # Articles deduplication Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/articles-deduplication Enhance search efficiency by filtering out duplicate articles. ## Introduction Ever feel like you're seeing the same news story over and over? You're not alone. In today's fast-paced digital world, major events often get covered by multiple outlets, resulting in a flood of similar articles. This can make it challenging to find unique, valuable content. That's where article deduplication comes in. It's a feature we've built into News API v3 to help you cut through the noise and get to the heart of the news. ## What is article deduplication? Article deduplication is a process that identifies and filters out nearly identical news articles from a large collection of content. It's smarter than simple text matching - it uses advanced language processing to recognize articles that say the same thing, even if they use different words or structures. Here's what article deduplication does: * Provides you with a diverse set of unique articles. * Reduces information overload by eliminating redundant content. * Improves the efficiency of content analysis and research. * Enhances your experience by presenting a more manageable amount of relevant information. ## How does it work? Our deduplication system uses a multi-step process to accurately identify duplicate articles while maintaining high precision. Let's break it down: ### Semantic similarity comparison First, we compare the meaning of articles: 1. We convert article texts into vector representations (embeddings) using our Natural Language Processing (NLP) pipeline. 2. These embeddings capture the meaning and relationships of the content. 3. 
We use cosine similarity to compare these embeddings, with a threshold of 0.95 to identify potential duplicates. This method catches similar articles even if they use different wording or sentence structures. It's great at identifying rewrites or articles from different sources covering the same event. ### Levenshtein distance analysis After the initial screening, we refine our process using the Levenshtein distance, which measures the minimum number of single-character edits needed to change one text into another. We use specific thresholds for accuracy: * 0.97 for article titles * 0.92 for article content This step helps us distinguish between articles that discuss similar topics differently and those that are true duplicates. It reduces the chance of false positives in our deduplication process. ### Identifying the original article Our system doesn't just spot duplicates - it also identifies which article is likely the original or most authoritative version. We use a scoring algorithm that considers factors like: * Domain credibility * Author's reputation * Publication timestamp The article with the highest score becomes the "parent" article. This status can change if we find a new duplicate with a higher score, reflecting the dynamic nature of news content. By identifying the original article, you get access to the most credible and comprehensive version of a story, while still filtering out redundant content. ### Continuous updates and historical lookup To keep our deduplication system effective, we continuously update our database and compare each new article against articles from the past seven days. This approach catches duplicates that may appear days after the original publication, accounting for delayed reporting or republishing content. By implementing this comprehensive process, News API v3 provides you with a rich, diverse, and non-redundant set of news articles, enhancing the overall quality and usability of the data. 
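To make the thresholds above concrete, here is a simplified, illustrative sketch of the two-stage check. The API performs this server-side on its own embeddings; the function names and the plain-Python Levenshtein implementation here are for demonstration only:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def levenshtein_ratio(s, t):
    """Similarity in [0, 1] derived from the Levenshtein edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            # Minimum of deletion, insertion, and substitution costs.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return 1 - prev[-1] / max(len(s), len(t), 1)

def looks_like_duplicate(emb_a, emb_b, title_a, title_b, body_a, body_b):
    """Two-stage check using the thresholds described above."""
    if cosine_similarity(emb_a, emb_b) < 0.95:  # stage 1: semantic screen
        return False
    # Stage 2: Levenshtein refinement on titles (0.97) and content (0.92).
    return (levenshtein_ratio(title_a, title_b) >= 0.97
            and levenshtein_ratio(body_a, body_b) >= 0.92)
```

In practice the embeddings would come from an NLP pipeline (the documentation mentions a multilingual-e5-large model for the Embeddings plan), and an optimized edit-distance library would replace the quadratic loop.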
## How to use deduplication Now that you understand how article deduplication works, let's look at how to use this feature in your News API v3 queries. ### Enable deduplication To exclude duplicate articles from your search results, set the `exclude_duplicates` parameter to `true` in your API request. Here's what you need to know: * The deduplication feature is available only for the [Search](/v3/api-reference/endpoints/search/search-articles-get) endpoint. * Deduplication supports [all languages available in News API v3](/v3/api-reference/overview/enumerated-parameters#language-lang-and-not-lang). ### Understand the API response When you enable deduplication, each article object in the response includes additional fields related to duplication: * `duplicate_count`: The number of duplicates associated with the article. * `duplicate_articles_group_id`: A unique identifier for the group of duplicates associated with the article. These fields give you valuable information about the extent of duplication for each article. You can use this data for further analysis or filtering in your application. ### Code example Here's an example of how to make a request using Python that includes the deduplication feature: ```python request.py import requests import json url = "https://v3-api.newscatcherapi.com/api/search" payload = json.dumps({ "q": "market value", "lang": "en", "theme": "Tech", "exclude_duplicates": True }) headers = { 'x-api-token': 'YOUR-API-KEY', 'Content-Type': 'application/json', 'Accept': 'application/json' } response = requests.request("POST", url, headers=headers, data=payload) print(response.text) ``` In the response, you'll find deduplication information for each article. Here's an example of what you might see: ```json response.json { "title": "Global Financial Services Industry - Insights Around Market Size, Key Trends And Forecast, 2024: Grand View Research, Inc.", ... 
"duplicate_count": 5, "duplicate_articles_group_id": "542def7ce3844c269d5f1a929309e6da" } ``` This shows that the article has five duplicates, which have been excluded from the results due to the `exclude_duplicates` parameter. ## Use cases Let's explore some key scenarios where deduplication can make a significant impact: * **Content curation and aggregation**: Ensure users of your news aggregator see a diverse range of articles without redundancy, improving user experience by reducing information overload and increasing the variety of perspectives presented. * **Media monitoring and analysis**: Focus on unique brand mentions or industry trends without getting overwhelmed by repetitive content, leading to more accurate sentiment analysis and trend identification. * **Research and trend analysis**: Streamline data collection for researchers and analysts by eliminating duplicate articles, making it easier to identify unique data points and trends for more accurate and efficient analysis. * **Personalized news feeds**: Enhance user engagement and satisfaction by ensuring readers see a wider variety of content rather than multiple versions of the same story in their personalized news feeds. ## Deduplication vs clustering While both deduplication and clustering organize and manage large sets of articles, they serve different purposes and are used in different contexts. ### Key differences * **Deduplication** identifies and removes nearly identical articles, preserving only the most relevant or original version. * **Clustering** groups similar articles together without removing any content, allowing you to see multiple perspectives on a topic. ### When to use deduplication vs clustering * Use **deduplication** to eliminate redundancy and present only unique content to your users. * Use **clustering** to group related articles together to show different angles or developments of a story over time. 
For more information on our clustering feature, check out [Clustering news articles](/v3/documentation/guides-and-concepts/clustering-news-articles). ## Related concepts * [Vector Embeddings](https://www.elastic.co/what-is/vector-embedding) * [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) * [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) # Breaking news (beta) Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/breaking-news Learn about breaking news detection and how to access emerging stories through News API v3 Breaking news is currently in beta. We do not recommend using it in production as it may undergo changes. ## Overview Breaking news represents important news stories rapidly gaining traction across multiple sources. Our system identifies these stories through automated analysis of news coverage patterns, allowing you to discover high-impact events as they emerge. A breaking news event is characterized by: * Coverage from multiple sources and authors * Rapid publication within a short time frame * Significant importance within a specific topic area ## How it works Our breaking news detection works through several interconnected stages: Recently published articles are automatically grouped together based on their content similarity. Each cluster undergoes analysis for breaking news signals, including: * Publication frequency (sudden spikes in coverage) * Source diversity (coverage across multiple publishers) * Coverage quality (presence of high-ranking news sources) The system filters out duplicate content within clusters while preserving source diversity. New candidates get compared against recent breaking news to track continuing stories. An AI analysis step confirms whether potential breaking news clusters represent significant events. Clusters that meet the breaking news criteria receive a special flag in the system.
When you query the `/breaking_news` endpoint, you receive the most representative article from each breaking news cluster. Results are ordered by cluster size, so the most widely covered breaking news appears first. ## Breaking news endpoint The `/breaking_news` endpoint provides access to breaking news stories through simple API calls. ### Base URL ``` https://v3-api.newscatcherapi.com/api/breaking_news ``` ### HTTP Methods * `GET`: Query parameters in the URL * `POST`: Parameters in request body (JSON) ### Request examples ```bash GET curl -X GET "https://v3-api.newscatcherapi.com/api/breaking_news?theme=Business&include_nlp_data=true" \ -H "x-api-token: YOUR_API_KEY" ``` ```bash POST curl -X POST "https://v3-api.newscatcherapi.com/api/breaking_news" \ -H "Content-Type: application/json" \ -H "x-api-token: YOUR_API_KEY" \ -d '{ "include_nlp_data": true, "theme": "Business" }' ``` ### Response format The API returns a JSON object with the following structure: ```json { "status": "ok", "total_hits": 318, "page": 1, "total_pages": 318, "page_size": 1, "articles": [ { "title": "Article Title", "author": "Author Name", "published_date": "2025-03-31 14:11:04", // ... standard article fields ... "breaking_news_event_id": "12520934405688386290", "breaking_news_articles_count": 766 } ], "user_input": { // Echo of request parameters } } ``` Each article in the response represents the top article from a unique breaking news cluster. The additional fields specific to the `/breaking_news` endpoint include: * `breaking_news_event_id`: Unique identifier for the breaking news event/cluster. * `breaking_news_articles_count`: Number of articles in this breaking news cluster. 
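As a sketch of how you might consume this response, the helper below flattens the `articles` array into `(event_id, article_count, title)` tuples using the two cluster fields described above. The endpoint URL and `x-api-token` header come from the examples; the helper itself is illustrative, not an official client.

```python
def summarize_events(response_json):
    """Map a /breaking_news response to (event_id, article_count, title) tuples.
    The API's ordering is preserved, so the most widely covered events come first."""
    return [
        (
            article["breaking_news_event_id"],
            article["breaking_news_articles_count"],
            article["title"],
        )
        for article in response_json.get("articles", [])
    ]

def fetch_breaking_news(api_key, **filters):
    """POST to /breaking_news and summarize the clusters. Requires a valid API key."""
    import requests  # imported lazily so summarize_events stays usable offline
    response = requests.post(
        "https://v3-api.newscatcherapi.com/api/breaking_news",
        headers={"x-api-token": api_key, "Content-Type": "application/json"},
        json=filters,
    )
    response.raise_for_status()
    return summarize_events(response.json())
```

For the sample response above, `fetch_breaking_news("YOUR_API_KEY", theme="Business")` would yield tuples like `("12520934405688386290", 766, "Article Title")`.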
### Comparison with other endpoints | Endpoint | Breaking news | Search | Latest headlines | | --------------- | --------------------------------------------------- | ---------------------------------------------- | ------------------------------------------------------- | | Purpose | Discover emerging stories with significant traction | Find specific content matching search criteria | Retrieve recent headlines for the specified time period | | Time scope | Fixed at 24 hours | Configurable with `from_` and `to_` | Configurable with `when` | | Result grouping | Clustered by event | Individual articles | Individual articles | | Primary sorting | By cluster size (coverage volume) | By relevance, date, or rank | By publication date | | Best for | Discovering important new stories | Finding specific content | Monitoring regular updates | ## Use cases The breaking news endpoint is valuable in several scenarios: ### Timely information discovery Media monitoring teams can quickly identify emerging stories without manually tracking multiple sources. When a significant event occurs, the breaking news endpoint immediately highlights it based on its sudden surge in coverage. ### Content curation Content teams can discover trending stories to feature in newsletters, websites, or apps. Instead of manually checking multiple sources, the breaking news endpoint automatically identifies stories gaining traction across the media landscape. ### Market intelligence Financial and business analysts can be alerted to significant events affecting markets, companies, or industries. By monitoring breaking news in specific business sectors, analysts can identify potentially market-moving information faster. ### Crisis monitoring Organizations can quickly identify emerging crises related to their industry, brand, or interests. The breaking news endpoint helps detect sudden increases in coverage about topics of concern, enabling faster response. 
## Related resources * [API reference: Get breaking news](/v3/api-reference/endpoints/breaking-news/retrieve-breaking-news-get) * [Clustering news articles](/v3/documentation/guides-and-concepts/clustering-news-articles) # Clustering news articles Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/clustering-news-articles Group similar articles together to reduce noise and gain insights ## Introduction Imagine walking into a massive library where all the books are scattered randomly across the floor. Finding related information would be a nightmare, right? That's often what it's like when dealing with large volumes of news data. Enter clustering - a powerful feature in News API v3 that acts like a team of lightning-fast librarians, instantly organizing articles into meaningful groups. Clustering is available for all languages supported by News API v3. This allows you to group similar articles across the entire multilingual database. ## What is clustering? Clustering is an advanced process that goes beyond simple keyword matching. It uses sophisticated language processing to understand the content and context of each article, grouping related pieces together even if they use different words to describe the same concepts. Here's what clustering does for you: * Reveals connections between articles, helping you spot trends and patterns in large volumes of news data. * Simplifies analysis of how different sources cover the same story. * Saves time by automatically organizing information into coherent groups. * Provides a clearer picture of the news landscape, making it easier to track evolving stories and identify emerging topics. By leveraging clustering in News API v3, you transform a chaotic flood of information into a structured, insightful resource, enabling more efficient and effective news analysis. ## How does it work? 
Our clustering system uses a streamlined process to group similar articles based on their semantic similarity. The clustering process occurs dynamically at the API level, taking into account the search filters you apply. This means you get clusters that are tailored to your specific query, not just generic groupings. ### Embeddings generation The foundation of our clustering process is the creation of article embeddings. These embeddings capture the semantic meaning of the content - not just the words but the ideas behind them. Think of these embeddings as creating a unique fingerprint for each article based on its content. ### Similarity calculation When you make a request that includes clustering, we use these pre-generated embeddings to group similar articles: 1. We compare the embeddings of different articles using cosine similarity. 2. This gives us a score that tells us how similar articles are in terms of their content and meaning. ### Cluster formation Based on the similarity scores, we form clusters: 1. Articles with a similarity score above our clustering threshold get grouped into clusters. 2. Each cluster gets a unique identifier, so you can easily refer to it later. ## How to use clustering Clustering is only available for the [Search](/v3/api-reference/endpoints/search/search-articles-get) and [Latest Headlines](/v3/api-reference/endpoints/latest-headlines/retrieve-latest-headlines-get) endpoints. ### Enable clustering To activate clustering and fine-tune its behavior, use the following parameters in your API request: * `clustering_enabled` (boolean): Set to `true` to enable clustering. * `clustering_threshold` (float): Determines how similar articles need to be to end up in the same cluster. Values range from 0 to 1, with higher values resulting in clusters with more similar articles. The default value is 0.6. * `clustering_variable` (string): Chooses which part of the article to use for clustering.
Options are `content` (default), `title`, or `summary`. ### Optimize clustering with page size An important consideration when using clustering is the `page_size` parameter. Clustering operates on one page of results at a time, affecting how articles are grouped. To ensure the most effective clustering: * Set `page_size` to a value greater than your expected `total_hits`. * This allows all relevant articles to be considered for clustering together. For example, if your query is likely to return 150 articles, set `page_size` to at least 150. This prevents related articles from being split across different pages and, thus, different clusters. ### Understand API response When you enable clustering, your API response will include some new elements: * `clusters_count`: The total number of clusters found * `clusters`: An array of cluster objects, each containing: * `cluster_id`: A unique identifier for the cluster * `cluster_size`: The number of articles in the cluster * `articles`: An array of the articles in the cluster ### Code example Here's how you might use clustering in a Python script: ```python clustering.py import requests import os import json # Configuration API_KEY = "YOUR_API_KEY_HERE" # Replace with your actual API key URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"} PAYLOAD = { "q": "renewable energy", "lang": "en", "clustering_enabled": True, "clustering_variable": "content", "clustering_threshold": 0.9, } try: # Fetch articles using POST method response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() # Check if the request was successful # Print the raw JSON response print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` In the response, you'll see how articles are grouped into clusters. 
Here's a snippet of what you might get back: ```json response.json { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100, "clusters_count": 65, "clusters": [ { "cluster_id": "7222464423361803386", "cluster_size": 11, "articles": [ { "title": "Why Is NextEra Energy (NEE) The Best Alternative Fuel Stock To Buy Right Now?", "author": "Mashaid Ahmed", "published_date": "2024-08-25 17:36:01" // ... other article details ... } // ... other articles in the cluster ... ] } // ... other clusters ... ] } ``` This shows that the API found 65 clusters, with one cluster containing 11 articles about NextEra Energy and alternative fuel stocks. ## Use cases Clustering can be a game-changer in various scenarios. Here are some common use cases: * **Trend identification**: Quickly spot emerging trends by analyzing large clusters of articles on similar topics, giving you a bird's-eye view of the news landscape. * **Diverse perspectives analysis**: Examine how different sources cover the same story within a cluster, providing a comprehensive view of news events from various angles. * **Content organization**: Efficiently organize large volumes of news content into meaningful groups, as if you had a personal librarian instantly categorizing your articles. * **Story evolution tracking**: Follow how news stories develop over time by analyzing changes in cluster composition and size, watching stories grow, merge, or fade away in real-time. * **Enhanced search capabilities**: Improve search results by grouping related articles together, allowing users to quickly find relevant information with context-aware precision. 
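To make the threshold's effect concrete, here's a deliberately simplified greedy pass over article embeddings. This is not our production clustering algorithm, just an illustration of how a cosine-similarity threshold partitions articles into groups.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def greedy_clusters(embeddings, threshold=0.6):
    """Assign each article to the first cluster whose seed article is at least
    `threshold` similar; otherwise start a new cluster. Returns lists of indices."""
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine_similarity(embeddings[cluster[0]], emb) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Raising the threshold produces more, tighter clusters; lowering it merges looser matches. That is the same trade-off the `clustering_threshold` parameter controls.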
## Clustering vs deduplication While both clustering and deduplication help organize large sets of articles, they serve different purposes: | Feature | Clustering | Deduplication | | -------------------- | ------------------------------------------ | ------------------------------------------- | | Purpose | Groups similar articles | Removes nearly identical articles | | Content | Retains all articles | Removes duplicates | | Similarity Threshold | Generally lower, allowing broader groups | Higher, identifying near-exact matches | | Output | Groups of related articles | Set of unique articles | | Use Case | Analyzing related content, tracking trends | Eliminating redundancy, ensuring uniqueness | Choose clustering when you want to analyze related content and track trends. Go for deduplication to eliminate redundancy and ensure uniqueness in your article set. For more information on our deduplication feature, check out [Articles deduplication](/v3/documentation/guides-and-concepts/articles-deduplication). ## Wrapping up Clustering in News API v3 is like having a smart assistant that can quickly organize mountains of news data into meaningful groups. Whether you're tracking trends, analyzing diverse perspectives, or just trying to make sense of the news firehose, clustering can help you see the forest for the trees. We encourage you to try clustering in your News API v3 queries and see how it can enhance your news analysis. As always, we're here to help if you have any questions or need assistance using this feature. Happy clustering! # Custom tags Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/custom-tags Filter and classify news articles using your organization's unique taxonomy. ## Introduction Custom Tags in News API v3 lets you filter and classify articles using your unique taxonomy. 
This feature adapts to your specific terminology, enabling you to track industry trends, monitor events, and analyze specialized topics, ensuring you focus on what matters most. ## What is Custom Tags? Custom Tags is an advanced classification feature that applies your organization's unique taxonomy to news articles. It goes beyond simple keyword matching - it understands the context and meaning of articles and automatically categorizes them according to your specific classification system. Here's what Custom Tags can do for you: * Apply your organization's unique classification system to millions of news articles. * Filter news content using your specific taxonomy. * Get articles automatically tagged according to your classification needs. * Access custom-tagged content through a simple API interface. ## How does it work? Behind the scenes, Custom Tags uses a sophisticated machine learning pipeline to understand and classify news articles according to your taxonomy. Let's look at how we implement and maintain this system. We start with your taxonomy and tag definitions. Our engineering team works with you to learn about your domain specifics and collect extra context and examples. This research phase ensures our language models have the comprehensive understanding needed for accurate classification. We use a large language model (LLM) to create the initial classification system based on your enriched taxonomy. This phase involves prompt engineering and fine-tuning the model on a diverse dataset of news articles to ensure it understands the nuances of your classification requirements and can accurately apply your taxonomy. The solution is integrated into our NLP pipeline to classify all incoming news articles with your custom tags automatically. Your custom-built classification system is now ready and accessible via News API v3 with your API key. All articles processed since implementation remain available, ensuring comprehensive historical coverage.
To maintain high accuracy, we continuously monitor the model's performance and collect feedback. Regular fine-tuning and retraining help the system stay current with evolving news trends and ensure consistent classification quality over time. ## API integration Custom Tags are integrated into the News API v3 and available across the following endpoints: * `/search` * `/latest_headlines` * `/authors` * `/search_similar` Each taxonomy is organization-specific and protected by your API key, ensuring your custom classification system remains secure and private. ### Request format To filter articles by your taxonomy tags, use the `custom_tags` parameter following this pattern: * `"custom_tags.taxonomy": "Tag1,Tag2,Tag3"`, where `taxonomy` is your taxonomy name and `Tag1,Tag2,Tag3` are specific tags. To specify multiple tags: * For `GET` requests, use a comma-separated string. * For `POST` requests, use a comma-separated string or an array of strings. Examples: ```python import requests url = "https://v3-api.newscatcherapi.com/api/search" headers = { "x-api-token": "YOUR_API_KEY" } # Using a comma-separated string payload_string = { "q": "*", "custom_tags.my_taxonomy": "Tag1,Tag2,Tag3" } # Using an array of strings payload_array = { "q": "*", "custom_tags.my_taxonomy": ["Tag1", "Tag2", "Tag3"] } response = requests.post(url, json=payload_array, headers=headers) ``` ### Response format When you use Custom Tags, each article in the response includes a `custom_tags` field with your taxonomy's classifications: ```json { "status": "ok", "total_hits": 1500, "articles": [ { "title": "Example Article Title", "custom_tags": { "my_taxonomy": ["Tag1", "Tag2"] } } ] } ``` The custom tags in the response are always returned as an array of strings, regardless of the request format used. 
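For the `GET` variant, the same filter travels as a URL query parameter in the comma-separated form. Here's a small sketch; as in the examples above, `my_taxonomy` and the tag values are placeholders for your own taxonomy.

```python
from urllib.parse import urlencode

BASE_URL = "https://v3-api.newscatcherapi.com/api/search"

params = {
    "q": "*",
    "custom_tags.my_taxonomy": "Tag1,Tag2,Tag3",  # comma-separated string for GET
}

# The query string a GET request sends; commas are percent-encoded as %2C.
query = urlencode(params)

def search_by_custom_tags(api_key):
    """Perform the GET request; requires a valid API key."""
    import requests  # imported lazily so the query-string part runs offline
    response = requests.get(BASE_URL, params=params, headers={"x-api-token": api_key})
    response.raise_for_status()
    return response.json()
```

Passing the dict to `requests.get(..., params=params)` applies the same encoding automatically, so you never need to escape the commas yourself.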
## Use cases Let's look at how different organizations might use Custom Tags: ### Content classification ```json { "custom_tags.content_type": ["Analysis", "Research", "Interview"] } ``` This helps content teams organize articles by their format and depth. ### Industry monitoring ```json { "custom_tags.industry": ["Manufacturing", "Supply Chain", "Logistics"] } ``` Perfect for tracking industry-specific news and developments. ## Best practices 1. **Taxonomy design** * Keep tags clear and unambiguous. * Use consistent naming conventions. * Consider hierarchical relationships. 2. **Query optimization** * Combine custom tags with other search parameters. * Use date ranges for temporal analysis. * Consider using multiple tags for comprehensive coverage. 3. **Integration tips** * Start with broader queries. * Use exact tag names (they're case-sensitive). * Test different tag combinations. ## See also * [NLP features](/v3/documentation/guides-and-concepts/nlp-features) * [API Reference](/v3/api-reference/endpoints/search/search-articles-get) # Entity disambiguation Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/entity-disambiguation Cut through the clutter with precision - ensure every article pinpoints the exact company or individual you're tracking. ## Introduction Ever tried tracking news about a company only to get flooded with articles about similarly named businesses? You're not alone. In today's complex media landscape, many companies share names, making it challenging to track the specific one you care about. That's where entity disambiguation comes in. ## What is entity disambiguation? Entity disambiguation is an advanced process that helps you cut through the noise and precisely identify the companies you want to track in news articles. The challenge isn't just about finding articles that mention a company name — it's about ensuring those articles are actually about your target company. 
Here's the core problem entity disambiguation solves: * Many companies share the same or similar names. * Unique identifiers like domain URLs, social media handles, or founder names aren't always mentioned. * Simple keyword searches can return a mix of relevant and irrelevant results. Entity disambiguation addresses these challenges by: * Using sophisticated language processing to accurately identify specific companies. * Leveraging multiple identifiers to confirm article relevance. * Providing flexible filtering options to match your precision needs. * Ensuring you get news only about the companies you care about. ## How does it work? Our entity disambiguation system uses a multi-step process to accurately identify company mentions in news articles. Let's break down how it works. ### Smart article retrieval The process begins with sophisticated article retrieval that combines: * Company's legal name * Website URL * Clean name from the Clearbit API For companies with names that might be common words, the system follows these steps: 1. Search the full company name over a week-long period. 2. If results exceed 10,000, categorize the name as a common word. 3. Limit searches to the `ner_ORG` field for better precision. 
For example, for a company named "Riot": ```json ORG_entity_name: "\"riot\" OR \"riot.com\" OR \"Riot\"" ``` ### Filter flag creation The system creates flags based on company identifiers found in articles: * `is_domain_present`: Company's domain URL mention * `is_company_name_present_in_title`: Company name in the article title * `is_company_name_present_in_ai_generated_summary`: Company name in AI summary * `is_alias_present_in_content`: Company aliases in content * `is_alias_present_in_title`: Company aliases in title * `founder_present`: Company founder mention * `founder_present_percent`: Percentage of founders mentioned ### Semantic similarity analysis A crucial component is calculating the similarity between article text and company descriptions: 1. Converts both company description and article sentences into vector embeddings. 2. Uses cosine similarity to measure semantic relatedness. 3. Produces similarity scores ranging from 0 (unrelated) to 1 (identical). This analysis adds fields to the `entity_disambiguation` object: * `average_cosine_similarity`: Mean similarity across all relevant sentences * `highest_cosine_similarity`: Highest similarity score found * `relevant_sentences`: Array of relevant sentences with similarity scores Higher cosine similarity scores indicate stronger semantic similarity to the company's description, helping prioritize the most relevant articles. ## Data structure and delivery ### Response format Each article object includes entity disambiguation data in this format: ```json { "title": "Implementing Neural Networks in TensorFlow (and PyTorch)", // ... 
(other standard article fields) "entity_disambiguation": { "average_cosine_similarity": 0.3923861011862755, "highest_cosine_similarity": 0.4942742586135864, "relevant_sentences": [ { "sentence": "TensorFlow is a comprehensive ecosystem of tools, libraries, and community resources for building and deploying machine learning applications.", "cosine_similarity": 0.4942742586135864 } // ... (other relevant sentences) ], "founder_present": null, "founder_present_percent": null, "is_domain_present": true, "is_company_name_present_in_title": true, "is_company_name_present_in_ai_generated_summary": true, "is_alias_present_in_content": true, "is_alias_present_in_title": true }, "company_name": "TensorFlow", "company_aliases": "TensorFlow", "cluster_id": "16552780591689057479" } ``` ### Delivery method Data is delivered via AWS S3 bucket dumps. By default, new "folders" (S3 key prefixes) are created daily, containing the latest data for all monitored companies. Customized delivery frequencies can be arranged to meet client needs. ## Use cases Entity disambiguation is particularly valuable for: * **Financial Services**: Get precise news about specific companies for informed investment decisions. * **Public Relations**: Track accurate media mentions of client companies. * **Legal and Compliance**: Monitor relevant news about specific client companies. * **Investment Firms**: Track news about portfolio companies without noise. * **Corporate Communications**: Monitor accurate media coverage. * **Regulatory Bodies**: Track specific companies under jurisdiction. ## Benefits | Benefit | Description | | ---------------------- | -------------------------------------------------------------- | | Improved Accuracy | Receive only relevant articles about your target companies. | | Time Savings | Eliminate manual sorting of ambiguous results. | | Customizable Filtering | Set your own criteria using the provided disambiguation flags. 
| | Comprehensive Coverage | Get complete news coverage without irrelevant content. | | Enhanced Analysis | Make better decisions based on clean, relevant data. | ## Related concepts * [Vector Embeddings](https://www.elastic.co/what-is/vector-embedding) * [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) * [Named Entity Recognition](https://en.wikipedia.org/wiki/Named-entity_recognition) # NLP features in News API v3 Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/guides-and-concepts/nlp-features Familiarize yourself with the NLP features available in News API v3 This guide explains the Natural Language Processing (NLP) features available in News API v3, how to use them, and their practical applications. By leveraging these NLP capabilities, you can extract meaningful insights from news data, enhancing your analysis, research, and application development. ## Understanding NLP layer When processing news, we summarize the article content, categorize articles by theme, estimate the overall tone of the writing, and identify important names and places mentioned in the text. As a result, we supply each processed article with additional NLP information that you can use when making requests via News API v3. 
The NLP layer in News API v3 consists of the following components: | Component | Description | Plan Requirement | | -------------- | ------------------------------------------------------------------------ | ------------------- | | Theme | General topic or category of the article | v3\_nlp | | Summary | Concise overview of the article's content | v3\_nlp | | Sentiment | Separate scores for title and content sentiment | v3\_nlp | | Named Entities | Identified persons, organizations, locations, and miscellaneous entities | v3\_nlp | | IPTC Tags | Standardized news category tags | v3\_nlp\_iptc\_tags | | IAB Tags | Content categories for digital advertising | v3\_nlp\_iptc\_tags | | Custom Tags | Organization-specific classification system | All v3 NLP plans | | Embeddings | 1024-dimensional vector representation for semantic similarity | v3\_nlp\_embeddings | To learn more about plan features and requirements, see [Subscription plans](/v3/documentation/get-started/news-api-v3-subscription-plans). ### Including NLP data in API responses To control NLP data in your API responses, use the following parameters: * `include_nlp_data` (boolean): Set to `true` to include the NLP object for each article in response. * `has_nlp` (boolean): Set to `true` to filter the results to only articles with available NLP data. Some fields within the NLP object may be empty or `null` if specific analyses were not performed on the article. The full data is available for articles in English and Arabic only. #### How these parameters work together These parameters control both the article filtering and the inclusion of NLP data: * `include_nlp_data=true, has_nlp=false`: Returns all matching articles with the NLP object included in each. The completeness of NLP data varies by language. * `include_nlp_data=true, has_nlp=true`: Returns only articles processed with NLP. This combination filters out many articles in languages other than English and Arabic. 
* `include_nlp_data=false`: The NLP object is not included in the response, regardless of the `has_nlp` value. #### NLP coverage by language The table below shows which NLP features are available for different language categories: | Feature | English & Arabic | Other Languages | Coverage | | ------------------------ | ---------------- | --------------- | ---------------------------------- | | Theme classification | ✓ | ✓ (limited) | 10% of non-English/Arabic articles | | Summary | ✓ | ✓ (limited) | 10% of non-English/Arabic articles | | Sentiment analysis | ✓ | ✗ | 100% of English/Arabic articles | | Named entity recognition | ✓ | ✗ | 100% of English/Arabic articles | | Content tags | ✓ | ✓ (limited) | 10% of non-English/Arabic articles | | Vector embeddings | ✓ | ✓ | Nearly 100% of all articles | | Clustering | ✓ | ✓ | All articles | | Deduplication | ✓ | ✓ | All articles | For a complete list of supported languages, see [language codes in News API v3](/v3/api-reference/overview/enumerated-parameters#language-lang-and-not-lang). When working with non-English/Arabic content, using `has_nlp=true` substantially reduces the result set. ### Code example Here's how you can make a request to include NLP data in your search results using Python: ```python nlp_request.py import requests import json import os # Configuration API_KEY = os.getenv("NEWSCATCHER_API_KEY") # Set your API key as an environment variable if not API_KEY: raise ValueError("API key not found. 
Please set the NEWSCATCHER_API_KEY environment variable.") URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"} PAYLOAD = { "q": "artificial intelligence", "lang": "en", "include_nlp_data": True, "has_nlp": True } try: # Fetch articles using POST method response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() # Check if the request was successful # Print the raw JSON response print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` Here's a snippet of what you might see in the response, focusing on the NLP data for a single article: ```json response.json { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100, "articles": [ { "title": "Enterprise Artificial Intelligence Market Is Booming Worldwide with I Amazon Web Services, IBM Corporation", "author": "", "published_date": "2024-09-20 11:23:20", // ... other article details ... "nlp": { "theme": "Business, Tech", "summary": "The Enterprise Artificial Intelligence market size is estimated to increase by USD at a CAGR of 32.00% during the forecast period (2024-2030). The report includes historic market data from 2024 to 2030. The current market value is pegged at USD. The key players profiled in the report are Amazon Web Services, IBM Corporation, Microsoft Corporation, Oracle Corporation, Intel Corporation, Alphabet, SAP SE, C3.ai, Inc., DataRobot, Inc, Hewlett Packard Enterprise, Wipro Limited, and NVidia Corporat.", "sentiment": { "title": 0.9972, "content": 0.784 }, "ner_PER": [ { "entity_name": "Nidhi Bhawsar", "count": 1 } ], "ner_ORG": [ { "entity_name": "HTF MI", "count": 1 }, { "entity_name": "Amazon Web Services, Inc.", "count": 1 }, { "entity_name": "IBM Corporation", "count": 1 } // ... other organizations ... 
], "ner_MISC": [ { "entity_name": "Artificial Intelligence", "count": 3 }, { "entity_name": "Enterprise Artificial Intelligence Market", "count": 1 } // ... other miscellaneous entities ... ], "ner_LOC": [ { "entity_name": "PUNE", "count": 1 }, { "entity_name": "MAHARASHTRA", "count": 1 }, { "entity_name": "INDIA", "count": 1 } ], "iptc_tags_name": [ "science and technology / technology and engineering / information technology and computer science / artificial intelligence", "economy, business and finance / products and services / business service / shipping and postal service" ], "iptc_tags_id": [ "20000763", "13000000", "20000291", "20000756", "20000209", "04000000", "20001371", "20001298" ], "iab_tags_name": [ "Technology & Computing / Artificial Intelligence", "Business and Finance / Business / Business I.T.", "Technology & Computing / Robotics" ] } } // ... other articles ... ] } ``` This response shows the rich NLP data available for each article, including theme classification, summary, sentiment analysis, named entity recognition, and content tagging. Let's examine each of these components. ## Theme classification Theme classification categorizes articles into predefined topics, allowing for efficient filtering and organization of news content. ### Available themes News API v3 supports the following themes: * `Business` * `Economics` * `Entertainment` * `Finance` * `Health` * `Politics` * `Science` * `Sports` * `Tech` * `Crime` * `Financial Crime` * `Lifestyle` * `Automotive` * `Travel` * `Weather` * `General` ### Filtering by theme Use the `theme` and `not_theme` parameters to filter articles based on their classified themes: * `theme` (string): Includes articles matching the specified theme(s). * `not_theme` (string): Excludes articles matching the specified theme(s). 
Example: ```json { "q": "electric vehicles", "theme": "Automotive,Tech", "not_theme": "Entertainment" } ``` This query returns articles about electric vehicles categorized under Automotive or Tech themes, excluding Entertainment. ## Article summarization Article summarization provides concise overviews of article content, allowing for quick understanding without reading the full text. ### Using summaries in searches and clustering You can use summaries in your searches and clustering: * In searches, use the `search_in` parameter: ```json { "q": "climate change", "search_in": "summary", "lang": "en" } ``` This query searches for `climate change` within article summaries, potentially yielding more relevant results than searching the full content. * For clustering, use summaries as the clustering variable: ```json { "q": "renewable energy", "clustering_enabled": true, "clustering_variable": "summary" } ``` This approach can lead to more concise and focused clusters. For more information on clustering, see [Clustering news articles](/v3/documentation/guides-and-concepts/clustering-news-articles). ## Sentiment analysis Sentiment analysis determines the emotional tone of an article. News API v3 provides sentiment scores for both the title and content, ranging from -1 (negative) to 1 (positive). ### Filtering by sentiment Filter articles based on sentiment scores using these parameters: * `title_sentiment_min` and `title_sentiment_max` (float): Filter by title sentiment * `content_sentiment_min` and `content_sentiment_max` (float): Filter by content sentiment Example: ```json { "q": "climate change", "content_sentiment_min": 0.2, "content_sentiment_max": 1.0, "lang": "en" } ``` This query returns articles about climate change with a positive content sentiment (scores between 0.2 and 1.0). ## Named Entity Recognition (NER) NER identifies and categorizes named entities within the text. 
News API v3 recognizes four types of entities: * `PER_entity_name` (string): Person names. * `ORG_entity_name` (string): Organization names. * `LOC_entity_name` (string): Location names. * `MISC_entity_name` (string): Miscellaneous entities cover named entities outside of the person, organization, or location categories, such as events, nationalities, products, and works of art. These parameters support boolean operators (`AND`, `OR`, `NOT`), proximity search with `NEAR`, and count-based filtering. Example of an NER query: ```json { "q": "tech industry", "ORG_entity_name": "Apple OR Microsoft", "PER_entity_name": "\"Tim Cook\" OR \"Satya Nadella\"", "include_nlp_data": true } ``` This query searches for articles about the tech industry that mention Apple or Microsoft as organizations and Tim Cook or Satya Nadella as persons. To learn more about NER, see [How to search by entity](/v3/documentation/how-to/search-by-entity). ## Tagging Content tagging provides a standardized categorization of news articles, enhancing searchability and enabling more precise content filtering. IPTC and IAB tags are available in the `v3_nlp_iptc_tags` plan. Custom tags are developed upon request and are available in all NLP plans. ### IPTC tags IPTC (International Press Telecommunications Council) tags are a standardized set of news categories. They offer a hierarchical classification system for news content. To filter articles by IPTC tags, use the following parameters: * `iptc_tags` (string): Includes articles with specified IPTC tags. * `not_iptc_tags` (string): Excludes articles with specified IPTC tags. Example: ```json { "q": "artificial intelligence", "iptc_tags": "20000002", "lang": "en" } ``` This query searches for AI-related articles tagged with the IPTC category `20000002`, which encodes arts and entertainment. For a complete IPTC Media Topic NewsCodes list, visit the [IPTC website](https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html).
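To make the include/exclude semantics of `iptc_tags` and `not_iptc_tags` concrete, here is a small client-side sketch of the same logic. This is purely illustrative: the API applies this filtering server-side, and the sample NLP object below simply mirrors the shape of the response snippet shown earlier.

```python
from typing import Dict, List


def matches_iptc(nlp: Dict, include: List[str], exclude: List[str]) -> bool:
    """Mirror the iptc_tags / not_iptc_tags logic on one article's NLP object.

    Illustrative only -- the real filtering happens on the API side.
    """
    tags = set(nlp.get("iptc_tags_id", []))
    if include and not tags.intersection(include):
        return False  # none of the requested tags is present
    if tags.intersection(exclude):
        return False  # an excluded tag is present
    return True


# Sample NLP object shaped like the response snippet above
article_nlp = {"iptc_tags_id": ["20000763", "13000000"]}
print(matches_iptc(article_nlp, include=["13000000"], exclude=[]))  # True
print(matches_iptc(article_nlp, include=[], exclude=["20000763"]))  # False
```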
### IAB tags IAB (Interactive Advertising Bureau) tags provide a standardized taxonomy for digital advertising content. To filter articles by IAB tags, use the following parameters: * `iab_tags` (string): Includes articles with specified IAB tags. * `not_iab_tags` (string): Excludes articles with specified IAB tags. Example: ```json { "q": "finance", "iab_tags": "Business,Investing", "not_iab_tags": "Personal Finance", "lang": "en" } ``` This query returns finance-related articles categorized under `Business` or `Investing` but not `Personal Finance`. For more information on IAB Content Taxonomy, visit the [IAB Tech Lab website](https://iabtechlab.com/standards/content-taxonomy/). ### Custom tags Custom tags help you classify and filter articles based on your organization's taxonomy. Each taxonomy is organization-specific and protected by your API key, ensuring your custom classification system remains secure and private. We develop and integrate this solution upon your request. Simply provide us with your tags and their descriptions. To filter articles by your taxonomy tags, use the `custom_tags` parameter following this pattern: * `"custom_tags.taxonomy": "Tag1,Tag2,Tag3"`, where `taxonomy` is your taxonomy name and `Tag1,Tag2,Tag3` are specific tags. To specify multiple tags: * For `GET` requests, use a comma-separated string. * For `POST` requests, use a comma-separated string or an array of strings. Example: ```json { "q": "market trends", "custom_tags.my_taxonomy": ["Tag1", "Tag2", "Tag3"], "lang": "en" } ``` For implementation details and examples, see [Custom tags](/v3/documentation/guides-and-concepts/custom-tags). ## Embeddings Vector embeddings provide a powerful way to represent article content as numerical vectors, enabling advanced semantic analysis and similarity comparisons.
Available exclusively with the `v3_nlp_embeddings` plan, each article is processed through the [multilingual-e5-large model](https://huggingface.co/intfloat/multilingual-e5-large) to generate its vector representation. The embedding is available in the `new_embedding` field as an array of 1024 numbers. Here's an example of how it appears in the API response: ```json { "articles": [ { "title": "AI Breakthrough in Healthcare", "nlp": { "new_embedding": [0.023, -0.156, 0.789, ...], // 1024-dimensional vector // ... other NLP fields } } ] } ``` These high-dimensional vectors capture the semantic meaning of articles, enabling various advanced applications: * Semantic search: Find articles with similar meanings, not just matching keywords. * Content recommendation: Suggest related articles based on semantic similarity. * Topic clustering: Group articles by meaning using vector similarity. * Machine learning: Train models using these dense numerical representations. ## Use cases NLP features in News API v3 enable various applications across industries: | Application | Description | Example use case | | ------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------ | | Brand Monitoring | Track mentions, analyze sentiment and identify influencers. | A tech company monitoring public perception of their latest product launch. | | Competitive Intelligence | Monitor competitors' activities and public perception. | An automotive manufacturer tracking mentions of competitors' electric vehicle initiatives. | | Market Research | Analyze trends, consumer sentiment, and emerging topics. | A financial services firm identifying emerging fintech trends. | | Political Analysis | Track political figures and analyze public opinion. | A political campaign monitoring sentiment around key policy issues. 
| | Financial Analysis | Monitor market sentiment and track company mentions. | An investment firm analyzing sentiment around potential acquisition targets. | | Academic Research | Conduct large-scale analysis of media coverage. | A researcher studying media bias in climate change reporting. | | Content Curation | Automatically filter and categorize news content. | A news aggregator app personalizing content for users based on interests. | | Trend Forecasting | Identify emerging trends across industries. | A consulting firm predicting future technology adoption trends. | ## Best practices To maximize the effectiveness of NLP features in News API v3: * Start with broader queries and gradually refine using NLP parameters. * Combine multiple NLP parameters for precise results. * Use entity recognition with boolean operators to refine searches. * Experiment with sentiment thresholds to find the right balance for your use case. * Leverage theme classification and content tags to quickly filter large volumes of news data. * Regularly review and update your queries to adapt to changing news landscapes. ## Related resources * [How to use boolean operators](/v3/documentation/how-to/use-boolean-operators) * [Proximity search with NEAR](/v3/documentation/how-to/search-with-near) * [How to search by entity](/v3/documentation/how-to/search-by-entity) * [Clustering news articles](/v3/documentation/guides-and-concepts/clustering-news-articles) * [Articles deduplication](/v3/documentation/guides-and-concepts/articles-deduplication) # How to check sources coverage Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/check-sources-coverage Learn how to verify the news sources covered by News API v3 ## Overview Understanding the sources covered by News API v3 is crucial for ensuring the comprehensiveness and reliability of your news data analysis. 
This guide will walk you through the process of checking source coverage using the `/sources` endpoint, including how to handle large numbers of sources efficiently. ## Before you start Before you begin, ensure you have: * An active API key for NewsCatcher News API v3 * Basic knowledge of making API requests * Python or another tool for making HTTP requests (e.g., cURL, Postman, or a programming language with HTTP capabilities) ## Steps To make a valid request to the `/sources` endpoint, use at least one of the following key parameters: * `lang`: The language(s) of the sources. * `countries`: The countries where the news publishers are located. * `predefined_sources`: Predefined top sources per country. * `source_name`: Text to search within source names. You can specify any word, term, phrase, or outlet name. * `source_url`: The domain(s) of the news publication to search for. When specifying this parameter, you can only use `include_additional_info` as an extra parameter. To refine your search and obtain additional source info, you can use: * `include_additional_info`: Set to `true` to get extra details about each source. * `from_rank` and `to_rank`: Filter sources by their SEO rank. For detailed descriptions and usage of all parameters, refer to the [Sources reference documentation](/v3/api-reference/endpoints/sources/retrieve-sources-post).
Here's a Python example demonstrating how to check for sources containing a specific word: ```python Python import requests import json API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/sources" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "source_name": "sport", "include_additional_info": True } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() data = response.json() print(json.dumps(data, indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch source information: {e}") ``` The API response will include information about the requested source(s). Here's an example response: ```json JSON { "message": "Maximum sources displayed according to your plan is set to 1000", "sources": [ { "name_source": "Sports Illustrated", "domain_url": "si.com", "logo": null, "additional_info": null }, { "name_source": "Sportskeeda", "domain_url": "sportskeeda.com", "logo": null, "additional_info": null } // ... other sources ], "user_input": { "lang": null, "countries": null, "predefined_sources": null, "include_additional_info": true, "from_rank": null, "to_rank": null, "source_name": ["sport"], "source_url": null } } ``` This response shows a list of sources that include "sport" in their names. Search by `source_name` does not perform an exact match and returns all sources that contain the specified term anywhere in their names. To check coverage for specific sources, you can use the `source_url` parameter for precise filtering: ```python Python PAYLOAD = { "source_url": ["si.com", "sportskeeda.com"], "include_additional_info": True } ``` You can also combine multiple parameters to narrow down your search. 
For example, to find sports-related sources in English-speaking countries: ```python Python PAYLOAD = { "source_name": "sport", "lang": "en", "countries": ["US", "GB", "AU"], "include_additional_info": True } ``` The response will include information for each covered source that matches your criteria. ## Handling large numbers of sources When dealing with a large number of sources, you may have them stored in a file (e.g., CSV or JSON). Here's an approach using asynchronous requests to check which sources are covered by News API v3 and identify those that are not. This implementation uses `aiohttp` and `asyncio` libraries for concurrent requests, providing better performance and scalability. The API key is stored in an environment variable for security purposes. Set your `NEWSCATCHER_API_KEY` in your environment before running the script. For example, in a Unix-like terminal (Linux or macOS), you can set it like this: ```bash export NEWSCATCHER_API_KEY="YOUR_API_KEY_HERE" ``` On Windows Command Prompt, you can use: ```cmd set NEWSCATCHER_API_KEY="YOUR_API_KEY_HERE" ``` ```python import aiohttp import asyncio import csv import logging from typing import List, Dict, Optional from tqdm.asyncio import tqdm import os # Constants API_KEY: str = os.getenv("NEWSCATCHER_API_KEY") if not API_KEY: raise EnvironmentError("API key not set in environment variables") URL: str = "https://v3-api.newscatcherapi.com/api/sources" HEADERS: Dict[str, str] = {"x-api-token": API_KEY} INPUT_FILE: str = "source_urls.csv" OUTPUT_FILE: str = "uncovered_sources.csv" MAX_CONCURRENT_REQUESTS: int = 5 # Set the desired number of concurrent requests MAX_RETRIES: int = 3 # Number of retries for failed requests TIMEOUT: int = 30 # Timeout for each request in seconds # Configure logging logging.basicConfig( level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s", handlers=[ logging.FileHandler("check_sources.log"), # Log to a file logging.StreamHandler(), # Also log to console ], ) async 
def fetch_sources( session: aiohttp.ClientSession, batch: List[str], headers: Dict[str, str], retries: int = MAX_RETRIES, ) -> Optional[Dict]: """Fetch source coverage in a single batch with retry logic.""" payload: Dict[str, object] = {"source_url": batch, "include_additional_info": True} for attempt in range(retries): try: async with session.post( URL, headers=headers, json=payload, timeout=TIMEOUT ) as response: response.raise_for_status() data: Dict = await response.json() return data except (aiohttp.ClientError, asyncio.TimeoutError) as e: logging.error(f"Attempt {attempt + 1}/{retries} failed: {e}") await asyncio.sleep(2**attempt) # Exponential backoff except Exception as e: logging.error(f"Unexpected error: {e}") break return None async def fetch_sources_with_semaphore( semaphore: asyncio.Semaphore, session: aiohttp.ClientSession, batch: List[str], headers: Dict[str, str], ) -> Optional[Dict]: """Fetch sources with a semaphore to limit the number of concurrent requests.""" async with semaphore: return await fetch_sources(session, batch, headers) async def check_sources_concurrently( source_urls: List[str], batch_size: int = 1000 ) -> Dict[str, bool]: """Check if sources are covered by the API concurrently.""" coverage: Dict[str, bool] = {} semaphore: asyncio.Semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS) async with aiohttp.ClientSession( connector=aiohttp.TCPConnector(limit=MAX_CONCURRENT_REQUESTS) ) as session: tasks: List[asyncio.Task] = [] num_batches: int = ( len(source_urls) + batch_size - 1 ) // batch_size # Calculate the number of batches # Use tqdm.asyncio for async progress bar for i in range(0, len(source_urls), batch_size): batch: List[str] = source_urls[i : i + batch_size] task: asyncio.Task = fetch_sources_with_semaphore( semaphore, session, batch, HEADERS ) tasks.append(task) # Use tqdm to visualize the progress of batch processing results: List[Optional[Dict]] = await tqdm.gather( *tasks, desc="Checking source coverage", unit="batch" ) 
# Process results for i, result in enumerate(results): if isinstance(result, Exception): logging.error(f"Error processing batch {i+1}: {result}") continue if result: # Mark sources as covered for source in result.get("sources", []): coverage[source["domain_url"]] = True # Mark sources not in the response as uncovered batch = source_urls[i * batch_size : (i + 1) * batch_size] for url in batch: if url not in coverage: coverage[url] = False logging.info("Finished checking sources.") return coverage def read_sources_from_csv(file_path: str) -> List[str]: """Read source URLs from a CSV file.""" with open(file_path, "r") as file: reader = csv.reader(file) sources: List[str] = [ row[0] for row in reader if row ] # Assuming URLs are in the first column logging.info(f"Loaded {len(sources)} sources from {file_path}.") return sources def write_uncovered_sources(uncovered_sources: List[str], file_path: str) -> None: """Write uncovered sources to a CSV file.""" with open(file_path, "w", newline="") as file: writer = csv.writer(file) writer.writerow(["Uncovered Source URL"]) # Header for source in uncovered_sources: writer.writerow([source]) logging.info(f"Uncovered sources written to {file_path}.") def main() -> None: # Read sources from CSV source_urls: List[str] = read_sources_from_csv(INPUT_FILE) # Check sources coverage concurrently coverage: Dict[str, bool] = asyncio.run(check_sources_concurrently(source_urls)) # Identify uncovered sources uncovered_sources: List[str] = [ url for url, is_covered in coverage.items() if not is_covered ] # Write uncovered sources to CSV write_uncovered_sources(uncovered_sources, OUTPUT_FILE) logging.info(f"Total sources: {len(source_urls)}") logging.info(f"Covered sources: {len(source_urls) - len(uncovered_sources)}") logging.info(f"Uncovered sources: {len(uncovered_sources)}") if __name__ == "__main__": main() ``` This script does the following: 1. Reads source URLs from a CSV file. 2. 
Asynchronously checks the sources against the API in batches of 1000. 3. Identifies which sources are not covered by the API. 4. Writes the uncovered sources to a new CSV file. To use this script: 1. Save your list of source URLs in a CSV file named `source_urls.csv`. 2. Run the script. It will create a file named `uncovered_sources.csv` containing the URLs not covered by News API v3. You can then send the `uncovered_sources.csv` file to our support team. As News API v3 is a flexible, corporate-level solution, we can manually add the sources you need for your application. This script assumes that your source URLs are in the first column of the input CSV. Adjust the `read_sources_from_csv` function if your file has a different structure. ## Best practices * Use the `include_additional_info` parameter to get insights into source reliability and output volume. * Use `source_name` for a broader search and `source_url` for more precise results. * For large numbers of sources, use the batching method described above to stay within API limits. * Remember that the API may limit the number of sources returned based on your subscription plan. ## See also * [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying) * [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search) * [API Reference: Sources endpoint](/v3/api-reference/endpoints/sources/retrieve-sources-post) # How to optimize search with API parameters Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/optimize-search Discover how to combine API parameters to refine search queries and retrieve the most relevant articles from News API v3. Coming soon ...
# How to paginate large datasets Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/paginate-large-datasets Efficiently retrieve and process large volumes of news data using pagination in News API v3 ## Overview When working with large datasets in News API v3, pagination is essential for efficiently retrieving and processing news articles. Pagination allows you to break down large result sets into smaller, manageable chunks, improving performance and reducing the load on both the client and server. News API v3 uses a page-based pagination system, which is ideal for handling large, dynamic datasets. This guide will walk you through the process of implementing pagination in your API requests. ## Before you start Before you begin, ensure you have: * An active API key for NewsCatcher News API v3 * Basic knowledge of making API requests * Python or another tool for making HTTP requests (e.g., cURL, Postman, or a programming language with HTTP capabilities) Pagination is available on the following endpoints: * [Search](/v3/api-reference/endpoints/search/search-articles-get) * [Latest headlines](/v3/api-reference/endpoints/latest-headlines/retrieve-latest-headlines-get) * [Authors](/v3/api-reference/endpoints/authors/search-articles-by-author-get) * [Search by link](/v3/api-reference/endpoints/search-by-link/search-articles-by-links-or-ids-get) * [Search similar](/v3/api-reference/endpoints/search-similar/search-similar-articles-get) * [Aggregation](/v3/api-reference/endpoints/aggregation/get-aggregation-count-by-interval-get) A single API response cannot return more than 1000 articles, so you should use pagination to retrieve larger datasets. ## Steps News API v3 uses two main parameters for pagination: * `page`: The page number you want to retrieve (default is 1, starts from 1). * `page_size`: The number of results per page (default is 100, range is 1 to 1000). Start by setting up your basic query with pagination parameters.
For example: ```json { "q": "artificial intelligence", "lang": "en", "page": 1, "page_size": 100 } ``` Here's a Python example demonstrating the initial request: ```python request.py import requests import json # Configuration API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = {"q": "artificial intelligence", "lang": "en", "page": 1, "page_size": 100} try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() data = response.json() if "total_pages" not in data or "articles" not in data: print(f"Unexpected response format: {data}") else: print(json.dumps(data, indent=2)) except requests.exceptions.RequestException as e: # e.response is None for connection errors, so guard before reading it print(f"Failed to fetch articles: {e}, Status code: {e.response.status_code if e.response is not None else 'N/A'}") ``` The API response includes several fields related to pagination: ```json { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100 } ``` * `total_hits`: The total number of articles matching your query. * `page`: The current page number. * `total_pages`: The total number of pages available. * `page_size`: The number of articles per page. To retrieve all pages, you'll need to loop through them.
Here's an example of how to do this with enhanced error handling and exponential backoff: ```python paginate.py import requests import time import random API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "q": "artificial intelligence", "lang": "en", "page_size": 1000, # Example with larger page size } def exponential_backoff(retries): time.sleep((2**retries) + random.uniform(0, 1)) def fetch_all_pages(): all_articles = [] page = 1 total_pages = None retries = 0 max_retries = 5 while total_pages is None or page <= total_pages: PAYLOAD["page"] = page try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() data = response.json() if "total_pages" not in data or "articles" not in data: print(f"Unexpected response format on page {page}: {data}") break # Use the total_pages directly from the API response if total_pages is None: total_pages = data["total_pages"] all_articles.extend(data["articles"]) print(f"Fetched page {page} of {total_pages}") if page >= total_pages: break page += 1 time.sleep(1) # Respect rate limits retries = 0 # Reset retries after a successful request except requests.exceptions.RequestException as e: print(f"Failed to fetch page {page}: {e}") retries += 1 if retries >= max_retries: print("Max retries reached, aborting.") break exponential_backoff(retries) return all_articles articles = fetch_all_pages() print(f"Total articles fetched: {len(articles)}") ``` ## Optimize requests To efficiently fetch large datasets while respecting API rate limits, use the following strategies: * Add delays between requests, such as a fixed sleep time, or implement an exponential backoff strategy for retries in case of failures (as shown in the previous example). * Fetch data in manageable batches to avoid memory issues with large datasets. * Use multithreading or asynchronous functions to speed up the process while respecting API subscription limits. 
Here is an example of asynchronous requests using `aiohttp` with concurrency, a retry mechanism, and logging:

```python paginate_async.py
import aiohttp
import asyncio
import os
import logging
import json
from typing import Any, Awaitable, Dict, List, Optional
from tqdm.asyncio import tqdm

# Constants
API_KEY: str = os.getenv("NEWSCATCHER_API_KEY")
if not API_KEY:
    raise EnvironmentError("API key not set in environment variables")
URL: str = "https://v3-api.newscatcherapi.com/api/search"
HEADERS: Dict[str, str] = {"x-api-token": API_KEY}
PAYLOAD: Dict[str, Any] = {
    "q": "artificial intelligence",
    "lang": "en",
    "page_size": 100,
}
MAX_CONCURRENT_REQUESTS: int = 5  # Set the desired number of concurrent requests
MAX_RETRIES: int = 3  # Number of retries for failed requests
TIMEOUT: int = 30  # Timeout for each request in seconds
OUTPUT_FILE: str = "fetched_articles.json"

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("fetch_articles.log"),  # Log to a file
        logging.StreamHandler(),  # Also log to console
    ],
)


async def fetch_page(
    session: aiohttp.ClientSession,
    url: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout: int = TIMEOUT,
    max_retries: int = MAX_RETRIES,
) -> Optional[Dict]:
    """Fetch a single page with retry logic."""
    retries: int = 0
    while retries < max_retries:
        try:
            async with session.post(
                url, headers=headers, json=payload, timeout=timeout
            ) as response:
                response.raise_for_status()
                return await response.json()
        except (aiohttp.ClientError, asyncio.TimeoutError) as e:
            retries += 1
            logging.warning(
                "Page %s failed (%s), retry %d/%d",
                payload.get("page"), e, retries, max_retries,
            )
            await asyncio.sleep(2**retries)  # Exponential backoff between retries
    logging.error("Page %s failed after %d retries", payload.get("page"), max_retries)
    return None  # Explicitly return None if all retries fail


async def fetch_page_with_semaphore(
    semaphore: asyncio.Semaphore,
    session: aiohttp.ClientSession,
    url: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
) -> Optional[Dict]:
    """Fetch a page using a semaphore to limit concurrent requests."""
    async with semaphore:
        return await fetch_page(session, url, headers, payload)


async def fetch_all_pages_concurrently(
    url: str,
    headers: Dict[str, str],
    initial_payload: Dict[str, Any],
    max_concurrent_requests: int = MAX_CONCURRENT_REQUESTS,
) -> List[Dict[str, Any]]:
    """Fetch all pages concurrently using a semaphore to limit the number of requests."""
    semaphore: asyncio.Semaphore = asyncio.Semaphore(max_concurrent_requests)
    all_articles: List[Dict[str, Any]] = []
    async with aiohttp.ClientSession(
        connector=aiohttp.TCPConnector(limit=10)
    ) as session:
        # Fetch the first page to determine the total number of pages
        async with semaphore:
            initial_payload["page"] = 1
            data: Optional[Dict[str, Any]] = await fetch_page(
                session, url, headers, initial_payload
            )
        if not data or "total_pages" not in data or "articles" not in data:
            return []
        total_pages: int = data["total_pages"]
        all_articles.extend(data["articles"])

        # Prepare tasks for fetching all remaining pages
        tasks: List[Awaitable[Optional[Dict[str, Any]]]] = []
        for page in range(2, total_pages + 1):
            payload: Dict[str, Any] = initial_payload.copy()
            payload["page"] = page
            coro = fetch_page_with_semaphore(
                semaphore, session, url, headers, payload
            )
            tasks.append(
                asyncio.wait_for(coro, timeout=60)
            )  # Increased timeout for tasks

        # Execute all tasks concurrently and gather results with tqdm progress bar
        results: List[Optional[Dict[str, Any]]] = await tqdm.gather(
            *tasks, desc="Fetching pages", unit="page", initial=1, total=total_pages
        )

        # Process results
        for result in results:
            if result and "articles" in result:
                all_articles.extend(result["articles"])
    return all_articles


def main() -> None:
    """Main function to execute pagination and fetch all articles."""
    # Fetch all articles
    articles: List[Dict[str, Any]] = asyncio.run(
        fetch_all_pages_concurrently(URL, HEADERS, PAYLOAD, MAX_CONCURRENT_REQUESTS)
    )

    # Save fetched data to a JSON file
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)

    logging.info(f"Total articles fetched: {len(articles)}")
    logging.info(f"Data saved to {OUTPUT_FILE}")


if __name__ == "__main__":
    main()
```

## Best practices

* Use smaller page sizes (e.g., 20-50) for faster initial load times in user interfaces.
* Use larger page sizes (up to 1000) for batch processing or to retrieve the entire dataset.
* Be aware that the dataset may change between requests, especially for queries on recent news.
* Implement error handling and retries to make your pagination code more robust.
* Consider implementing a way to resume pagination from a specific page in case of interruptions.
* When using multithreading or async functions, carefully manage concurrency to stay within your API usage limits.

## See also

* [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying)
* [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search)

# How to retrieve more than 10,000 articles

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/retrieve-more-than-10k-articles

Learn how to use time-chunking methods in the Python SDK to retrieve large volumes of articles

The NewsCatcher API limits results to 10,000 articles per search query. The Python SDK provides special methods that automatically split your search across multiple time periods to bypass the limit and retrieve all articles relevant to your query. These advanced retrieval methods are available only in the Python SDK.

## Understanding the article limit

When your query matches more than 10,000 articles, the API returns `"total_hits": 10000` as a hard limit, and you cannot retrieve more through standard pagination.
```python
from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

response = client.search.post(
    q="technology",
    from_="7d",
    to="now"
)

print(f"Total hits: {response.total_hits}")
print(f"Is result capped: {response.total_hits == 10000}")  # True if limit reached
```

## Using time-chunking methods

The SDK provides two special methods to retrieve large volumes of articles:

* `get_all_articles`
* `get_all_headlines`

Both methods are available for synchronous and asynchronous clients.

### Get all articles

```python Synchronous
from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

articles = client.get_all_articles(
    q="renewable energy",
    from_="30d",
    to="now",
    time_chunk_size="1d",
    max_articles=50000,
    show_progress=True,
)

print(f"Retrieved {len(articles)} articles")
```

```python Asynchronous
import asyncio
from newscatcher import AsyncNewscatcherApi

async def get_articles():
    client = AsyncNewscatcherApi(api_key="YOUR_API_KEY")
    articles = await client.get_all_articles(
        q="electric vehicles",
        from_="30d",
        to="now",
        time_chunk_size="1d",
        max_articles=50000,
        concurrency=3,
        show_progress=True
    )
    print(f"Retrieved {len(articles)} articles")
    return articles

articles = asyncio.run(get_articles())
```

### Get all headlines

```python Synchronous
headlines = client.get_all_headlines(
    when="30d",
    time_chunk_size="1d",
    max_articles=20000,
    show_progress=True
)

print(f"Retrieved {len(headlines)} headlines")
```

```python Asynchronous
import asyncio
from newscatcher import AsyncNewscatcherApi

async def get_headlines():
    client = AsyncNewscatcherApi(api_key="YOUR_API_KEY")
    headlines = await client.get_all_headlines(
        when="30d",
        time_chunk_size="1d",
        max_articles=20000,
        concurrency=3,
        show_progress=True
    )
    print(f"Retrieved {len(headlines)} headlines")
    return headlines

headlines = asyncio.run(get_headlines())
```

## How time-chunking works

Time-chunking divides your date range into smaller intervals, making separate API calls for each period and combining the results. Each interval can return up to 10,000 articles. For example, with `time_chunk_size="1d"` over 5 days, the method makes 5 API calls, one for each day, each with automatic pagination, so it can retrieve up to 50,000 articles in total.

*Time-chunking diagram showing how multiple requests are combined*

## Choosing the right chunk size

The optimal chunk size depends on how many articles your query returns:

| Query type      | Articles per day     | Recommended chunk size |
| --------------- | -------------------- | ---------------------- |
| Extremely broad | 10,000+ per hour     | `"1h"`                 |
| Very broad      | 10,000+ per day      | `"6h"`                 |
| Broad           | 3,000-10,000 per day | `"1d"`                 |
| Moderate        | 1,000-3,000 per day  | `"3d"`                 |
| Specific        | 100-1,000 per day    | `"7d"`                 |
| Very specific   | \< 100 per day       | `"30d"`                |

```python
response = client.search.post(
    q="renewable energy",
    from_="1d",
    to="now"
)
print(f"Articles for one day: {response.total_hits}")

if response.total_hits == 10000:
    response_6h = client.search.post(
        q="renewable energy",
        from_="6h",
        to="now"
    )
    print(f"Articles for 6 hours: {response_6h.total_hits}")
```

## Method parameters

* `q`: Your search query. Supports AND, OR, NOT operators and advanced syntax.
* `from_`: Starting date for `get_all_articles` (e.g., `"10d"` or `"2023-03-15"`).
* `to`: Ending date for `get_all_articles`; defaults to the current time.
* `when`: Time range for `get_all_headlines` (e.g., `"1d"` or `"2023-03-15"`).
* `time_chunk_size`: Chunk size: `"1h"`, `"6h"`, `"1d"`, `"7d"`, or `"1m"`.
* `max_articles`: Maximum number of articles to retrieve.
* `show_progress`: Whether to display a progress bar.
* `deduplicate`: Whether to remove duplicate articles.
* `concurrency`: For async methods only, the number of concurrent requests.

## Common issues and solutions

If you hit rate limits:

* Reduce concurrency (for async methods).
* Add longer delays between requests.
* Break large requests into smaller batches.

If you run out of memory:

* Reduce the `max_articles` parameter.
* Process data in smaller batches.
* Save results incrementally as shown in the advanced example.
* Release memory with `del` and `gc.collect()`.

If results are incomplete:

* Check if your chunk size is appropriate.
* Ensure your date range is correct.
* Verify your query syntax is valid.
* Make sure you're not hitting the 10,000 limit per chunk.

## See also

* [Advanced query syntax](/v3/documentation/guides-and-concepts/advanced-querying)
* [API rate limits](/v3/api-reference/overview/rate-limits)
* [Python SDK](https://github.com/Newscatcher/newscatcher-python)

# How to search articles by entity

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/search-by-entity

Find articles mentioning specific people, organizations, locations, or other named entities

## Overview

Powered by Named Entity Recognition (NER), News API v3 lets you find articles mentioning specific people, organizations, locations, or other named entities. With NER, you can perform more precise and relevant searches than with simple keyword queries.

## Before you start

Before you begin, ensure you have:

* An active API key for NewsCatcher News API v3
* NLP functionality enabled in your subscription plan
* Basic knowledge of making API requests
* Python or another tool for making HTTP requests (e.g., cURL, Postman, or a programming language with HTTP capabilities)

## Steps

News API v3 supports four main entity types, each with a corresponding query parameter:

* `PER_entity_name`: Person names
* `ORG_entity_name`: Organization names
* `LOC_entity_name`: Location names
* `MISC_entity_name`: Miscellaneous names (products, events, nationalities, etc.)

Let's start with a simple entity search for a tech company:

```json
{
  "q": "tech industry",
  "ORG_entity_name": "OpenAI"
}
```

This query searches for articles about the tech industry that mention OpenAI as an organization.
You can expand your search to include multiple entities: ```json { "q": "tech industry", "ORG_entity_name": "OpenAI AND Microsoft" } ``` To include entity information in the API response, use the `include_nlp_data` parameter: ```json { "q": "tech industry", "ORG_entity_name": "OpenAI AND Microsoft", "include_nlp_data": true } ``` To learn more about NLP capabilities in News API v3, see [NLP features](/v3/documentation/guides-and-concepts/nlp-features). Here's a Python example demonstrating a search request using named entities: ```python import requests import json API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "q": "tech industry", "ORG_entity_name": "OpenAI AND Microsoft", "include_nlp_data": True, "lang": "en" } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` The API returns a JSON response. Here's a simplified example focusing on entity-related fields: ```json { "status": "ok", "total_hits": 325, "page": 1, "total_pages": 4, "page_size": 100, "articles": [ { "title": "OpenAI Backs California Bill for Labeling AI-Generated Content Amid Industry Debate", "author": "Reiser X", "published_date": "2024-08-26", "link": "https://reiserx.medium.com/openai-backs-california-bill-for-labeling-ai-generated-content-amid-industry-debate-0863d5efa010", "domain_url": "medium.com", "description": "OpenAI, a leading artificial intelligence (AI) research lab, has recently voiced its support for California's Assembly Bill 3211 (AB 3211)…", "content": "OpenAI has voiced its support for California's Assembly Bill 3211, which mandates labeling of AI-generated content. 
This bill, aimed at transparency, has divided the tech industry, with OpenAI backing it and companies like Microsoft opposing it...", "word_count": 860, "nlp": { "theme": "Tech, Business", "summary": "OpenAI supports California's Assembly Bill 3211 (AB 3211), a legislative proposal that mandates the labeling of AI-generated content. The tech industry is split on the issue of AI regulation, with Microsoft and others opposed to the bill...", "sentiment": { "title": 0.0, "content": 0.0 }, "ner_PER": [], "ner_ORG": [ { "entity_name": "OpenAI", "count": 8 }, { "entity_name": "Microsoft", "count": 2 } // ... other entities ], "ner_MISC": [ { "entity_name": "AB 3211", "count": 7 } // ... other entities ], "ner_LOC": [ { "entity_name": "California", "count": 2 } // ... other entities ], "iptc_tags_name": [ "economy, business and finance / products and services / media and entertainment industry / streaming service" // ... other tags ], "iab_tags_name": [ "Technology & Computing / Artificial Intelligence" // ... other tags ] } // ... other fields } // ... other articles ], "user_input": { "q": "tech industry", "search_in": ["title_content"], "lang": ["en"], "from_": "2024-08-26T00:00:00", "to_": "2024-09-02T17:50:45.314945", "include_nlp_data": true, "ORG_entity_name": "OpenAI AND Microsoft" // ... 
other input parameters } } ``` Use the COUNT functionality to filter based on entity mention frequency: ```json { "q": "tech industry", "ORG_entity_name": "COUNT(\"Apple\", 2, \"gt\") OR COUNT(\"Microsoft\", 2, \"gt\")", "include_nlp_data": true } ``` Use different entity types together: ```json { "q": "tech industry", "ORG_entity_name": "Apple OR Microsoft", "PER_entity_name": "\"Tim Cook\" OR \"Satya Nadella\"", "include_nlp_data": true } ``` Combine entity search with boolean operators for more complex queries: ```json { "q": "tech industry AND (innovation OR \"artificial intelligence\")", "ORG_entity_name": "(Apple OR Microsoft) AND NOT Google", "include_nlp_data": true } ``` Use the NEAR operator to find articles where terms are mentioned in proximity to each other and combine with entity search: ```json { "q": "NEAR(\"tech industry\", \"innovation\", 10)", "ORG_entity_name": "Apple OR Microsoft", "include_nlp_data": true } ``` Always use backslashes `\` before double quotes within query strings to maintain exact match syntax in JSON. ## See also * [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying) * [How to use boolean operators](/v3/documentation/how-to/use-boolean-operators) * [How to perform proximity search with NEAR](/v3/documentation/how-to/search-with-near) * [Understanding the NLP features](/v3/documentation/guides-and-concepts/nlp-features) # How to search articles by URL Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/search-by-url Find articles that mention specific URLs or domains using News API v3 ## Overview News API v3 offers powerful URL-based search capabilities, allowing you to find articles that mention specific URLs or domains. This feature is handy for tracking website mentions, monitoring competitors, or analyzing link patterns in news articles. 
The API provides two main parameters for URL-based searches: * `all_links`: Searches for articles mentioning specific complete URLs. * `all_domain_links`: Searches for articles mentioning specific domain URLs. Using these parameters, you can refine your searches and gain valuable insights from news articles on the web. ## Before you start Before you begin, ensure you have: * An active API key for NewsCatcher News API v3 * Basic knowledge of making API requests * Python or another tool for making HTTP requests (cURL, Postman, or a programming language with HTTP capabilities) URL-based search is available on the following endpoints: * `/search` * `/latest_headlines` * `/authors` * `/search_similar` * `/aggregation_count` In `GET` requests, you can specify multiple URLs/domains as comma-separated strings. `POST` requests support both comma-separated strings and arrays of strings. ## Steps * `all_links`: Use this parameter to search for articles that mention specific complete URLs. This is useful when you want to find articles linking to exact pages. * `all_domain_links`: Use this parameter to search for articles that mention specific domain URLs. This is helpful when you want to find articles linking to any page within a domain. Here's an example of a basic query using the `all_domain_links` parameter: ```json { "q": "AI", "all_domain_links": "nvidia.com", "lang": "en" } ``` This query: * Searches for articles about AI. * Looks for mentions of the NVIDIA website. * Limits results to English language articles. 
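As noted above, `GET` requests pass multiple URLs or domains as comma-separated strings. Here is a minimal sketch of the same kind of query sent as a `GET` request; the second domain, `amd.com`, is added purely to illustrate the comma-separated format and is not part of the original example:

```python
import requests
import json

API_KEY = "YOUR_API_KEY_HERE"
URL = "https://v3-api.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY}

# Same kind of query as above, passed as query-string parameters.
# Multiple domains are comma-separated in GET requests.
PARAMS = {
    "q": "AI",
    "all_domain_links": "nvidia.com,amd.com",
    "lang": "en",
}

try:
    response = requests.get(URL, headers=HEADERS, params=PARAMS)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

In a `POST` body, `all_domain_links` could instead be given as an array, e.g. `["nvidia.com", "amd.com"]`.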
Here's a Python example demonstrating how to make a POST request with the above query: ```python import requests import json API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "q": "AI", "all_domain_links": "nvidia.com", "lang": "en" } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` The API returns a JSON response. Here's a simplified example focusing on URL-related fields: ```json { "status": "ok", "total_hits": 264, "page": 1, "total_pages": 3, "page_size": 100, "articles": [ { "title": "NVIDIA 'Powering Advanced AI' Is The New Tagline For GeForce RTX GPUs & AI PC Platforms", "author": "Hassan Mujtaba", "published_date": "2024-09-02", "link": "https://wccftech.com/nvidia-powering-advanced-ai-new-tagline-geforce-rtx-gpus-ai-pc-platforms", "domain_url": "wccftech.com", "description": "NVIDIA has silently updated its GeForce RTX GPUs & AI PC platform badges to include a new tagline which is \"Powering Advanced AI\".", "content": "NVIDIA has updated its GeForce RTX GPU badges with a new tagline, 'Powering Advanced AI,' to highlight its AI capabilities. The company continues to lead in AI applications with technologies like DLSS and ChatRTX. This new branding is already being used by OEMs, showcasing NVIDIA's AI performance in various sectors beyond gaming...", "word_count": 341, "all_links": [ "https://www.nvidia.com", "https://www.amazon.com/dp/B082L36ZRY?tag=twea-20", "https://www.facebook.com/TweakTown", "https://www.threads.net/@tweaktown" // ... other links ], "all_domain_links": [ "techpowerup.com", "threads.net", "youtube.com", "linksynergy.com", "nvidia.com" // ... other domain links ] // ... other fields } // ... 
other articles ], "user_input": { "q": "AI", "lang": ["en"], "all_domain_links": ["nvidia.com"] // ... other fields } } ``` To improve your search results and gain more specific insights, consider these practical examples: ```json { "q": "AI report", "all_links": [ "https://aiindex.stanford.edu/report/", "https://www.stateof.ai/", "https://www2.deloitte.com/us/en/pages/consulting/articles/state-of-generative-ai-in-enterprise.html" ], "from_": "2024-01-01", "lang": "en" } ``` This query tracks mentions of specific AI reports, helping you understand their impact and how they're being discussed in the media. ```json { "q": "AI AND (healthcare OR medicine)", "all_domain_links": ["who.int", "nih.gov"], "all_links": ["https://www.nature.com/articles/s41591-023-02448-8"], "lang": "en", "from_": "2024-01-01" } ``` This query combines industry-specific keywords with relevant domain links and a specific research article (Large language models in medicine) to track AI's impact in healthcare. ```json { "q": "AI regulation", "all_domain_links": ["europa.eu", "whitehouse.gov", "gov.uk"], "lang": "en", "from_": "2024-01-01" } ``` This query helps monitor AI-related policy discussions and announcements from major government entities. ## Best practices * Use `all_links` to find mentions of specific pages or articles. * Use `all_domain_links` to track mentions of a website in general, regardless of the specific page. * Combine URL search with other parameters to create more targeted queries. * Use date ranges to focus on recent developments or track changes over time. * Be aware that very broad domain searches might return a large number of results. Use additional filters to narrow down your search. * Regularly update your queries to track evolving topics and include new relevant URLs or domains. 
## See also

* [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying)
* [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search)
* [API Reference: Search endpoint](/v3/api-reference/endpoints/search/search-articles-post)

# Proximity search with NEAR

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/search-with-near

Find articles with related terms in close proximity

## Overview

Proximity search is a powerful technique that lets you find articles where specific terms appear close to each other. This method is particularly useful for ensuring that related concepts are discussed in the same context within an article.

News API v3 supports proximity search through the `NEAR` operator. This operator lets you specify two or more terms and the maximum number of words that can appear between them. By using the `NEAR` operator, you can significantly improve the relevance of your search results, finding articles where related terms appear close together.

## Before you start

Before you begin, ensure you have:

* An active API key for NewsCatcher News API v3
* Basic knowledge of making API requests
* Python or another tool for making HTTP requests (cURL, Postman, or a programming language with HTTP capabilities)

## Steps

The `NEAR` operator uses the following format:

```
NEAR("phrase_A", "phrase_B", distance, in_order)
```

* `phrase_A` and `phrase_B`: The terms or phrases you want to find near each other (max 4 words per phrase).
* `distance`: The maximum number of words that can appear between the phrases (max 100).
* `in_order`: Optional boolean parameter. If true, `phrase_B` must appear after `phrase_A`. Defaults to false.
Here are some examples of `NEAR` queries: * `NEAR("climate change", "renewable energy", 15)` * `NEAR("artificial intelligence", "healthcare", 20, true)` Note the following limitations: * Maximum 4 words per phrase * Maximum 3 phrases per `NEAR` operation * Maximum distance of 100 words Use the `/search` endpoint with your constructed query in the `q` parameter. Here's a Python code example: ```python import requests import json API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "q": 'NEAR("climate change", "renewable energy", 15)', "lang": "en", } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` The API returns a JSON response similar to other search queries. Here's an example of what you might see: ```json { "status": "ok", "total_hits": 635, "page": 1, "total_pages": 7, "page_size": 100, "articles": [ { "title": "We Must Adopt Renewable Energy To Combat Climate Change", "author": null, "published_date": "2024-09-01", "link": "https://ournaijanews.com/we-must-adopt-renewable-energy-to-combat-climate-change-sanusi", "domain_url": "ournaijanews.com", "description": "Emir Sanusi II advocates for renewable energy adoption to combat climate change, emphasizing its importance for environmental and human health.", "word_count": 453 // ... other article fields } // ... other articles ], "user_input": { "q": "NEAR(\"climate change\", \"renewable energy\", 15)", "lang": ["en"], "from_": "2024-08-26T00:00:00", "to_": "2024-09-02T10:32:59.545085", "sort_by": "relevancy", "page": 1, "page_size": 100 // ... 
other input parameters } } ``` If you're not getting the desired results, try adjusting your query: * Change the distance parameter to narrow or broaden your search * Add more terms to the `NEAR` operator (up to 3 phrases) * Combine `NEAR` with boolean operators for more complex queries. For example: ``` NEAR("electric vehicles", "battery technology", 20) AND NOT "Tesla" ``` ## See also * [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying) * [How to use boolean operators](/v3/documentation/how-to/use-boolean-operators) * [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search) # How to use boolean operators Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/how-to/use-boolean-operators Refine queries with AND, OR, and NOT ## Overview Boolean operators are powerful tools that allow you to create more complex and precise search queries. These operators help you filter news articles to find the content you need. News API v3 supports three main boolean operators: * `AND`: Ensures both terms are present in the search results. * `OR`: Allows for either term to be present. * `NOT`: Excludes terms from the search results. Additionally, you can use parentheses `()` to group terms and create more complex queries. By combining these operators, you can significantly improve the relevance of your search results and find the most pertinent news articles for your needs. ## Before you start Before you begin, ensure you have: * An active API key for NewsCatcher News API v3 * Basic knowledge of making API requests * Python or another tool for making HTTP requests (e.g., cURL, Postman, or a programming language with HTTP capabilities) ## Steps 1. Construct your query using boolean operators. When forming your query, use parentheses to group terms and operators. 
Here are some examples: * `(bitcoin OR cryptocurrency) AND (investment OR trading)` * `"electric cars" NOT Tesla` * `("artificial intelligence" OR AI) AND healthcare NOT (Google OR Amazon OR Microsoft)` 2. Make an API request with your boolean query. Use the `/search` endpoint with your constructed query in the `q` parameter. Here is a code example in Python: ```python request.py import requests import json API_KEY = "YOUR_API_KEY_HERE" URL = "https://v3-api.newscatcherapi.com/api/search" HEADERS = {"x-api-token": API_KEY} PAYLOAD = { "q": "(bitcoin OR cryptocurrency) AND (investment OR trading)", "lang": "en", } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch articles: {e}") ``` 3. Analyze the results. The API returns a JSON response with the following structure: ```json response.json { "status": "ok", "total_hits": 10000, "page": 100, "total_pages": 10000, "page_size": 100, "articles": [ { "title": "$100 Million in Bitcoin: Leading the Charge in Institutional Cryptocurrency Adoption", "author": "Uzair Hasan", "authors": [ "Uzair Hasan", "Tech Bullion", "Angela Scott-Briggs", "Busines News Wire", "Businesnews Wire" ], "journalists": ["Angela Scott-Briggs", "Uzair Hasan"], "published_date": "2024-08-22 18:32:13", "published_date_precision": "full", "updated_date": "2024-08-22 18:32:13", "updated_date_precision": "full", "link": "https://techbullion.com/100-million-in-bitcoin-leading-the-charge-in-institutional-cryptocurrency-adoption", "domain_url": "techbullion.com", "full_domain_url": "techbullion.com", "name_source": "TechBullion", "is_headline": true, "paid_content": false, "parent_url": "https://techbullion.com", "country": "US", "rights": "techbullion.com", "rank": 6999, "media": "https://techbullion.com/wp-content/uploads/2024/08/bit-1000x600.jpg", "language": "en", "description": "In a decisive 
move, tradetide.net has announced a substantial investment of $100 million in Bitcoin. This bold step underscores tradetide.net's confidence in Bitcoin's long-term potential, positioning…", "content": "In a decisive move, tradetide.net has announced a substantial investment of $100 million in Bitcoin. This bold step underscores tradetide.net's confidence in Bitcoin's long-term potential, positioning the firm at the forefront of the digital asset revolution...", "word_count": 556, "is_opinion": false, "twitter_account": "@TechBullion", "all_links": [ "http://feeds.feedburner.com/Techbullion", "https://www.facebook.com/TechBullion" // ... other links ], "all_domain_links": [ "businesnewswire.com", "tradetide.net" // ... other domains ], "id": "5f839cd4bfddbd275421b5bc6fff68a4", "score": 26.076221 } // ... other articles ], "user_input": { "q": "(bitcoin OR cryptocurrency) AND (investment OR trading)", "lang": ["en"], "page": 100, "page_size": 100 // ... other input parameter } } ``` 4. Refine your query as needed. If you're getting too many or too few results, adjust your query. Add more specific terms with `AND`, broaden your search with `OR`, or exclude certain topics with `NOT`. ## See also * [Advanced querying techniques](/v3/documentation/guides-and-concepts/advanced-querying) * [How to optimize search with API parameters](/v3/documentation/how-to/optimize-search) # API changes v2 vs v3 Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/migration/api-changes-v2-vs-v3 Explore key changes and prepare for migration. This guide outlines the key differences between NewsCatcher News API v2 and v3. It provides a technical comparison to help you understand the changes and prepare for migration. This guide only covers the changes for v2/v3 shared endpoints. To learn about other v3 endpoints, their parameters, and response fields, see the [API Reference](/v3/api-reference/endpoints/authors/search-articles-by-author-get). 
## Base API changes ### Infrastructure updates | Feature | v2 | v3 | | ------------------------ | ------------------------- | ----------------------------- | | Base URL | api.newscatcherapi.com/v2 | v3-api.newscatcherapi.com/api | | Authentication header | x-api-key | x-api-token | | Maximum articles/request | 100 | 1,000 | | Historical data | Since 2019 | Since 2019\* | All historical data from News API v2 is available in v3. For data collected before July 2023, only the core functionalities are included. Advanced features such as NLP analysis, clustering, and deduplication for this specific time frame can be added upon request. ### Available endpoints | v2 | v3 | Change | | ------------------- | -------------------- | -------------------------------------------------------------------------------------------- | | `/search` | `/search` | Enhanced with additional filtering capabilities, NLP features, clustering, and deduplication | | `/latest_headlines` | `/latest_headlines` | Enhanced with additional filtering capabilities, NLP features, clustering, and deduplication | | `/sources` | `/sources` | Enhanced with additional filtering capabilities | | | `/authors` | Search by author name | | | `/search_by_link` | Search by URL or article ID | | | `/search_similar` | Find similar articles | | | `/aggregation_count` | Get aggregation count by interval | | | `/subscription` | View subscription info | ### Method support * Both v2 and v3 support `GET` and `POST` methods for all endpoints * Multiple value parameter formats: * v2: Most parameters use comma-separated strings, with some exceptions (e.g., `search_in` uses underscore-separated strings) regardless of the method * v3: More consistent formatting: * `GET`: Supports a comma-separated string * `POST`: Supports both a comma-separated string and an array of strings * Single-value parameters maintain their respective formats in both versions ## Parameter changes ### Renaming * `from` → `from_` * `to` → `to_` * `topic` → 
`theme` The `topic` parameter is removed from the `/sources` endpoint. Instead, you have new filtering capabilities. To learn more, see [Retrieve sources](/v3/api-reference/endpoints/sources/retrieve-sources-get). ### Query parameter (`q`) | Aspect | v2 | v3 | Change | | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------- | ------------------ | | Required | Yes | Yes | No change | | Query operators | ✓ Exact match with quotes `"keyword"`
✓ Boolean: `AND`, `OR`, `NOT`
✓ Wildcards: `*` and `?`
✓ Must/Must not: `+` and `-`
✓ Grouping with `()` | Same as v2 plus:
✓ `NEAR` operator
✓ `COUNT` operator | Enhanced operators | | Default behavior | Space-separated tokens treated as `AND` | Same as v2 | No change | ### Search fields (`search_in`) | Aspect | v2 | v3 | Change | | ------------------------------ | --------------------------------------------------- | -------------------------------------------------------------------------------------------------- | --------------------------------- | | Field name for article title | `title` | `title` | No change | | Field name for article content | `summary` | `content` | Renamed to reflect actual content | | Default value | `"title_summary"` | `"title,content"` | Functionally equivalent | | LLM-generated summary | Not available | `summary` (requires NLP) | New feature | | Multiple values | Underscore-separated string, e.g. `"title_summary"` | `GET`: Comma-separated string, e.g. `"title,content"`
`POST`: Comma-separated string or array | Format standardization | ### Content classification (`topic` -> `theme`) | Aspect | v2 | v3 | Change | | --------------------- | ------------------------------------------- | --------------------------------------------------------------------------------------------------- | ---------------------- | | Parameter name | `topic` | `theme` | Renamed | | Case format | lowercase, e.g. `"tech"` | Capitalized, e.g. `"Tech"` | Updated format | | Available categories | 15 lowercase categories | 17 capitalized categories | Expanded | | New categories in v3 | - | `"Health"`, `"Crime"`, `"Financial Crime"`, `"Lifestyle"`, `"Automotive"`, `"Weather"`, `"General"` | Added | | Removed v2 categories | `"beauty"`, `"music"`, `"food"`, `"gaming"` | Consolidated into new categories | Category restructuring | | Multiple values | Comma-separated string | `GET`: Comma-separated string
`POST`: Comma-separated string or array | Enhanced `POST` format | | Exclusion option | Not available | `not_theme` parameter | New feature | | NLP dependency | No | Yes | New requirement | ## New v3 parameters Parameters are grouped by their availability in different subscription plans. For detailed plan information, see [Subscription plans](/v3/documentation/get-started/news-api-v3-subscription-plans). ### Core features Available in all v3 plans, including `v3_basic`. #### Content classification | Parameter | Type | Description | | ----------------- | --------- | ----------------------------------------------------------------------------------- | | `is_headline` | `boolean` | Filters for articles that were posted on the home page of a given news domain | | `is_opinion` | `boolean` | Filters for opinion pieces when true, or excludes opinion-based articles when false | | `is_paid_content` | `boolean` | Filters out articles with paywalled content when false | | `word_count_min` | `integer` | Filters articles based on minimum word count | | `word_count_max` | `integer` | Filters articles based on maximum word count | #### URLs | Parameter | Type | Description | | ------------------ | -------- | --------------------------------------------------------------- | | `parent_url` | `string` | Filters articles by categorical URLs (e.g., "wsj.com/politics") | | `all_links` | `string` | Filters articles by mentioned URLs within their content | | `all_domain_links` | `string` | Filters articles by mentioned domain names within their content | #### Author | Parameter | Type | Description | | ----------------- | -------- | -------------------------------------- | | `not_author_name` | `string` | Excludes articles by specified authors | #### Time-related | Parameter | Type | Description | | --------------- | --------- | -------------------------------------------------------------- | | `by_parse_date` | `boolean` | Uses parse dates instead of published dates for date filtering | 
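Several of these core-plan filters can be combined in a single request. Below is a minimal sketch of assembling such a parameter set; the query, categorical URL, and word-count threshold are invented for illustration, and the resulting dict would be passed as `params` to `requests.get` along with your `x-api-token` header:

```python
def build_core_filters(query: str) -> dict:
    """Assemble a v3 /search params dict from core-plan filters.

    The specific values below are illustrative, not recommendations.
    """
    return {
        "q": query,
        "is_opinion": False,               # exclude opinion pieces
        "is_paid_content": False,          # skip paywalled content
        "word_count_min": 300,             # articles of at least 300 words
        "parent_url": "wsj.com/politics",  # restrict to a categorical URL
        "by_parse_date": True,             # date filters use parse dates
    }


params = build_core_filters("electric vehicles")
```

With `GET`, these are sent as query parameters; with `POST`, the same keys can go in a JSON body.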
#### Source | Parameter | Type | Description | | ------------------------ | --------- | ------------------------------------------------------------------ | | `predefined_sources` | `string` | Filters by predefined top sources per country (e.g., "top 100 US") | | `additional_domain_info` | `boolean` | Includes extra metadata about the source domain | | `is_news_domain` | `boolean` | Filters for news domain sources only | | `news_domain_type` | `string` | Filters by domain type (Original Content, Aggregator, etc.) | | `news_type` | `string` | Filters by news type categories | ### Advanced features Available in specific subscription plans. #### Natural language processing Requires `v3_nlp` plan or higher. | Parameter | Type | Description | | ----------------------- | --------- | ----------------------------------------------------------------------- | | `include_nlp_data` | `boolean` | Includes an NLP object for each article in the response | | `has_nlp` | `boolean` | Filters for articles that have NLP analysis available | | `theme` | `string` | Replaces `topic` parameter with expanded categories and NLP integration | | `not_theme` | `string` | Excludes articles with specified themes | | `ORG_entity_name` | `string` | Filters articles mentioning specific organization names | | `PER_entity_name` | `string` | Filters articles mentioning specific person names | | `LOC_entity_name` | `string` | Filters articles mentioning specific location names | | `MISC_entity_name` | `string` | Filters articles mentioning other named entities | | `title_sentiment_min` | `float` | Filters articles by minimum title sentiment score (-1 to 1) | | `title_sentiment_max` | `float` | Filters articles by maximum title sentiment score (-1 to 1) | | `content_sentiment_min` | `float` | Filters articles by minimum content sentiment score (-1 to 1) | | `content_sentiment_max` | `float` | Filters articles by maximum content sentiment score (-1 to 1) | #### Clustering and deduplication Requires 
`v3_nlp` plan or higher. | Parameter | Type | Description | | ---------------------- | --------- | -------------------------------------------------------------------------------------------- | | `clustering_enabled` | `boolean` | Enables grouping of similar articles into clusters | | `clustering_variable` | `string` | Specifies which part of the article to use for clustering ("content", "title", or "summary") | | `clustering_threshold` | `float` | Sets similarity threshold for clustering (range: 0-1) | | `exclude_duplicates` | `boolean` | Removes duplicate and highly similar articles from results | #### Tagging Requires the `v3_nlp_iptc_tags` subscription plan. | Parameter | Type | Description | | --------------- | -------- | ------------------------------------------------------ | | `iptc_tags` | `string` | Filters articles by IPTC media topic tags | | `not_iptc_tags` | `string` | Excludes articles with specific IPTC media topic tags | | `iab_tags` | `string` | Filters articles by IAB content categories | | `not_iab_tags` | `string` | Excludes articles with specific IAB content categories | #### Custom tags Custom tags are available in all the NLP plans as a custom solution that provides tailored content classification using your organization's taxonomy. For implementation details and examples, see [Custom tags](/v3/documentation/guides-and-concepts/custom-tags). 
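Beyond server-side filtering with `content_sentiment_min`/`content_sentiment_max`, the `nlp.sentiment` scores returned with `include_nlp_data=true` can also be post-filtered client-side. A small sketch, assuming an already-parsed v3 response (the sample articles are invented):

```python
from typing import Dict, List


def filter_by_sentiment(articles: List[Dict], min_content: float = 0.2) -> List[Dict]:
    """Keep articles whose nlp.sentiment.content score meets a threshold.

    Articles without an `nlp` object (for example, pre-July-2023 historical
    data, which includes core fields only) are skipped.
    """
    selected = []
    for article in articles:
        sentiment = article.get("nlp", {}).get("sentiment", {})
        score = sentiment.get("content")
        if score is not None and score >= min_content:
            selected.append(article)
    return selected


sample = [
    {"title": "Upbeat earnings", "nlp": {"sentiment": {"title": 0.5, "content": 0.83}}},
    {"title": "Mixed outlook", "nlp": {"sentiment": {"title": 0.0, "content": -0.4}}},
    {"title": "No NLP data"},
]
positive = filter_by_sentiment(sample)  # keeps only "Upbeat earnings"
```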
## Response changes v2 vs v3 ### Field renaming The following fields have been renamed in v3 for better clarity and consistency: | v2 | v3 | Type | Description | | ----------- | ------------- | -------- | -------------------------------- | | `clean_url` | `domain_url` | `string` | Base domain of the source | | `excerpt` | `description` | `string` | Brief article description | | `summary` | `content` | `string` | Full article content | | `_score` | `score` | `number` | Relevancy score | | `_id` | `id` | `string` | Unique article identifier | | `topic` | `theme` | `string` | Available in v3 with NLP enabled | ### New fields in v3 #### Article object The following fields are available in all v3 plans: | Field | Type | Description | | ------------------------ | --------- | ------------------------------ | | `full_domain_url` | `string` | Complete domain with subdomain | | `name_source` | `string` | Publisher name | | `is_headline` | `boolean` | Homepage article indicator | | `paid_content` | `boolean` | Paywall indicator | | `parent_url` | `string` | Category/section URL | | `journalists` | `array` | Array of journalist names | | `word_count` | `integer` | Article length | | `updated_date` | `string` | Last update timestamp | | `updated_date_precision` | `string` | Update time precision | | `all_links` | `array` | URLs mentioned in article | | `all_domain_links` | `array` | Domains mentioned in article | #### Natural language processing (NLP) object The NLP object is part of the article object and is available for all NLP plans (`v3_nlp` plan or higher) when `include_nlp_data=true`. | Field | Type | Description | | ----- | -------- | --------------------------------------------------------------------- | | `nlp` | `object` | Natural Language Processing analysis results for the article content. 
| **Article understanding** | Field | Type | Description | | ------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `nlp.summary` | `string` | AI-generated concise summary of article content | | `nlp.theme` | `array[string]` | High-level thematic categories from fixed set: Business, Economics, Entertainment, Finance, Health, Politics, Science, Sports, Tech, Crime, Financial Crime, Lifestyle, Automotive, Travel, Weather, General | **Sentiment analysis** | Field | Type | Description | | ----------------------- | -------- | ------------------------------------------------------------------------------------------------- | | `nlp.sentiment.title` | `number` | Sentiment score for article title (range: -1 to 1, negative values indicate negative sentiment) | | `nlp.sentiment.content` | `number` | Sentiment score for article content (range: -1 to 1, negative values indicate negative sentiment) | **Named entity recognition (NER)** | Field | Type | Description | | -------------- | --------------- | ----------------------------------------------------------------- | | `nlp.ner_PER` | `array[object]` | Named entities recognized as persons | | `nlp.ner_ORG` | `array[object]` | Named entities recognized as organizations | | `nlp.ner_LOC` | `array[object]` | Named entities recognized as locations | | `nlp.ner_MISC` | `array[object]` | Named entities recognized as other types (events, products, etc.) | Each NER object contains: ```json { "entity_name": "string", // Recognized entity name "count": "integer" // Number of mentions in the article } ``` **Tags** Available for the `v3_nlp_iptc_tags` subscription plan. 
| Field | Type | Description | | -------------------- | --------------- | ------------------------------------------------------------ | | `nlp.iab_tags_name` | `array[string]` | Interactive Advertising Bureau content categorization | | `nlp.iptc_tags_name` | `array[string]` | International Press Telecommunications Council subject names | | `nlp.iptc_tags_id` | `array[string]` | International Press Telecommunications Council subject IDs | **Vector representation** Available for the `v3_nlp_embeddings` plan. | Field | Type | Description | | ------------------- | --------------- | ---------------------------------------------------------------------------------------------------- | | `nlp.new_embedding` | `array[number]` | 1024-dimensional vector embedding for semantic similarity comparison (v3\_nlp\_embeddings plan only) | #### Clustering data Available for all NLP plans when `clustering_enabled=true`: | Field | Type | Description | | ---------------- | --------- | ---------------------------------------- | | `clusters_count` | `integer` | Total number of clusters in the response | | `clusters` | `array` | Array of cluster objects | | `cluster_id` | `string` | Unique identifier for each cluster | | `cluster_size` | `integer` | Number of articles in the cluster | | `articles` | `array` | Array of article objects in the cluster | #### Deduplication data Available for all NLP plans when `exclude_duplicates=true`: | Field | Type | Description | | ----------------------------- | --------- | ----------------------------------------- | | `duplicate_count` | `integer` | Number of duplicate articles found | | `duplicate_articles_group_id` | `string` | Unique identifier for the duplicate group | #### Source object Enhanced source information available in all v3 plans: | Field | Type | Description | | ----------------- | -------- | -------------------- | | `name_source` | `string` | Publisher name | | `domain_url` | `string` | Base domain URL | | `logo` | `string` | Source 
logo URL | | `additional_info` | `object` | Extended source data | Additional info object fields: | Field | Type | Description | | -------------------- | --------- | ------------------------------------ | | `nb_articles_for_7d` | `integer` | Articles published in last week | | `country` | `string` | Source country code | | `rank` | `integer` | SEO rank | | `is_news_domain` | `boolean` | Indicates if domain is a news source | | `news_domain_type` | `string` | Type of news domain | | `news_type` | `string` | Category of news content | ### Removed response fields in v3 * `topic`: Replaced by `theme` in NLP features for the `/search` and `/latest_headlines` endpoints. The field is unavailable for the `/sources` endpoint as the corresponding parameter has been removed. * `is_republisher`: Replaced by more detailed domain classification. ### Error response changes #### Format ```json v2 format { "status": "error", "error_code": "", "message": "" } ``` ```json v3 format { "message": "", "status_code": 401, "status": "" } ``` #### Status codes | Code | v2 Description | v3 Description | | ---- | -------------------- | -------------------------- | | 400 | API not in headers | Bad request - Invalid JSON | | 401 | API Key not found | Unauthorized | | 403 | Not present | Plan limits exceeded | | 406 | Wrong parameter | Not present | | 408 | Request Timeout | Request Timeout | | 422 | Not present | Validation Error | | 429 | Concurrency violated | Rate limit exceeded | | 500 | Not present | Internal server error | ## SDKs * v2: Python SDK only * v3: SDKs for: * Python * TypeScript * Go * Java * C# All v3 SDKs provide complete support for both core and advanced features. For implementation details, see the [Libraries](/v3/documentation/get-started/libraries) documentation. ## Timeline and support ### Migration timeline * v2 supported until Q1 2025. * Historical data: * v3 data available since July 2023. * v2 historical data migration to v3 planned for Q1 2025. 
* Initial historical data will include core features only. * Advanced features (NLP, clustering, etc.) available for historical data upon request. ### Support during migration * Both versions are available for parallel testing. * Automatic migration to `v3_basic` plan for existing v2 customers. * All new v3 endpoints accessible in `v3_basic` plan. * Advanced features require specific plans: * NLP features: `v3_nlp` plan. * IPTC and IAB tags: `v3_nlp_iptc_tags` plan. * Embeddings: `v3_nlp_embeddings` plan. * Custom tags: Available as a custom solution in all v3 NLP plans. ## Next steps 1. Review the [Migration guide](/v3/documentation/migration/migration-guide) for implementation details. 2. Explore plan features and requirements in [Subscription plans](/v3/documentation/get-started/news-api-v3-subscription-plans). 3. Check the version-specific API Reference for detailed parameter and response field documentation: * [v3 API Reference](/v3/api-reference) * [v2 API Reference](/v2/api-reference) 4. Test v3 endpoints alongside your v2 implementation. 5. For advanced features: * Learn about [NLP features](/v3/documentation/guides-and-concepts/nlp-features) * Explore [Custom tags](/v3/documentation/guides-and-concepts/custom-tags) * Understand [Clustering](/v3/documentation/guides-and-concepts/clustering-news-articles) capabilities For implementation support or custom solutions, contact our support team. # Migration guide v2 to v3 Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/migration/migration-guide Step-by-step guide for migrating from News API v2 to v3 This guide provides practical instructions for migrating from News API v2 to v3, with Python code examples for each endpoint. ## Prerequisites Before starting your migration: 1. Obtain your v3 API token. 2. Review the [API changes v2 vs v3](/v3/documentation/migration/api-changes-v2-vs-v3). 3. Have Python with the `requests` library installed. 
## Basic setup To get started, change authentication and base URLs: ```python v2 import requests from typing import Dict, Optional API_KEY: str = "YOUR_API_KEY" BASE_URL: str = "https://api.newscatcherapi.com/v2" HEADERS: Dict[str, str] = {"x-api-key": API_KEY} def search_news(query: str) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/search", headers=HEADERS, params={"q": query} ) response.raise_for_status() return response.json() except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ```python v3 import requests from typing import Dict, Optional API_TOKEN: str = "YOUR_API_TOKEN" BASE_URL: str = "https://v3-api.newscatcherapi.com/api" HEADERS: Dict[str, str] = {"x-api-token": API_TOKEN} def search_news(query: str) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/search", headers=HEADERS, params={"q": query} ) response.raise_for_status() return response.json() except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ## Search endpoint migration The search endpoint enables news search with enhanced filtering capabilities in v3. Key changes include parameter renaming, updated response fields, and new filtering options. 
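Because v2's lowercase `topic` values become capitalized `theme` values in v3, it can help to validate converted values against v3's fixed theme set before sending requests. A sketch (the theme set is taken from the v3 response documentation; a `None` result, e.g. for removed v2 categories such as `"beauty"`, means the value needs manual remapping):

```python
from typing import Optional

# Fixed v3 theme set, per the NLP response field documentation.
V3_THEMES = {
    "Business", "Economics", "Entertainment", "Finance", "Health",
    "Politics", "Science", "Sports", "Tech", "Crime", "Financial Crime",
    "Lifestyle", "Automotive", "Travel", "Weather", "General",
}


def to_v3_theme(v2_topic: str) -> Optional[str]:
    """Convert a lowercase v2 topic to a capitalized v3 theme.

    Simple case conversion covers the straightforward renames; anything
    that returns None has no direct v3 equivalent and must be reviewed.
    """
    candidate = v2_topic.strip().title()
    return candidate if candidate in V3_THEMES else None
```

For example, `to_v3_theme("tech")` returns `"Tech"`, while `to_v3_theme("beauty")` returns `None` because that v2 category was consolidated into the new v3 categories.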
### Parameter changes Replace `from` and `to` with `from_` and `to_` respectively: ```python v2 params = { "q": "Tesla", "from": "2024/01/01", "to": "2024/01/31" } ``` ```python v3 params = { "q": "Tesla", "from_": "2024/01/01", "to_": "2024/01/31" } ``` Replace `topic` with `theme` and enable NLP: ```python v2 params = { "q": "Tesla", "topic": "tech" } ``` ```python v3 params = { "q": "Tesla", "theme": "Tech", "include_nlp_data": True } ``` Replace `search_in` format: ```python v2 params = { "q": "Tesla", "search_in": "title_summary" } ``` ```python v3 params = { "q": "Tesla", "search_in": "title,content" } ``` ### Complete search example ```python v2 import requests from typing import Dict, Optional API_KEY: str = "YOUR_API_KEY" BASE_URL: str = "https://api.newscatcherapi.com/v2" HEADERS: Dict[str, str] = {"x-api-key": API_KEY} def search_articles( query: str, from_date: str, to_date: str, topic: str ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/search", headers=HEADERS, params={ "q": query, "from": from_date, "to": to_date, "topic": topic, "search_in": "title_summary", "lang": "en" } ) response.raise_for_status() data = response.json() articles = [] for article in data.get("articles", []): articles.append({ "title": article.get("title"), "url": article.get("link"), "source": article.get("clean_url"), "published_date": article.get("published_date"), "content": article.get("summary") }) return { "total_articles": data.get("total_hits", 0), "articles": articles } except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ```python v3 import requests from typing import Dict, Optional API_TOKEN: str = "YOUR_API_TOKEN" BASE_URL: str = "https://v3-api.newscatcherapi.com/api" HEADERS: Dict[str, str] = {"x-api-token": API_TOKEN} def search_articles( query: str, from_date: str, to_date: str, theme: str ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/search", headers=HEADERS, params={ "q": query, 
"from_": from_date, "to_": to_date, "theme": theme, "search_in": "title,content", "include_nlp_data": True, "lang": "en" } ) response.raise_for_status() data = response.json() articles = [] for article in data.get("articles", []): articles.append({ "title": article.get("title"), "url": article.get("link"), "source": article.get("domain_url"), "published_date": article.get("published_date"), "content": article.get("content"), "nlp_summary": article.get("nlp", {}).get("summary") }) return { "total_articles": data.get("total_hits", 0), "articles": articles } except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ### Usage example ```python v2 # Search for Tesla news in tech result = search_articles( query="Tesla", from_date="2024/01/01", to_date="2024/01/31", topic="tech" ) ``` ```python v3 # Search for Tesla news in tech result = search_articles( query="Tesla", from_date="2024/01/01", to_date="2024/01/31", theme="Tech" ) ``` ### Response structure changes ```json v2 { "status": "ok", "total_hits": 521, "page": 1, "total_pages": 6, "page_size": 100, "articles": [ { "title": "Tesla lowers range estimates for Model X, S, Y cars", "author": "Matt Binder", "published_date": "2024-01-05 18:11:03", "link": "https://mashable.com/article/tesla-range-estimates", "clean_url": "mashable.com", // Domain name, renamed to domain_url in v3 "excerpt": "The change follows...", // Brief summary, renamed to description in v3 "summary": "Tesla has updated...", // Full text, renamed to content in v3 "topic": "tech", // Topic classification, moved to nlp.theme in v3 "country": "US", "language": "en", "authors": "Matt Binder", "is_opinion": false, "twitter_account": "@mashable", "_score": 11.959808, // Renamed to score in v3 "_id": "e623cee7239059b40ca40234" // Renamed to id in v3 } ], "user_input": { "q": "Tesla", "search_in": ["title_summary_en"], "lang": ["en"], "from": "2024-01-01 00:00:00", "to": "2024-01-31 00:00:00", "ranked_only": 
"True", "sort_by": "relevancy", "page": 1, "size": 100, "not_sources": [], "topic": "tech" // Renamed to theme in v3 } } ``` ```json v3 { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100, "articles": [ { "title": "Tesla finally moves forward with Gigafactory Nevada expansion", "author": "Fred Lambert", "published_date": "2024-01-19 20:54:24", "link": "https://electrek.co/2024/01/19/tesla-expansion", "domain_url": "electrek.co", // Previously clean_url "full_domain_url": "electrek.co", // New in v3 "description": "Tesla finally moved...", // Previously excerpt "content": "Tesla finally moved...", // Previously summary "is_headline": false, // New in v3 "paid_content": false, // New in v3 "word_count": 448, // New in v3 "all_links": ["https://...", "https://..."], // New in v3 "all_domain_links": ["site.com"], // New in v3 "country": "US", "language": "en", "authors": ["Fred Lambert", "Michelle Lewis"], "is_opinion": false, "twitter_account": "@electrekco", "nlp": { // New in v3 with include_nlp_data=true "theme": "Automotive, Business, Tech", // Previously standalone topic field "summary": "Gigafactory Nevada...", "sentiment": { "title": 0.99, "content": 0.0 }, "ner_PER": [], "ner_ORG": [ { "entity_name": "Tesla", "count": 17 } ] }, "score": 11.94476, // Previously _score "id": "f999ed96a0d460bf" // Previously _id } ], "user_input": { "q": "Tesla", "search_in": ["title", "content"], "lang": ["en"], "from_": "2024/01/01", // Previously from "to_": "2024/01/31", // Previously to "sort_by": "relevancy", "page": 1, "page_size": 100, // Previously size "theme": [["Tech"]], // Previously topic "include_nlp_data": true // New in v3 } } ``` ## Latest headlines migration The latest headlines endpoint provides access to recent news articles. Migration involves similar parameter updates as the search endpoint, with additional time-based filtering options. 
### Parameter changes Replace `topic` with `theme` and enable NLP: ```python v2 params = { "topic": "business", "countries": "US,GB" } ``` ```python v3 params = { "theme": "Business", "countries": "US,GB", "include_nlp_data": True } ``` Optionally specify time range with the `when` parameter: ```python v2 params = { "countries": "US,GB", "when": "24h" # Optional, defaults to "7d" } ``` ```python v3 params = { "countries": "US,GB", "when": "24h" # Optional, defaults to "7d" } ``` ### Complete latest headlines example ```python v2 import requests from typing import Dict, Optional API_KEY: str = "YOUR_API_KEY" BASE_URL: str = "https://api.newscatcherapi.com/v2" HEADERS: Dict[str, str] = {"x-api-key": API_KEY} def get_latest_headlines( countries: str, topic: str ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/latest_headlines", headers=HEADERS, params={ "countries": countries, "topic": topic, "lang": "en" } ) response.raise_for_status() data = response.json() articles = [] for article in data.get("articles", []): articles.append({ "title": article.get("title"), "url": article.get("link"), "source": article.get("clean_url"), "published_date": article.get("published_date"), "content": article.get("summary") }) return { "total_articles": data.get("total_hits", 0), "articles": articles } except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ```python v3 import requests from typing import Dict, Optional API_TOKEN: str = "YOUR_API_TOKEN" BASE_URL: str = "https://v3-api.newscatcherapi.com/api" HEADERS: Dict[str, str] = {"x-api-token": API_TOKEN} def get_latest_headlines( countries: str, theme: str ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/latest_headlines", headers=HEADERS, params={ "countries": countries, "theme": theme, "include_nlp_data": True, "lang": "en" } ) response.raise_for_status() data = response.json() articles = [] for article in data.get("articles", []): articles.append({ 
"title": article.get("title"), "url": article.get("link"), "source": article.get("domain_url"), "published_date": article.get("published_date"), "content": article.get("content"), "nlp_summary": article.get("nlp", {}).get("summary") }) return { "total_articles": data.get("total_hits", 0), "articles": articles } except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ### Usage example ```python v2 # Get latest business headlines from US and GB result = get_latest_headlines( countries="US,GB", topic="business" ) ``` ```python v3 # Get latest business headlines from US and GB result = get_latest_headlines( countries="US,GB", theme="Business" ) ``` ### Additional filtering options V3 provides enhanced filtering capabilities for latest headlines: ```python params = { "theme": "Business", "countries": "US,GB", "include_nlp_data": True "is_headline": True, # Filter for homepage articles "is_paid_content": False, # Exclude paywalled content "word_count_min": 200, # Minimum article length "word_count_max": 1000 # Maximum article length } ``` For a complete list of `/latest_headlines` parameters, see the [Latest headlines](/v3/api-reference/endpoints/latestheadlines/retrieve-latest-headlines-post) reference documentation. 
### Response structure changes ```json v2 { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 200, "page_size": 50, "articles": [ { "title": "Donald Trump Nominates Fox Business Host Sean Duffy", "author": "Ted Johnson", "published_date": "2024-11-18 23:07:23", "published_date_precision": "full", "link": "https://deadline.com/2024/11/trump-sean-duffy-1236180738", "clean_url": "deadline.com", // Domain name, renamed to domain_url in v3 "excerpt": "Donald Trump has gone...", // Brief text, renamed to description in v3 "summary": "Sean Duffy in 2018...", // Full text, renamed to content in v3 "topic": "business", // Moved to nlp.theme in v3 "country": "US", "language": "en", "authors": "Where Img,Class,Display Inline,Ted Johnson", "media": "https://deadline.com/wp-content/uploads/2024/11/img.jpg", "is_opinion": false, "twitter_account": "@tedstew", "_score": null, // Renamed to score in v3 "_id": "75382d1ff5336599bce837ab168bb34b" // Renamed to id in v3 } ], "user_input": { "lang": ["en"], "countries": ["US", "GB"], "topic": "business", // Renamed to theme in v3 "from": "2024-11-11 23:14:24" } } ``` ```json v3 { "status": "ok", "total_hits": 10000, "page": 1, "total_pages": 100, "page_size": 100, "articles": [ { "title": "The PIPEs Conference 2024", "author": "Hollywood", "published_date": "2024-11-14 00:00:00", "published_date_precision": "timezone unknown", "updated_date": null, // New in v3 "updated_date_precision": null, // New in v3 "link": "https://www.wilmerhale.com/en/insights/events/20241114", "domain_url": "wilmerhale.com", // Previously clean_url "full_domain_url": "wilmerhale.com", // New in v3 "name_source": "WilmerHale", // New in v3 "is_headline": false, // New in v3 "paid_content": false, // New in v3 "description": "On Thursday, November 14...", // Previously excerpt "content": "On Thursday, November 14...", // Previously summary "word_count": 120, // New in v3 "country": "US", "language": "en", "authors": ["Hollywood"], "media": 
"https://www.wilmerhale.com/-/media/img.jpg", "is_opinion": false, "twitter_account": "@WilmerHale", "all_links": [], // New in v3 "all_domain_links": [], // New in v3 "nlp": { // New with include_nlp_data=true "theme": "Business", // Previously standalone topic field "summary": "Caroline Dotolo and...", "sentiment": { "title": 0.0, "content": 0.8297 }, "ner_PER": [ { "entity_name": "Caroline Dotolo", "count": 2 } ], "ner_ORG": [ { "entity_name": "SEC", "count": 2 } ] }, "score": 0.0, // Previously _score "id": "a04114fd98e66a95f6bcba13cd5cd424" // Previously _id } ], "user_input": { "lang": ["en"], "countries": ["US", "GB"], "theme": [["Business"]], // Previously topic "include_nlp_data": true, // New in v3 "when": "2024-11-11T23:11:33.960162" } } ``` ## Sources endpoint migration The sources endpoint in v3 provides enhanced metadata about news sources. Key changes include the removal of the `topic` parameter and introduction of new filtering capabilities. ### Parameter changes The `topic` parameter is removed in v3. 
Instead, use new filtering options: ```python v2 params = { "lang": "en", "countries": "US", "topic": "news" } ``` ```python v3 params = { "lang": "en", "countries": "US", "news_domain_type": "Original Content", "news_type": "News and Blogs" } ``` Use `include_additional_info` to get enhanced source information: ```python v2 params = { "lang": "en", "countries": "US" } ``` ```python v3 params = { "lang": "en", "countries": "US", "include_additional_info": True } ``` ### Complete sources example ```python v2 import requests from typing import Dict, Optional API_KEY: str = "YOUR_API_KEY" BASE_URL: str = "https://api.newscatcherapi.com/v2" HEADERS: Dict[str, str] = {"x-api-key": API_KEY} def get_sources( countries: str, lang: str = "en" ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/sources", headers=HEADERS, params={ "countries": countries, "lang": lang } ) response.raise_for_status() data = response.json() return { "message": data.get("message"), "sources": data.get("sources", []) # List of domain strings } except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ```python v3 import requests from typing import Dict, Optional API_TOKEN: str = "YOUR_API_TOKEN" BASE_URL: str = "https://v3-api.newscatcherapi.com/api" HEADERS: Dict[str, str] = {"x-api-token": API_TOKEN} def get_sources( countries: str, lang: str = "en", include_additional_info: bool = True ) -> Optional[Dict]: try: response = requests.get( f"{BASE_URL}/sources", headers=HEADERS, params={ "countries": countries, "lang": lang, "include_additional_info": include_additional_info, }, ) response.raise_for_status() data = response.json() sources = [] for source in data.get("sources", []): source_info = { "name": source.get("name_source"), "domain": source.get("domain_url"), "logo": source.get("logo"), } if source.get("additional_info"): source_info.update( { "articles_count_7d": source["additional_info"].get( "nb_articles_for_7d" ), "country": 
source["additional_info"].get("country"), "rank": source["additional_info"].get("rank"), "is_news_domain": source["additional_info"].get( "is_news_domain" ), "domain_type": source["additional_info"].get( "news_domain_type" ), "news_type": source["additional_info"].get("news_type"), } ) sources.append(source_info) return {"message": data.get("message"), "sources": sources} except requests.exceptions.RequestException as e: print(f"Error making request: {e}") return None ``` ### Usage example ```python v2 # Get US news sources result = get_sources(countries="US") # Access source domains (returns list of strings) sources = result["sources"] # e.g., ["example.com", "news.com"] ``` ```python v3 # Get US news sources with metadata result = get_sources(countries="US") # Access detailed source information for source in result["sources"]: print( f""" Source: {source['name']} Domain: {source['domain']} Articles (7 days): {source.get('articles_count_7d')} Rank: {source.get('rank')} Domain Type: {source.get('domain_type')} News Type: {source.get('news_type')} """ ) ``` ### Advanced filtering options V3 provides additional parameters for precise source filtering: ```python params = { "lang": "en", "countries": "US", "include_additional_info": True, "source_name": "tech,news", # Search within source names "is_news_domain": True, # Filter for news domains only "news_domain_type": "Original Content", # Can be "Original Content" or "Aggregator" "from_rank": 1, # Filter by rank range "to_rank": 1000 } ``` ### Response structure changes ```json v2 { "message": "Maximum sources displayed according to your plan is set to 1000", "sources": [ // Simple array of domain strings "wn.com", "yahoo.com", "headtopics.com" // ... 
], "user_input": { "lang": ["en"], "countries": ["US"], "topic": "news" // Replaced by more specific classification in v3 } } ``` ```json v3 { "message": "Maximum sources displayed according to your plan is set to 1000", "sources": [ // Array of detailed source objects { "name_source": "Dakota Financial News", // New in v3 "domain_url": "dakotafinancialnews.com", // Previously just domain string "logo": null, // New in v3 "additional_info": { // New in v3 with include_additional_info "nb_articles_for_7d": 7564, "country": "US", "rank": 873414, "is_news_domain": false, "news_domain_type": "Original Content", "news_type": "News and Blogs" } } // ... ], "user_input": { "lang": ["en"], "countries": ["US"], "include_additional_info": true, // New in v3 "news_domain_type": ["Original Content"], // New in v3, replaces topic "news_type": ["News and Blogs"] // New in v3, replaces topic } } ``` ## Next steps 1. Test your migrated implementation. 2. Review [How-to](/v3/documentation/how-to) documentation for v3 usage. 3. Explore [News API v3 endpoints](/v3/api-reference/endpoints/authors/search-articles-by-author-get) for additional capabilities. # Interpret and handle API errors Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/troubleshooting/error-handling Understand and resolve common API errors This guide explains how to interpret and resolve common errors you may encounter while using the NewsCatcher News API v3. Understanding these errors can help you troubleshoot issues effectively and improve your integration with our API. ## Error response structure When an error occurs, the API returns a standard error response with the following structure: * `message`: A detailed description of the error. * `status_code`: The HTTP status code of the error. * `status`: A short description of the status code. 
**Example:** ```json { "message": "Invalid language", "status_code": 422, "status": "Validation error" } ``` ## Common errors and solutions Below are some common errors you might encounter and how to resolve them. ### 401 Unauthorized Authentication failed, usually due to an invalid or missing API key. **Example:** ```json { "message": "Invalid api key: INVALID_API_KEY", "status_code": 401, "status": "Unauthorized" } ``` **Solution:** 1. Verify that you are using the correct API key. 2. Ensure you include the API key in the `x-api-token` header of your request. 3. Check the status of your API key by requesting the `/subscription` endpoint. ### 403 Forbidden The request is valid, but the server refuses action. This may occur if you lack the necessary permissions for a resource, use parameters that do not exist or are not allowed for your plan, or if the specified date range exceeds your plan's allowed history period. **Example:** ```json { "message": "Your plan request date range cannot be greater than 400 days", "status_code": 403, "status": "Forbidden" } ``` **Solution:** 1. Ensure your account has the necessary permissions for the requested operation. 2. Review your parameters and verify they are allowed under your subscription plan. 3. Check for any typos in the parameter names. 4. Ensure the date range specified in your request does not exceed the allowed history period for your plan. 5. If you believe you should have access, [contact support](https://support-sign-in.newscatcherapi.com/) to review your account permissions. ### 408 Request timeout The server did not receive a complete request message within the default timeout of 30 seconds. This could be due to slow network connections, high server load, or client-side delays. **Example:** ```json { "message": "Request timed out after 30 seconds", "status_code": 408, "status": "Request timeout" } ``` **Solution:** 1. 
Ensure your network connection is stable and fast enough to complete the request in a timely manner. 2. If possible, reduce the size of your request payload to minimize the time needed to send the request to the server. 3. Narrow your search query by avoiding the `*` wildcard in the `q` parameter and using filters like `from_`, `to_`, or `sources`. 4. Implement retry logic with exponential backoff to handle temporary network issues causing timeouts. ### 422 Validation error The server understands the content type of the request but is unable to process the instructions contained in it due to invalid input. **Example:** ```json { "message": "Invalid date format", "status_code": 422, "status": "Validation error" } ``` **Solution:** 1. Check the format and values of your request payload. 2. Ensure that all required fields are present and correctly formatted. 3. Follow the specific validation rules, such as ensuring the `from_` date is not greater than the `to_` date, and check for correct parameter formats as described in the documentation. ### 429 Too many requests You have exceeded the rate limit for API requests. **Example:** ```json { "message": "Max API requests concurrency reached", "status_code": 429, "status": "Too many requests" } ``` **Solution:** 1. Implement request throttling in your application to stay within rate limits. 2. Use exponential backoff strategies when retrying requests. 3. If you consistently hit rate limits, consider upgrading your plan to a higher limit. ### 499 Unknown status code A non-standard HTTP status code used for various client-side errors that do not fit into standard HTTP status codes. **Example:** ```json { "message": "str field required", "status_code": 499, "status": "Unknown status code" } ``` **Solution:** 1. Check your request payload for missing required fields. 2. Ensure all parameters are correctly formatted and of the expected type. 3. Review the [API documentation](/v3/api-reference) for the correct format. 
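Several of the solutions above recommend retries with exponential backoff for transient errors such as `408` and `429`. A minimal sketch in Python (the helper name, the `(status, body)` return convention, and the delay values are our illustrative choices, not part of the API):

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0, transient=(408, 429, 500, 502, 503)):
    """Invoke `call()` and retry with exponential backoff while it returns a transient status.

    `call` should return a (status_code, body) tuple, e.g. a thin wrapper
    around an HTTP request. Delays double on each retry: 1s, 2s, 4s, ...
    """
    delay = base_delay
    for attempt in range(max_retries):
        status, body = call()
        if status not in transient:
            return status, body  # success, or a non-retryable error
        if attempt < max_retries - 1:
            time.sleep(delay)
            delay *= 2  # exponential backoff
    return status, body  # transient error persisted through all retries
```

In practice, `call` could wrap something like `requests.get(url, headers={"x-api-token": API_TOKEN}, timeout=30)` and return `(response.status_code, response)`.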
### 500 Internal server error The server encountered an unexpected condition that prevented it from fulfilling the request. This is typically a server-side issue but could also result from a malformed or broken payload on the client side. **Example:** This error often does not return a JSON response; you might see a generic error page or a connection error. **Solution:** 1. Wait a few minutes and try your request again. 2. Check the [NewsCatcher status page](https://status.newscatcherapi.com/) for any ongoing issues. 3. Validate your payload before making the API request. 4. If the problem continues and no known issues are reported, [contact support](https://support-sign-in.newscatcherapi.com/) with details of your request. ## Troubleshooting with correlation IDs All API responses include a `correlation-id` in the response headers. When reporting errors or contacting support, always include this ID to help us quickly identify and resolve your issue. ```http correlation-id: 2697cebe-6f5c-46e0-9b99-81e8abe55522 ``` For details on using correlation IDs for effective debugging, see our [Request Tracing guide](/v3/documentation/troubleshooting/request-tracing-correlation-ids). ## Best practices * **Check the status code and error message:** Always inspect the status code and error message in the API response to understand the nature of the error. * **Implement error handling logic:** Use try-catch blocks or equivalent mechanisms in your code to manage errors gracefully and log them for future analysis. * **Use retry mechanisms with backoff:** For transient errors like `429` (Too many requests) or `503` (Service unavailable), implement retries with exponential backoff to avoid overwhelming the server. * **Validate input data:** Ensure your data is correct and adheres to the API's expected formats before making requests to reduce errors. * **Monitor usage and error logs:** Regularly check your API usage and error logs to identify and address recurring issues or patterns. 
* **Follow security best practices:** Protect your API key, validate user inputs, and monitor for any unauthorized usage to prevent security issues. * **Stay updated:** Periodically check the [API documentation](/v3/api-reference) and [status page](https://status.newscatcherapi.com) for updates or changes. ## Additional resources * [API Reference documentation](/v3/api-reference) * [Rate limits and quotas](/v3/api-reference/overview/rate-limits) * [NewsCatcher status page](https://status.newscatcherapi.com) * [How to report bugs](/v3/documentation/troubleshooting/report-bugs) # Report bugs Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/troubleshooting/report-bugs How to report bugs and issues with the NewsCatcher APIs This guide explains how to effectively report bugs and issues with the NewsCatcher APIs to ensure quick resolution and improve your experience. ## Before reporting Before submitting a bug report, please take these steps to ensure efficient resolution: 1. **Check documentation**: Review the [API Reference](/v3/api-reference) to verify you're using parameters correctly and that you understand expected behaviors. 2. **Verify API usage**: Double-check that you're using the correct headers, parameters, and payload formats according to our documentation. 3. **Check status page**: Visit our [status page](https://status.newscatcherapi.com) to see if there are any known issues or maintenance activities affecting the service. 4. **Try our debugging tools**: Use our [Postman collections](https://www.postman.com/newscatcherapi/newscatcher-public-workspace/overview) or the API playground in our documentation to verify if the issue can be reproduced with different tools. 5. **Examine your code**: Double-check your implementation to ensure the issue isn't in your integration code. Many reported "bugs" turn out to be integration issues that can be resolved through proper configuration or implementation. 
Testing with Postman often helps identify these cases quickly.

## Where to report bugs

You can report bugs through the following official channels:

* The Customer Portal, our preferred channel for bug reports, with structured templates and ticket tracking.
* An alternative channel for bug reports and general support.
* A dedicated Slack channel, for eligible customers (based on plan tier).
* The SDK issue tracker: review the current issues and open a new one if needed (for SDK-related bugs only).

Our Customer Portal is the recommended channel as it automatically creates tickets in our internal tracking system, allowing for faster processing and resolution.

## How to report effectively

When reporting a bug, include the following information to help us identify and fix the issue quickly:

### 1. Include the correlation ID

Every API response includes a unique `correlation-id` in the headers. This ID is essential for us to trace your request through our system. Here's how to find it:

```bash Terminal
curl -i https://api.newscatcherapi.com/v3/search
# Look for: correlation-id: 2697cebe-6f5c-46e0-9b99-81e8abe55522
```

```javascript JavaScript
// After making a request
console.log(response.headers.get("correlation-id"));
```

```python Python
# After making a request
print(response.headers['correlation-id'])
```

### 2. What happened vs. what you expected

Briefly describe:

* What you were trying to do
* What actually happened
* What you expected to happen instead

### 3. Request and response details

Provide the information needed to reproduce the issue:

* API endpoint and method (`GET`/`POST`)
* Request parameters or payload
* Error message and status code

### 4.
Bug report template For convenience, you can use our template when reporting bugs: ``` ## Bug Description [Brief description of the issue] ## Correlation ID correlation-id: [ID from response headers] ## Request Details - Endpoint: [e.g., /v3/search] - Method: [GET/POST] - Parameters/Body: [Key details] ## Response - Status Code: [e.g., 500] - Error Message: [If applicable] ## Expected Behavior [What you expected to happen] ## Environment - Language/SDK: [e.g., Python 3.9, newscatcher-python 1.2.0] ``` If possible, attach relevant screenshots that show the issue. ## Common issues and solutions Before reporting a bug, check if your issue matches one of these common scenarios: ### Using incorrect authentication header We use `x-api-token` for authentication, not `x-api-key`: ```http Correct GET /v3/search HTTP/1.1 Host: api.newscatcherapi.com x-api-token: YOUR_API_KEY ``` ```http Incorrect GET /v3/search HTTP/1.1 Host: api.newscatcherapi.com x-api-key: YOUR_API_KEY ``` ### Malformed JSON payload causing 500 errors If you receive a `500 Internal Server Error`, check your request payload: ```json Correct { "q": "bitcoin", "from_": "30d", "countries": ["US", "CA"] } ``` ```json Incorrect { "q": "bitcoin", "from_": "30d", "countries": ["US", "CA" // Missing closing bracket } ``` For security reasons, our API may return a generic 500 error for certain malformed payloads instead of detailed validation errors. Always validate your JSON before sending. ## What happens after you report a bug? 1. **Ticket creation**: Your report is converted into a ticket in our internal tracking system. 2. **Initial assessment**: Our support team performs an initial assessment to determine severity and priority. 3. **Investigation**: Our engineering team uses the correlation ID to trace the issue through our logs and systems. 4. **Status updates**: We'll keep you informed about the progress of your bug report through the same channel you used to report it. 5. 
**Resolution**: Once resolved, we'll provide you with details about the fix and when you can expect it to be deployed.

## Issue priority and response times

When you report a bug, our support team assesses its priority based on urgency and impact according to our [Service Support Policy](https://www.newscatcherapi.com/services-support-policy). Understanding these priority levels can help you set appropriate expectations for response and resolution times.

### How we determine priority

* **Urgency**: how quickly the issue needs to be addressed based on its effect on your operations.
* **Impact**: the scope and severity of the issue on the service functionality.

### Priority levels

| Priority   | Description                                                                | Target Response Time | Target Resolution Time |
| ---------- | -------------------------------------------------------------------------- | -------------------- | ---------------------- |
| Priority 1 | Critical issue causing complete service unavailability with no workaround  | 1 hour               | Within 12 hours        |
| Priority 2 | Significant issue with major impact; service usable but seriously impaired | 4 hours              | 2 business days        |
| Priority 3 | Moderate issue with acceptable workarounds available                       | 12 hours             | 5 business days        |
| Priority 4 | Minor issue with minimal impact                                            | 1 business day       | Planned                |

Priority 1 issues should be reported by phone in addition to creating a ticket to ensure immediate attention. Standard support hours are 8am–5pm on business days in your principal location.

When reporting a bug, you can suggest a priority level based on the criteria above, but the final determination will be made by our support team based on technical assessment.
## Debugging tools

We provide several tools to help you debug issues before reporting them:

* **Postman collections**: pre-configured request collections for all our APIs with examples.
* **API playground**: an interactive API testing tool available directly in our documentation.
* **AI chat assistant**: type your question in the search bar (⌘K).
* **Status page**: real-time service status and incident history.

## Related resources

* [Error handling](/v3/documentation/troubleshooting/error-handling)
* [Request tracing with correlation IDs](/v3/documentation/troubleshooting/request-tracing-correlation-ids)
* [API error codes](/v3/api-reference/overview/errors)

# How to request features

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/troubleshooting/request-features

Learn how to request new features or suggest enhancements.

Coming soon ...

# Request tracing with correlation IDs

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/documentation/troubleshooting/request-tracing-correlation-ids

Learn how to use correlation IDs to debug and troubleshoot API requests

In a distributed system like NewsCatcher, tracing requests across multiple services is essential for effective troubleshooting. This guide explains how correlation IDs work and how to use them to resolve issues quickly.

## What is a correlation ID?

A correlation ID is a unique identifier assigned to every request that enters our system. This ID follows the request through all microservices and appears in logs across our infrastructure, enabling end-to-end request tracing.

Think of a correlation ID as a tracking number for your API request: it lets you and our support team follow the request's journey through our entire system.
In the NewsCatcher APIs, correlation IDs appear in:

* HTTP response headers as `correlation-id`
* Internal logs across our Kubernetes infrastructure
* Error tracking systems

## Why correlation IDs matter

### Faster support resolution

When reporting issues to our support team, including the correlation ID allows us to immediately locate all relevant logs and trace the exact path your request took through our systems. This dramatically speeds up troubleshooting and resolution times.

### End-to-end request visibility

In complex operations that touch multiple services (search, filtering, aggregation, etc.), correlation IDs let us reconstruct the complete journey of your request and identify exactly where problems occurred.

### Performance analysis

For requests with unexpected latency, correlation IDs help us analyze and optimize performance by measuring exactly how long each processing step took.

## How to use correlation IDs

Every response from the NewsCatcher API includes a `correlation-id` in the HTTP headers. You can view this:

* In your HTTP client's response headers
* In your code, by accessing the response's headers collection
* In Postman, under the **Headers** tab of the response

![Correlation ID in Postman](https://mintlify.s3.us-west-1.amazonaws.com/newscatcherinc-docs/images/correlation-id-postman.png)

Example correlation ID: `a702576c-2007-4b23-9ba4-cad305c84275`

When contacting our support team about an API issue, always include:

* The complete correlation ID from the response headers
* The approximate time the request was made
* A brief description of the expected vs. actual behavior

This information lets us immediately locate the exact request in our logs and provide faster assistance. Without the correlation ID, the debugging process would require much more back-and-forth and manual searching through logs.
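A small helper can make this a habit by pulling the `correlation-id` header out of every response and recording it in your own logs. This is a minimal sketch; `log_correlation_id` is a hypothetical name (not part of any SDK), and it assumes a `requests`-style response object:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("newscatcher-client")

def log_correlation_id(response):
    """Extract the correlation-id header from an API response and log it.

    Works with any requests-style response exposing `.headers` and
    `.status_code`; returns the ID so it can be stored alongside your
    own request records for later support tickets.
    """
    correlation_id = response.headers.get("correlation-id", "<missing>")
    logger.info("status=%s correlation-id=%s", response.status_code, correlation_id)
    return correlation_id
```

In practice you would call it right after each request, e.g. `log_correlation_id(requests.get(url, headers={"x-api-token": API_TOKEN}))`.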
## Correlation IDs and support SLAs

Providing a correlation ID when reporting issues helps us meet our service level agreements (SLAs) for support response times. According to our [Service Support Policy](https://www.newscatcherapi.com/services-support-policy), we have defined target response times based on issue priority:

* Priority 1 (Critical): 1 hour
* Priority 2 (Significant): 4 hours
* Priority 3 (Moderate): 12 hours
* Priority 4 (Minor): 1 business day

Without a correlation ID, troubleshooting takes significantly longer, which may impact our ability to meet these response targets. For the fastest possible resolution, always include the correlation ID from the relevant API request when contacting support.

For Priority 1 issues (complete service unavailability), please contact support by phone at +33625707180 in addition to submitting a ticket with the correlation ID to ensure immediate attention.

## Best practices

* Always include correlation IDs in your own application logs when making requests to our APIs.
* Always include correlation IDs when reporting issues to our support team.
* Integrate correlation ID extraction into your application's error handling logic.
* Consider temporarily storing correlation IDs for important requests for easier troubleshooting.

## Related resources

* [Error handling](/v3/documentation/troubleshooting/error-handling)
* [Report bugs](/v3/documentation/troubleshooting/report-bugs)
* [API error codes](/v3/api-reference/overview/errors)

# Get event fields

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/endpoints/event-fields-get

events-api get /api/events_info/get_event_fields Returns available fields for specified event type.

# Check API health status

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/endpoints/health-get

events-api get /api/health Checks if the API service is operational.
# Search for events Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/endpoints/search-events-post events-api post /api/events_search Searches for structured event data extracted from news articles. Supports filtering by event type, date ranges, and event-specific fields. # Retrieve subscription plan information Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/endpoints/subscription-get events-api get /api/subscription Returns information about current subscription including available events and usage limits. # Retrieve subscription plan information Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/endpoints/subscription-post events-api post /api/subscription Returns information about current subscription including available events and usage limits. # Discover data breach events Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/event-types/data-breach Search and analyze data breach events using available fields, parameters, and query filters. Events API provides access to structured information about data breaches extracted from news articles. This guide explains how to discover available search fields, construct search requests, and understand the returned data. ## Available search fields Get available search fields for data breach events using the discovery endpoint: ```bash GET /api/events_info/get_event_fields?event_type=data_breach ``` The endpoint returns the following fields that can be used for filtering search results. All fields are optional for search requests. ### Common fields * `company_name`: The name of the company affected by the data breach. * `event_date`: The date when the breach occurred. * `extraction_date`: The date when the event was extracted from news sources. ### Data breach-specific fields * `data_breach.data`: The types of data that were compromised in the breach, such as customer phone numbers or email addresses. * `data_breach.data_type`: The classification of the leaked data. 
Possible values: technical, financial, personal, health, credentials. * `data_breach.impacted`: The types of entities affected by the breach, such as clients, customers, or employees. * `data_breach.summary`: A detailed description of the data breach event and its impact. * `data_breach.title`: The title summarizing the data breach event. ## Searching for events Use the search endpoint to find data breach events: ```bash POST /api/events_search ``` ### Basic request structure ```json { "event_type": "data_breach", "attach_articles_data": true, "additional_filters": { // search criteria using available fields } } ``` ### Using search fields Search by dates (absolute or relative): ```json { "event_type": "data_breach", "additional_filters": { "event_date": { "gte": "2024-01-01", "lte": "2024-02-01" }, "extraction_date": { "gte": "now-7d", "lte": "now" } } } ``` Search by data type: ```json { "event_type": "data_breach", "additional_filters": { "data_breach.data_type": "personal", "data_breach.data": "Customer Email Addresses" } } ``` Search by affected entities: ```json { "event_type": "data_breach", "additional_filters": { "data_breach.impacted": "Customers" } } ``` ## Understanding the response The API returns matched events in this structure: ```json { "message": "Success", "count": 1, "events": [ { "id": "event-id", "event_type": "data_breach", "global_event_type": "DataMonitoring", "associated_article_ids": ["article-id-1", "article-id-2"], "extraction_date": "2024-06-17 16:02:28", "event_date": "2022-05-06 00:00:00", "company_name": "Company Name", "data_breach": { "summary": "Detailed description of the breach event and its impact", "impacted": ["Customers", "Employees"], "data": "Types of compromised data (e.g., personal information, financial data)", "data_type": "Classifications like personal, financial, technical", "title": "Brief title describing the breach" } } ] } ``` ## Best practices 1. 
Use `extraction_date` for new breach alerts and `event_date` for analyzing disclosure patterns. 2. Combine `data_type` and `data` fields to find specific categories of compromised information. 3. Search across wider date ranges as breaches often get disclosed weeks or months after occurrence. 4. Use `impacted` field to focus on breaches affecting specific groups like customers or employees. ## See also * [Parameter formats](/v3/events/overview/parameter-formats) * [Working with articles](/v3/events/overview/working-with-articles) * [API reference](/v3/events/endpoints) # Query fundraising events Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/event-types/fundraising Filter and retrieve company funding events using fields, parameters, and search criteria. Events API provides access to structured information about company funding rounds extracted from news articles. This guide explains how to discover available search fields, construct search requests, and understand the returned data. ## Available search fields Get available search fields for fundraising events using the discovery endpoint: ```bash GET /api/events_info/get_event_fields?event_type=fundraising ``` The endpoint returns the following fields that can be used for filtering search results. All fields are optional for search requests. ### Common fields * `company_name`: The name of the company that received funding. * `event_date`: The date when the fundraising occurred. * `extraction_date`: The date when the event was extracted from news sources. ### Fundraising-specific fields * `fundraising.amount`: The amount of funding raised by the company. * `fundraising.currency`: The currency of the funding amount. Possible values: USD, EUR, etc. * `fundraising.funding_type`: The type of funding round. Possible values: Seed, Series A, Series B, Series C, etc. * `fundraising.company_description`: The description of the funded company's business and activities. 
* `fundraising.company_legal_name`: The legal name of the company that received funding. * `fundraising.industry`: The industry sector in which the funded company operates. * `fundraising.founders`: The names of the company's founding team members. * `fundraising.investors`: The names of investors who participated in the funding round. * `fundraising.valuation`: The company's valuation at the time of funding. ## Searching for events Use the search endpoint to find fundraising events: ```bash POST /api/events_search ``` ### Basic request structure ```json { "event_type": "fundraising", "attach_articles_data": true, "additional_filters": { // search criteria using available fields } } ``` ### Using search fields Search by dates (absolute or relative): ```json { "event_type": "fundraising", "additional_filters": { "event_date": { "gte": "2024-01-01", "lte": "2024-02-01" } } } ``` Search by amount and funding type: ```json { "event_type": "fundraising", "additional_filters": { "fundraising.amount": { "gte": 1000000 }, "fundraising.funding_type": "Series A", "fundraising.currency": "USD" } } ``` Search by industry and investors: ```json { "event_type": "fundraising", "additional_filters": { "fundraising.industry": "AI", "fundraising.investors": "Specific Investor Name" } } ``` ## Understanding the response The API returns matched events in this structure: ```json { "message": "Success", "count": 97, "events": [ { "id": "event-id", "event_type": "fundraising", "global_event_type": "Finance", "associated_article_ids": ["article-id-1", "article-id-2"], "extraction_date": "2025-02-13 16:00:43", "event_date": "2025-02-13 00:00:00", "company_name": "Company Name", "fundraising": { "amount": 1000000, "currency": "USD", "funding_type": "Series A", "company_description": "Description of the company and its activities", "company_legal_name": "Company Legal Name Inc", "industry": "Technology", "founders": ["Founder Name 1", "Founder Name 2"], "investors": ["Investor Name 1", 
"Investor Name 2"], "valuation": "10000000", "summary": "Brief description of the funding round", "title": "Title of the funding announcement" } } ] } ``` ## Best practices 1. Use date filtering to track funding trends in specific time periods. 2. Combine amount and funding type filters to find specific investment stages. 3. Search by industry and funding type to monitor sector-specific investment patterns. 4. Use investor names to track specific investment firms' activities. ## See also * [Parameter formats](/v3/events/overview/parameter-formats) * [Working with articles](/v3/events/overview/working-with-articles) * [API reference](/v3/events/endpoints) # Search workforce reduction events Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/event-types/layoff Access and filter corporate layoff data using search parameters and field criteria. Events API provides access to structured information about workforce reductions extracted from news articles. This guide explains how to discover available search fields, construct search requests, and understand the returned data. ## Available search fields Get available search fields for layoff events using the discovery endpoint: ```bash GET /api/events_info/get_event_fields?event_type=layoff ``` The endpoint returns the following fields that can be used for filtering search results. All fields are optional for search requests. ### Common fields * `company_name`: The name of the company conducting the layoff. * `event_date`: The date when the layoff occurred. * `extraction_date`: The date when the event was extracted from news sources. ### Layoff-specific fields * `layoff.number_of_people_laid_off`: The exact number of employees affected by the layoff. * `layoff.percentage_of_people_laid_off`: The percentage of total workforce affected by the layoff. * `layoff.min_number_of_people_laid_off`: The minimum number of employees affected when a range is specified. 
* `layoff.max_number_of_people_laid_off`: The maximum number of employees affected when a range is specified. * `layoff.how_much_related`: The relevance rating of the layoff event. Possible values: "Completely Irrelevant", "Irrelevant", "Very Poor", "Poor", "Fair", "Good", "Very Good", "Excellent". Default: None. * `layoff.is_relevant_for_real_estate`: True if the layoff impacts real estate markets; false otherwise. * `layoff.layoff_reason`: The stated reason for the layoff. * `layoff.location`: The location details of the layoff. Contains fields for country, state, city, and county. * `layoff.summary`: A detailed description of the layoff event, including key information about the circumstances and impact. ## Searching for events Use the search endpoint to find layoff events: ```bash POST /api/events_search ``` ### Basic request structure ```json { "event_type": "layoff", "attach_articles_data": true, "additional_filters": { // search criteria using available fields } } ``` ### Using search fields Search by dates (absolute or relative): ```json { "event_type": "layoff", "additional_filters": { "event_date": { "gte": "2024-01-01", "lte": "2024-02-01" }, "extraction_date": { "gte": "now-7d", "lte": "now" } } } ``` Search by employee count: ```json { "event_type": "layoff", "additional_filters": { "layoff.number_of_people_laid_off": { "gte": 1000 }, "layoff.percentage_of_people_laid_off": { "gte": 10 } } } ``` Search by location: ```json { "event_type": "layoff", "additional_filters": { "layoff.location": { "state": "California" } } } ``` Search by relevance: ```json { "event_type": "layoff", "additional_filters": { "layoff.how_much_related": "Excellent", "layoff.is_relevant_for_real_estate": "true" } } ``` ## Understanding the response The API returns matched events in this structure: ```json { "message": "Success", "count": 25, "events": [ { // Base event fields "id": "event-id", "event_type": "layoff", "global_event_type": "Layoff", "company_name": "Company Name", 
"event_date": "2025-02-13 00:00:00", "extraction_date": "2025-02-13 19:01:18", "associated_article_ids": ["article-id-1", "article-id-2"], // Layoff-specific data "layoff": { "number_of_people_laid_off": 1000, "percentage_of_people_laid_off": 10, "min_number_of_people_laid_off": 1000, "max_number_of_people_laid_off": 1000, "summary": "Event description", "layoff_reason": "Stated reason", "is_relevant_for_real_estate": false, "how_much_related": "Excellent", "location": [ { "city": "City Name", "county": "County Name", "state": "State Name", "country": "Country Name", "raw_location": "Original location text" } ] }, // Article data (when requested) "articles": [ // See "Working with articles" guide ] } ] } ``` ## Working with articles See [Working with articles](working-with-articles) for details about article data structure and available fields. ## Best practices 1. Use `extraction_date` for monitoring recent layoffs and live tracking. 2. Use `event_date` for historical analysis and reporting. 3. Combine multiple filters to narrow results more precisely. 4. Consider both exact numbers and ranges when searching by employee count. ## See also * [Parameter formats](/v3/events/overview/parameter-formats) * [Working with articles](/v3/events/overview/working-with-articles) * [API reference](/v3/events/endpoints) # Explore international tariff events Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/event-types/tariffs Monitor and analyze global trade policy changes using tariff event data with customizable search parameters. Events API provides access to structured information about international tariffs and trade measures extracted from news articles. This guide explains how to discover available search fields, construct search requests, and understand the returned data. 
## Available search fields Get available search fields for tariff events using the discovery endpoint: ```bash GET /api/events_info/get_event_fields?event_type=tariffs_v2 ``` The endpoint returns the following fields that can be used for filtering search results. All fields are optional for search requests. ### Common fields * `company_name`: The name of the company related to the tariff event, if applicable. * `event_date`: The date when the tariff event occurred. * `extraction_date`: The date when the event was extracted from news sources. ### Tariff-specific fields * `tariffs_v2.imposing_country_name`: The name of the country implementing the tariff. * `tariffs_v2.imposing_country_code`: The ISO 3166-1 alpha-2 code of the country implementing the tariff. * `tariffs_v2.targeted_country_names`: Names of countries targeted by the tariff. * `tariffs_v2.targeted_country_codes`: ISO 3166-1 alpha-2 codes for countries targeted by the tariff. * `tariffs_v2.measure_type`: Type of trade measure being implemented. Possible values: new tariff, tariff increase, tariff reduction, retaliatory tariff, import ban, quota, other trade restriction. * `tariffs_v2.main_tariff_rate`: The most significant tariff rate mentioned. * `tariffs_v2.tariff_rates`: List of tariff rate descriptions in the format "X% on Y". * `tariffs_v2.previous_tariff_rate`: The tariff rates before this change. * `tariffs_v2.affected_industries`: Industries affected by the tariff using GICS sectors. * `tariffs_v2.affected_products`: Specific products affected by the tariff or trade measure. * `tariffs_v2.hs_product_categories`: Harmonized System (HS) sections of products affected by the tariff. * `tariffs_v2.announcement_date`: Date when the tariff was announced. * `tariffs_v2.implementation_date`: Date when the tariff will be or was implemented. * `tariffs_v2.estimated_trade_value`: The estimated value of trade affected by the measure. 
* `tariffs_v2.policy_objective`: Stated policy objective for implementing the tariff. * `tariffs_v2.trigger_event`: Description of what triggered a retaliatory measure. * `tariffs_v2.relevance_score`: Rating of how directly the article addresses specific tariff announcements. * `tariffs_v2.summary`: A comprehensive summary of the tariff announcement or change. ## Searching for events Use the search endpoint to find tariff events: ```bash POST /api/events_search ``` ### Basic request structure ```json { "event_type": "tariffs_v2", "attach_articles_data": true, "additional_filters": { // search criteria using available fields } } ``` ### Using search fields Search by dates (absolute or relative): ```json { "event_type": "tariffs_v2", "additional_filters": { "extraction_date": { "gte": "now-7d", "lte": "now" }, "tariffs_v2.implementation_date": { "gte": "2025-01-01", "lte": "2025-04-01" } } } ``` Search by countries and tariff rates: ```json { "event_type": "tariffs_v2", "additional_filters": { "tariffs_v2.imposing_country_code": "US", "tariffs_v2.targeted_country_codes": ["CN"], "tariffs_v2.main_tariff_rate": { "gte": 20 } } } ``` Search by measure type and affected industries: ```json { "event_type": "tariffs_v2", "additional_filters": { "tariffs_v2.measure_type": "retaliatory tariff", "tariffs_v2.affected_industries": ["Materials"] } } ``` Search by products and relevance: ```json { "event_type": "tariffs_v2", "additional_filters": { "tariffs_v2.affected_products": ["steel"], "tariffs_v2.hs_product_categories": [ "XV: Base metals and articles of base metal" ], "tariffs_v2.relevance_score": "High" } } ``` ## Understanding the response The API returns matched events in this structure: ```json { "message": "Success", "count": 20, "events": [ { "id": "event-id", "event_type": "tariffs_v2", "global_event_type": "TradePolicy", "associated_article_ids": ["article-id-1", "article-id-2"], "extraction_date": "2025-03-12 12:04:06", "event_date": null, "company_name": null, 
"tariffs_v2": { "summary": "Comprehensive summary of the tariff announcement", "affected_products": ["aluminum", "steel"], "imposing_country_name": "United States", "affected_industries": ["Materials"], "main_tariff_rate": 25, "announcement_date": "2025/03/18", "tariff_rates": ["25% on steel", "25% on aluminum"], "targeted_country_codes": ["MX", "KR", "BR", "EU", "CN", "CA"], "hs_product_categories": ["XV: Base metals and articles of base metal"], "targeted_country_names": [ "European Union", "Canada", "China", "South Korea", "Brazil", "Mexico" ], "relevance_score": "High", "measure_type": "new tariff", "imposing_country_code": "US", "implementation_date": "2025/04/02" }, "articles": [ // Article data when requested ] } ] } ``` ## Best practices 1. Use `extraction_date` to monitor recently reported trade policy developments. 2. Use `tariffs_v2.announcement_date` and `tariffs_v2.implementation_date` for tracking policy timelines. 3. Filter by both `tariffs_v2.imposing_country_code` and `tariffs_v2.targeted_country_codes` to track bilateral trade tensions. 4. Combine `tariffs_v2.affected_industries` and `tariffs_v2.hs_product_categories` for sector-specific analysis. 5. Use `tariffs_v2.measure_type` to distinguish between initial tariffs and retaliatory measures. 6. Use `tariffs_v2.main_tariff_rate` to filter for significant trade barriers. 7. Filter by `tariffs_v2.relevance_score` to prioritize highly relevant information. 
## Array field filtering When filtering for array fields, make sure to use array syntax even when searching for a single value: ```json { "event_type": "tariffs_v2", "additional_filters": { "tariffs_v2.targeted_country_codes": ["CA", "MX"], // Multiple values "tariffs_v2.affected_industries": ["Materials"], // Single value, still using array "tariffs_v2.hs_product_categories": [ "XV: Base metals and articles of base metal" ] } } ``` The following array fields require array syntax for filtering: * `tariffs_v2.targeted_country_codes` * `tariffs_v2.targeted_country_names` * `tariffs_v2.affected_industries` * `tariffs_v2.affected_products` * `tariffs_v2.hs_product_categories` * `tariffs_v2.tariff_rates` ## See also * [Parameter formats](/v3/events/overview/parameter-formats) * [Working with articles](/v3/events/overview/working-with-articles) * [API reference](/v3/events/endpoints) # Introduction to Event Intelligence Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/overview/introduction Transform news into actionable business intelligence with structured events. Events Intelligence provides an end-to-end pipeline for extracting structured event data from news articles. It processes news content through specialized filtering, validation, and extraction stages to produce standardized event records. The system supports public events for all users and custom events implemented for specific business needs. 
## System overview Our Events Intelligence system consists of four main components: ```mermaid graph LR %% Main graph flows left to right direction LR subgraph Articles["Articles"] A[(News Store)] end subgraph ETL ["Processing"] direction TB B[News Filtering] C[LLM Validation] D[Event Data Extraction] E[Deduplication & Merging] F[Event Creation & Indexing] B --> C C --> D D --> E E --> F end subgraph Storage["Events"] G[("Events Store")] end subgraph Serving["Serving"] direction TB H[Events API] I[Batch Export] J[Event Streams] end %% Connect the main components A --> ETL ETL --> G G --> H & I & J %% Styling classDef source fill:#f5f5f5,stroke:#333,stroke-width:2px classDef storage fill:#f5f5f5,stroke:#333,stroke-width:2px classDef access fill:#bbf,stroke:#333,stroke-width:2px classDef pipeline fill:#f9f9f9,stroke:#666,stroke-width:1px class A source class G storage class H,I,J access class B,C,D,E,F pipeline ``` ### News store The foundation of our system is a comprehensive collection of structured news articles in JSON format dating back to 2019. Each article has undergone extensive cleaning, enrichment, and modeling to ensure high quality and searchability. This carefully curated repository represents our core product, providing rich contextual information and standardized content. ### Processing pipeline The pipeline transforms news data into structured events through several specialized stages: 1. **News Filtering**: Identifies potentially relevant articles using specialized queries tailored to each event type. 2. **LLM Validation**: Employs state-of-the-art language models (like GPT-4o) to validate that filtered articles represent tracked events. 3. **Event Data Extraction**: Employs specialized AI models to extract structured event information. These models are fine-tuned for specific event types using carefully curated datasets. 4. 
**Deduplication & Merging**: Maintains data consistency by identifying and combining related event mentions while preserving unique details. 5. **Event Creation & Indexing**: Standardizes processed events according to defined schemas and indexes them for efficient retrieval. ### Events store A centralized repository for all structured event data. By default, it contains events extracted from articles published in the last 30 days. While historical event extraction (2019-present) is available upon request, it incurs additional costs. Once an event type is implemented, all newly collected articles are automatically processed for relevant events. ### Serving layer Processed event data is available through multiple channels: * **Events API**: The primary access method, providing RESTful endpoints for querying and retrieving event data * **Batch Export**: Enables bulk extraction for large-scale analysis * **Event Streams**: Offers real-time access to newly processed events ## Events API The Events API is a RESTful API that provides access to structured event data extracted from news articles. It lets you retrieve and analyze specific business events, such as corporate activities, market changes, and business developments. 
### Base URL You must send all API requests to the following base URL: ```bash https://events.newscatcherapi.xyz ``` ### Endpoints | Endpoint | Method | Description | Use Case | | ----------------------------------- | -------- | ----------------------------------------- | ---------------------------------------------- | | `/api/events_info/get_event_fields` | GET | Get event fields for specified event type | Discover searchable fields for each event type | | `/api/events_search` | POST | Search for events | Find events matching specific criteria | | `/api/health` | GET | Check API health status | Monitor API availability | | `/api/subscription` | GET/POST | Get subscription plan details | Check available events and usage limits | ### Request format Include your API key in the `x-api-token` header for each request. All requests must use HTTPS. ```python Python import requests import json API_KEY = "YOUR_API_KEY" URL = "https://events.newscatcherapi.xyz/api/events_search" HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"} PAYLOAD = { "event_type": "layoff", "attach_articles_data": True, "additional_filters": { "layoff.number_of_people_laid_off": {"gte": 1000}, "event_date": {"gte": "now-30d", "lte": "now"}, }, } try: response = requests.post(URL, headers=HEADERS, json=PAYLOAD) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch events: {e}") ``` ```bash cURL curl -X POST \ 'https://events.newscatcherapi.xyz/api/events_search' \ -H 'x-api-token: YOUR_API_KEY' \ -H 'Content-Type: application/json' \ -d '{ "event_type": "layoff", "attach_articles_data": true, "additional_filters": { "layoff.number_of_people_laid_off": { "gte": 1000 }, "event_date": { "gte": "now-30d", "lte": "now" } } }' ``` ### Response format All API responses are JSON objects containing events and metadata. 
For example, the key response fields for a layoff event include:

* `message`: The status message of the search operation. Example: `"Success"`.
* `count`: The total number of events returned. Example: `25`.
* `events`: The list of matched events. Each event contains:
  * `id`: The unique identifier of the event.
  * `event_type`: The specific type of the event.
  * `global_event_type`: The high-level category of the event.
  * `associated_article_ids`: The identifiers of news articles associated with this event.
  * `extraction_date`: The timestamp when the event was extracted from news sources.
  * `event_date`: The timestamp when the event occurred.
  * `company_name`: The name of the company involved in the event.
  * `articles`: Array of source articles. Only present when `attach_articles_data` is `true`.
  * `layoff`: For layoff events, contains the following required fields:
    * `number_of_people_laid_off`: The exact number of employees affected.
    * `percentage_of_people_laid_off`: The percentage of the total workforce affected.
    * `min_number_of_people_laid_off`: The minimum number of employees affected if a range was specified.
    * `max_number_of_people_laid_off`: The maximum number of employees affected if a range was specified.
    * `is_relevant_for_real_estate`: True if the layoff impacts the real estate market; false otherwise.
    * `summary`: The detailed description of the layoff event.
    * `how_much_related`: The relevance rating of the layoff event.
    * `location`: The locations where the layoff occurred. Each location contains:
      * `country`: The country where the event occurred.
      * `city`: The city where the event occurred.
      * `county`: The county where the event occurred.
      * `state`: The state where the event occurred.
      * `raw_location`: The unparsed location string from the source.

## Available events

The system supports two categories of events:

### General events

Events available to all customers:

* [Layoff](/v3/events/event-types/layoff)
* [Data Breach](/v3/events/event-types/data-breach)
* [Fundraising](/v3/events/event-types/fundraising)
* [Tariffs](/v3/events/event-types/tariffs)

### Custom events

Implemented for specific organizational requirements and accessible only through organization-specific API keys. Each custom implementation involves developing specialized extraction models and validation rules tailored to specific event requirements.
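As an illustration of consuming these response fields in client code, the following sketch tallies the employees affected across layoff events in a decoded `/api/events_search` response. The sample data mirrors the structure documented on this page; the helper function itself is illustrative:

```python
def summarize_layoffs(response):
    """Total the employees affected across layoff events in a decoded response."""
    total = 0
    for event in response.get("events", []):
        layoff = event.get("layoff", {})
        # number_of_people_laid_off may be null when only a range was reported,
        # so fall back to the upper bound of the range
        count = layoff.get("number_of_people_laid_off")
        if count is None:
            count = layoff.get("max_number_of_people_laid_off") or 0
        total += count
    return total

# Sample response shaped like the layoff structure documented above
sample = {
    "message": "Success",
    "count": 2,
    "events": [
        {"id": "a", "event_type": "layoff",
         "layoff": {"number_of_people_laid_off": 1000}},
        {"id": "b", "event_type": "layoff",
         "layoff": {"number_of_people_laid_off": None,
                    "max_number_of_people_laid_off": 250}},
    ],
}
print(summarize_layoffs(sample))  # 1250
```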
## What's next To start working with the Events API, refer to these technical resources: * [Quickstart guide](/v3/events/overview/quickstart) * [Parameter formats](/v3/events/overview/parameter-formats) * [API reference](/v3/events/endpoints) For technical support, contact us at [support@newscatcherapi.com](mailto:support@newscatcherapi.com). # Working with search parameters Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/overview/parameter-formats Learn about parameter types and formats for filtering event data. Events API supports several parameter types for filtering events. This guide explains available parameter formats and provides examples of their usage. For a complete list of available parameters for each event type, see the [API reference](/v3/events/endpoints). ## Parameter types In addition to the standard parameter types like string, array, object, integer, and number, the Events API uses custom schemas for numeric, date, and location parameters. ### Numeric parameters Use numeric parameters to filter by quantities or measurements. Format: ```json { "additional_filters": { "parameter_name": { "gte": 100, "lte": 1000 } } } ``` * `gte`: Greater than or equal to value * `lte`: Less than or equal to value Example - finding layoffs affecting 100-1000 employees: ```json { "event_type": "layoff", "additional_filters": { "layoff.number_of_people_laid_off": { "gte": 100, "lte": 1000 } } } ``` ### String parameters Use string parameters to filter by text values. Accepts either a single string or an array of strings. Single string format: ```json { "additional_filters": { "parameter_name": "value" } } ``` Multiple values format: ```json { "additional_filters": { "parameter_name": ["value1", "value2"] } } ``` Example - filtering fundraising events by funding type: ```json { "event_type": "fundraising", "additional_filters": { "fundraising.funding_type": "Series A" } } ``` ### Date parameters Use date parameters to filter by time ranges. 
Supports both absolute dates and relative formats. Format: ```json { "additional_filters": { "parameter_name": { "gte": "date_value", "lte": "date_value" } } } ``` Supported date formats: * Absolute dates: "YYYY-MM-DD" * Relative dates: "now", "now-30d" Example - events from last 30 days: ```json { "additional_filters": { "event_date": { "gte": "now-30d", "lte": "now" } } } ``` Example - events in a specific date range: ```json { "additional_filters": { "event_date": { "gte": "2024-01-01", "lte": "2024-02-01" } } } ``` ### Location parameters Use location parameters to filter by geographic information. Format: ```json { "additional_filters": { "parameter_name": { "country": "Country Name", "state": "State Name", "city": "City Name", "county": "County Name" } } } ``` All location fields are optional. Use only the fields needed for filtering. Example - filtering layoffs in a specific state: ```json { "event_type": "layoff", "additional_filters": { "layoff.location": { "state": "California" } } } ``` ## Combining parameters Combine multiple parameters to create more specific filters: ```json { "event_type": "fundraising", "additional_filters": { "event_date": { "gte": "now-30d", "lte": "now" }, "fundraising.amount": { "gte": 1000000 }, "fundraising.funding_type": "Series A", "fundraising.currency": "USD" } } ``` # Events API quickstart guide Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/overview/quickstart Get started with Events API. Transform news into actionable business intelligence with structured events! This guide helps you make your first API calls to retrieve structured event data from news articles. ## Before you start Before you begin, make sure you meet these prerequisites: * An API key (obtain one through our [pricing page](https://www.newscatcherapi.com/pricing)) * Basic understanding of REST APIs * Your preferred HTTP client (curl, Postman, etc.) 
* Basic knowledge of JSON data format ## Get started First, let's verify that your API key works and the service is available: ```bash cURL curl -X GET https://events.newscatcherapi.xyz/api/health \ -H "x-api-token: YOUR_API_KEY" ``` ```python Python import requests API_KEY = "YOUR_API_KEY" URL = "https://events.newscatcherapi.xyz/api/health" HEADERS = {"x-api-token": API_KEY} try: response = requests.get(URL, headers=HEADERS) response.raise_for_status() print(response.json()) except requests.exceptions.RequestException as e: print(f"Health check failed: {e}") ``` ```typescript TypeScript import axios, { AxiosResponse } from "axios"; const API_KEY: string = "YOUR_API_KEY"; const URL: string = "https://events.newscatcherapi.xyz/api/health"; axios .get(URL, { headers: { "x-api-token": API_KEY }, }) .then((response: AxiosResponse) => { console.log(response.data); }) .catch((error) => { console.error(`Health check failed: ${error.message}`); }); ``` You should receive: ```json { "message": "Healthy" } ``` Check what event types are available to you: ```bash cURL curl -X GET https://events.newscatcherapi.xyz/api/subscription \ -H "x-api-token: YOUR_API_KEY" ``` ```python Python import requests import json API_KEY = "YOUR_API_KEY" URL = "https://events.newscatcherapi.xyz/api/subscription" HEADERS = {"x-api-token": API_KEY} try: response = requests.get(URL, headers=HEADERS) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch subscription details: {e}") ``` ```typescript TypeScript import axios, { AxiosResponse } from "axios"; const API_KEY: string = "YOUR_API_KEY"; const URL: string = "https://events.newscatcherapi.xyz/api/subscription"; axios .get(URL, { headers: { "x-api-token": API_KEY }, }) .then((response: AxiosResponse) => { console.log(JSON.stringify(response.data, null, 2)); }) .catch((error) => { console.error(`Failed to fetch subscription details: ${error.message}`); 
}); ``` This shows your subscription status and available events: ```json { "active": true, "calls_per_seconds": 5, "plan_name": "events", "usage_assigned_calls": 10000, "usage_remaining_calls": 9959, "additional_info": { "available_events": ["layoff", "data_breach", "fundraising"] } } ``` Before searching, let's see what fields are available for a specific event type: ```bash cURL curl -X GET "https://events.newscatcherapi.xyz/api/events_info/get_event_fields?event_type=layoff" \ -H "x-api-token: YOUR_API_KEY" ``` ```python Python import requests import json API_KEY = "YOUR_API_KEY" URL = "https://events.newscatcherapi.xyz/api/events_info/get_event_fields" HEADERS = {"x-api-token": API_KEY} PARAMS = {"event_type": "layoff"} try: response = requests.get(URL, headers=HEADERS, params=PARAMS) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to fetch event fields: {e}") ``` ```typescript TypeScript import axios, { AxiosResponse } from "axios"; const API_KEY: string = "YOUR_API_KEY"; const URL: string = "https://events.newscatcherapi.xyz/api/events_info/get_event_fields"; const params = { event_type: "layoff", }; axios .get(URL, { headers: { "x-api-token": API_KEY }, params: params, }) .then((response: AxiosResponse) => { console.log(JSON.stringify(response.data, null, 2)); }) .catch((error) => { console.error(`Failed to fetch event fields: ${error.message}`); }); ``` This returns the fields you can use in your searches: ```json { "message": "Success", "count": 12, "fields": { "company_name": { "type": "String", "usage_example": { "company_name": "String Example" } }, "event_date": { "type": "Date", "usage_example": { "event_date": { "lte": "now", "gte": "now-1d" } } } // ... 
other fields } } ``` Now let's perform your first event search: ```bash cURL curl -X POST https://events.newscatcherapi.xyz/api/events_search \ -H "x-api-token: YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "event_type": "layoff", "attach_articles_data": true, "additional_filters": { "extraction_date": { "gte": "now-1d", "lte": "now" }, "layoff.number_of_people_laid_off": { "gte": 10000 } } }' ``` ```python Python import requests import json API_KEY = "YOUR_API_KEY" URL = "https://events.newscatcherapi.xyz/api/events_search" HEADERS = { "x-api-token": API_KEY, "Content-Type": "application/json" } payload = { "event_type": "layoff", "attach_articles_data": True, "additional_filters": { "extraction_date": { "gte": "now-1d", "lte": "now" }, "layoff.number_of_people_laid_off": { "gte": 10000 } } } try: response = requests.post(URL, headers=HEADERS, json=payload) response.raise_for_status() print(json.dumps(response.json(), indent=2)) except requests.exceptions.RequestException as e: print(f"Failed to search events: {e}") ``` ```typescript TypeScript import axios, { AxiosResponse } from "axios"; const API_KEY: string = "YOUR_API_KEY"; const URL: string = "https://events.newscatcherapi.xyz/api/events_search"; interface SearchPayload { event_type: string; attach_articles_data: boolean; additional_filters: { extraction_date: { gte: string; lte: string; }; "layoff.number_of_people_laid_off": { gte: number; }; }; } const payload: SearchPayload = { event_type: "layoff", attach_articles_data: true, additional_filters: { extraction_date: { gte: "now-1d", lte: "now", }, "layoff.number_of_people_laid_off": { gte: 10000, }, }, }; axios .post(URL, payload, { headers: { "x-api-token": API_KEY, "Content-Type": "application/json", }, }) .then((response: AxiosResponse) => { console.log(JSON.stringify(response.data, null, 2)); }) .catch((error) => { console.error(`Failed to search events: ${error.message}`); }); ``` This searches for recent large layoff events, with over 
10,000 people affected, from the last 24 hours. The response will include structured event data and associated articles. ```json { "message": "Success", "count": 2, "events": [ { "id": "n1jk-pQBvyT_ytpRpzBn", "layoff": { "summary": "The Trump administration is planning large-scale layoffs in the federal workforce, with over 65,000 employees accepting a deferred resignation offer. The layoffs are part of the 'Department of Government Efficiency' Workforce Optimization Initiative, which aims to reduce the federal workforce by 5% to 10%.", "layoff_reason": "Workforce optimization initiative", "max_number_of_people_laid_off": 65000, "how_much_related": "Excellent", "is_relevant_for_real_estate": false, "location": [ { "country": "United States", "city": "Washington", "raw_location": "Washington, DC", "county": "District of Columbia", "state": "Washington" } ], "number_of_people_laid_off": 65000, "percentage_of_people_laid_off": 5, "min_number_of_people_laid_off": 65000 }, "event_type": "layoff", "global_event_type": "Layoff", "associated_article_ids": ["94496b567dbd387718ab8065dea16c1d"], "extraction_date": "2025-02-12 16:01:16", "event_date": "2021-10-15 00:00:00", "company_name": "Department of Government Efficiency", "articles": [ { "link": "https://www.cnn.com/2025/02/12/politics/federal-employees-layoffs-trump/index.html", "id": "94496b567dbd387718ab8065dea16c1d", "title": "DOGE's power expands as federal agencies start planning large-scale layoffs" } ] }, { "id": "lFjQ9pQBvyT_ytpR2yrN", "layoff": { "summary": "Meta conducted layoffs on October 10, 2022, affecting 11,000 employees, which is 11% of the company's workforce. 
The layoffs were a surprise to many employees, including those with strong performance records.", "layoff_reason": "To eliminate underperformers", "max_number_of_people_laid_off": 11000, "how_much_related": "Excellent", "is_relevant_for_real_estate": false, "location": [ { "country": "United States", "city": "Menlo Park", "raw_location": "Menlo Park, San Mateo, California, United States", "county": "San Mateo", "state": "California" } ], "number_of_people_laid_off": 11000, "percentage_of_people_laid_off": 11, "min_number_of_people_laid_off": 11000 }, "event_type": "layoff", "global_event_type": "Layoff", "associated_article_ids": ["79f95268b0415c681ecb961f3e371742"], "extraction_date": "2025-02-11 21:01:10", "event_date": "2022-10-10 00:00:00", "company_name": "Meta", "articles": [ { "link": "https://digg.com/insider/link/meta-job-cuts-surprise-some-employees-strong-performers", "id": "79f95268b0415c681ecb961f3e371742", "title": "Meta's Job Cuts Surprised Some Employees Who Said They Weren't Low Performers" } ] } ] } ``` ## What's next Now that you've made your first API calls, you can: * Learn about [Event types](/v3/events/event-types) * Explore [API reference](/v3/events/endpoints/event-fields-get) * Set up event monitoring for your specific use case Need help? Contact our support team at [support@newscatcherapi.com](mailto:support@newscatcherapi.com) # Access source articles Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/events/overview/working-with-articles Request and filter source articles with available fields and options. Events API provides access to source articles from which events are extracted. These source articles help verify extracted event data and provide additional context. The API returns articles in an array and filters them based on the event search criteria. 
## Requesting article data To include article data in your request, set `attach_articles_data: true`: ```json { "event_type": "layoff", "attach_articles_data": true, "additional_filters": { // filters } } ``` ## Default article fields By default, when `attach_articles_data` is set to `true`, each article object includes four basic fields: * `id`: Unique identifier for the article * `title`: Article title * `link`: URL link to the article * `media`: Media associated with the article ## Available additional fields The API supports requesting additional article fields through the `additional_article_fields` parameter. Available fields include: * Author information: * `author`: Primary author of the article * `authors`: List of authors of the article * `journalists`: List of journalists associated with the article * Publication details: * `published_date`: Date the article was published * `published_date_precision`: Precision of the published date * `name_source`: Name of the source where the article was published * `language`: Language in which the article is written * `description`: Brief description of the article * `content`: Full content of the article * Domain information: * `domain_url`: Domain URL of the article * `full_domain_url`: Full domain URL of the article * Article attributes: * `is_headline`: Indicates if the article is a headline * `paid_content`: Indicates if the article is paid content * `rights`: Rights information for the article * `rank`: Rank of the article's source * `is_opinion`: Indicates if the article is an opinion piece * `word_count`: Word count of the article * `twitter_account`: Twitter account associated with the article * Link analysis: * `all_links`: List of all URLs mentioned in the article * `all_domain_links`: List of all domain URLs mentioned in the article * `extraction_data.parent_url`: Parent URL information from extraction metadata * Natural language processing: * `nlp.theme`: Article themes or categories * `nlp.summary`: 
AI-generated article summary * `nlp.sentiment`: Sentiment analysis scores * `nlp.ner_PER`: Named entities (persons) * `nlp.ner_ORG`: Named entities (organizations) * `nlp.ner_MISC`: Named entities (miscellaneous) * `nlp.ner_LOC`: Named entities (locations) ## Requesting additional fields Use `additional_article_fields` to request specific fields beyond the default ones: ```json { "event_type": "layoff", "attach_articles_data": true, "additional_article_fields": [ "description", "content", "published_date", "nlp.summary", "nlp.sentiment" ] } ``` The response will include both the default fields and any additional requested fields. # Retrieve latest headlines Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/endpoints/latest-headlines/retrieve-latest-headlines-post local-news-api post /api/latest_headlines Retrieves the most recent news headlines for the specific locations and times. You can filter results by language, source, theme, and more. # Search articles by identifiers Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/endpoints/search-by/search-articles-by-identifiers-post local-news-api post /api/search_by Search for local news using article links, IDs, or RSS GUIDs. # Search articles Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/endpoints/search/search-articles-post local-news-api post /api/search Searches for local news based on specified criteria such as keyword, language, country, source, and more. # Retrieve sources Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/endpoints/sources/retrieve-sources-post local-news-api post /api/sources Retrieves the list of local news sources available in the database. Filterable by language, country, and theme. # Introduction to Local News API Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/overview/introduction Local News API features and capabilities ## What is Local News API? 
Local News API provides precise access to location-specific news content worldwide. Building on News API v3's robust foundation, it adds specialized features for geographic news discovery and analysis. Whether you're developing local news applications, analyzing regional trends, or tracking city-specific coverage, Local News API helps you work efficiently with geographically focused news data.

## Key features

* AI-powered location recognition and validation
* Advanced querying with boolean operators and proximity search
* Multiple town association methods for precise location matching
* NLP-enriched content for deeper analysis
* Article clustering
* Multi-language support
* High-volume data retrieval (up to 1000 articles per request)

## Base URL

Send all API requests to the following base URL:

```bash
https://local-news.newscatcherapi.com
```

## Endpoints

| Endpoint                | Method | Description                              | Use Case                                                     |
| ----------------------- | ------ | ---------------------------------------- | ------------------------------------------------------------ |
| `/api/search`           | POST   | Full-text search with location filtering | Find articles matching specific criteria for given locations |
| `/api/latest_headlines` | POST   | Recent articles by location              | Monitor latest news for specific towns or regions            |
| `/api/search_by`        | POST   | Direct article lookup                    | Retrieve specific articles using URLs, IDs, or RSS GUIDs     |
| `/api/sources`          | POST   | Available news sources                   | Discover local news providers by region                      |

## Request format

Include your API key in the `x-api-token` header for each request. All requests must use HTTPS.
```python Python
import requests
import json

API_KEY = "YOUR_API_KEY_HERE"
URL = "https://local-news.newscatcherapi.com/api/search"
HEADERS = {
    "x-api-token": API_KEY,
    "Content-Type": "application/json"
}
PAYLOAD = {
    "q": "venture capital",
    "associated_towns": [{"name": "San Francisco, California"}],
    "lang": "en",
    "from_": "7 days ago"
}

try:
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

```bash cURL
curl -X POST \
  'https://local-news.newscatcherapi.com/api/search' \
  -H 'x-api-token: YOUR_API_KEY_HERE' \
  -H 'Content-Type: application/json' \
  -d '{
    "q": "venture capital",
    "associated_towns": [{"name": "San Francisco, California"}],
    "lang": "en",
    "from_": "7 days ago"
  }'
```

## Response format

All API responses are JSON objects containing articles and metadata. The key response fields include:

* `status`: The status of the request.
* `total_hits`: The total number of articles matching the query.
* `page`: The current page number.
* `total_pages`: The total number of available pages.
* `page_size`: The number of articles per page.
* `articles`: An array of article objects, each containing location-specific metadata and content, including:
  * `id`: The unique identifier for the article.
  * `associated_town`: A list of towns associated with the article, including association methods.
  * `ai_associated_town`: Towns identified through AI analysis of the article content.
  * `title`: The title of the article.
  * `content`: The full content of the article.
  * `nlp`: NLP analysis data including sentiment, themes, and named entities.
* `user_input`: The parameters used in your request for verification and debugging.

For detailed descriptions of all available response fields, refer to the specific endpoint documentation in the [Endpoints](/v3/local-news/endpoints/) section.
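To illustrate how these fields fit together, here's a minimal sketch that walks a decoded response and pairs each article title with its associated town names. The `sample` dict is a hypothetical stand-in for the value returned by `response.json()`:

```python
def summarize_articles(data):
    """Pair each article title with its associated town names."""
    pairs = []
    for article in data.get("articles", []):
        towns = [town["name"] for town in article.get("associated_town", [])]
        pairs.append((article["title"], towns))
    return pairs


# Hypothetical stand-in for a decoded API response
sample = {
    "status": "ok",
    "total_hits": 1,
    "articles": [
        {
            "id": "dabf25a5cbee2705f1ab866eeafdc835",
            "title": "Tech company to move headquarters from San Francisco to Texas",
            "associated_town": [{"name": "San Francisco, California"}],
        }
    ],
}

for title, towns in summarize_articles(sample):
    print(f"{title} -> {', '.join(towns)}")
```

The same pattern applies to any of the response fields listed above.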
### Example request

Here's a sample search for venture capital news in San Francisco:

```json Request
{
  "q": "venture capital",
  "associated_towns": [
    {
      "name": "San Francisco, California"
    }
  ],
  "lang": "en",
  "from_": "7 days ago"
}
```

### Example response

```json Response
{
  "status": "ok",
  "total_hits": 5,
  "page": 1,
  "total_pages": 1,
  "page_size": 100,
  "articles": [
    {
      "id": "dabf25a5cbee2705f1ab866eeafdc835",
      "associated_town": [
        {
          "ai_validated": true,
          "name": "San Francisco, California",
          "description": ["HYPERLOCAL_SOURCES_INCLUDE_QUERY"]
        }
      ],
      "ai_associated_town": null,
      "score": 17.720953,
      "title": "Tech company to move headquarters from San Francisco to Texas",
      "author": "Aayush Gupta",
      "link": "https://www.bizjournals.com/sanfrancisco/news/2024/10/24/simplilearn-kumar-plano-san-francisco-headquarters.html",
      "description": "The company offers a variety of online training programs for those interested in working in the tech industry, including programming, cybersecurity, digital marketing and artificial intelligence.",
      "media": "https://media.bizj.us/view/img/11514132/042712369*1200xx5184-2916-0-270.jpg",
      "content": "A global digital education company is relocating its U.S headquarters from downtown San Francisco to North Texas...",
      "authors": ["Aayush Gupta"],
      "published_date_precision": "full",
      "published_date": "2024-10-24 18:40:00",
      "updated_date": null,
      "updated_date_precision": null,
      "is_opinion": false,
      "twitter_account": null,
      "domain_url": "bizjournals.com",
      "parent_url": "https://www.bizjournals.com/sanfrancisco",
      "word_count": 391,
      "rank": 503,
      "country": "US",
      "rights": "bizjournals.com",
      "language": "en",
      "nlp": {
        "theme": ["Business", "Tech"],
        "summary": "Simplilearn Solutions Pvt. Ltd. is relocating its U.S headquarters from downtown San Francisco to Plano, Texas. The new HQ will be located at 5851 Legacy Circle in Legacy Town Center.",
        "sentiment": {
          "title": 0.0,
          "content": 0.9911
        },
        "ner_PER": [
          {
            "entity_name": "Krishna Kumar",
            "count": 1
          }
        ],
        "ner_ORG": [
          {
            "entity_name": "Simplilearn",
            "count": 4
          }
        ],
        "ner_MISC": [
          {
            "entity_name": "NYSE",
            "count": 1
          }
        ],
        "ner_LOC": [
          {
            "entity_name": "San Francisco",
            "count": 2
          },
          {
            "entity_name": "Texas",
            "count": 1
          }
        ]
      },
      "paid_content": false
    }
    // ... other articles
  ],
  "user_input": {
    "q": "venture capital",
    "associated_towns": [
      {
        "name": "San Francisco, California"
      }
    ],
    "lang": "en",
    "from_": "7 days ago"
  }
}
```

## Getting started

To start using Local News API:

1. [Contact our sales team](https://www.newscatcherapi.com/pricing) to discuss your needs and obtain an API key.
2. Once you have your API key, check out our [Quickstart guide](/v3/local-news/overview/quickstart) to make your first API call.
3. Explore the [documentation](/v3/local-news/overview/introduction) to unlock the full potential of the API.

For detailed information about request parameters and response formats, refer to the specific endpoint documentation in this API reference.

The Local News API is your gateway to precise, location-specific news content. Whether you're building local news applications, analyzing regional trends, or tracking city-specific coverage, we're here to help you navigate the world of local news data.

# Local News API subscription plans

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/overview/local-news-api-subscription-plans

Select a Local News API plan that matches your needs, from standard NLP analysis to advanced AI-powered location features. Each plan includes multi-language support, article clustering, NLP analysis, town association methods, and comprehensive metadata. Plans differ in their specialized features and processing capabilities.
## Available plans

### Standard

**Plan ID**: `v3_local_news_nlp`

The Standard plan provides comprehensive NLP capabilities for content analysis and monitoring:

* Access complete article metadata and clustering
* Categorize content across 15+ themes (Business, Tech, Politics, and more)
* Detect named entities (PER, ORG, LOC, MISC)
* Analyze sentiment in titles and content on a `-1.0` to `1.0` scale
* Get AI-generated article summaries
* Use 5 town association methods for precise location matching

### Embeddings

**Plan ID**: `v3_local_news_nlp_embeddings`

The Embeddings plan adds vector embeddings for advanced content analysis:

* Access all Standard NLP features
* Get 1024-dimensional vector embeddings through `nlp.new_embeddings`
* Use the multilingual-e5-large model for embeddings generation
* Enable semantic search in your applications
* Build similarity-based news aggregations

### AI Extraction

**Plan ID**: `v3_local_news_ai_extraction_nlp`

The AI Extraction plan adds AI-powered location features:

* Access all Standard NLP features
* Extract towns from article content using AI
* Validate locations automatically
* Get town information through `ai_associated_town` in responses
* Filter content using the `search_in_ai_associated_town` parameter
* Access AI town data starting from September 24, 2024

## Technical specifications

| Feature                | Standard | Embeddings | AI Extraction |
| ---------------------- | -------- | ---------- | ------------- |
| **Core Features**      |          |            |               |
| Article Metadata       | ✓        | ✓          | ✓             |
| Multi-language Support | ✓        | ✓          | ✓             |
| Clustering             | ✓        | ✓          | ✓             |
| **NLP Features**       |          |            |               |
| Theme Analysis         | ✓        | ✓          | ✓             |
| Entity Recognition     | ✓        | ✓          | ✓             |
| Sentiment Analysis     | ✓        | ✓          | ✓             |
| Vector Embeddings      | -        | ✓          | -             |
| **Location Features**  |          |            |               |
| Location Filtering     | ✓        | ✓          | ✓             |
| AI Location Extraction | -        | -          | ✓             |
| Location Validation    | -        | -          | ✓             |

## Support

If you need help selecting a plan, contact our team:

* For pricing and sales: [sales@newscatcherapi.com](mailto:sales@newscatcherapi.com)
* For technical questions:
[support@newscatcherapi.com](mailto:support@newscatcherapi.com)

Visit the [pricing page](https://www.newscatcherapi.com/pricing) to start your subscription.

# Local News API quickstart guide

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/overview/quickstart

This guide helps you make your first API call to Local News API and start retrieving location-specific news in just a few minutes.

## Before you start

Make sure you have:

* An API key for Local News API (obtained from your account manager)
* Python 3.6+ installed on your system
* The `requests` library for Python

## Steps

First, make sure you have Python and the `requests` library installed. You can install `requests` using pip:

```bash
pip install requests
```

Create a new file named `local_news_quickstart.py` and add the following code:

```python
import requests
import json

# Configuration
API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual API key
URL = "https://local-news.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"}
PAYLOAD = {
    "q": "*",
    "associated_towns": [{"name": "San Francisco, California"}],
    "theme": "Tech",
    "lang": "en",
    "from_": "7 days ago",
}

try:
    # Fetch articles using the POST method
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD)
    response.raise_for_status()  # Check if the request was successful

    # Print the raw JSON response
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

Remember to replace `YOUR_API_KEY_HERE` with your actual API key.
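Hardcoding keys in source files makes them easy to leak. As an optional sketch, you could read the key from an environment variable instead; the variable name `LOCAL_NEWS_API_KEY` below is just an example, not something the API requires:

```python
import os


def build_headers(env_var="LOCAL_NEWS_API_KEY"):
    """Build request headers, reading the API key from an environment variable."""
    api_key = os.environ.get(env_var, "")
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"x-api-token": api_key, "Content-Type": "application/json"}
```

You would then call `requests.post(URL, headers=build_headers(), json=PAYLOAD)` as before.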
Run the script from your terminal:

```bash
python local_news_quickstart.py
```

You should see a JSON response similar to this (shortened for readability):

```json
{
  "status": "ok",
  "total_hits": 51,
  "page": 1,
  "total_pages": 1,
  "page_size": 100,
  "articles": [
    {
      "id": "d156bb26af2b39ed33bd96b8428b4b21",
      "associated_town": [
        {
          "ai_validated": true,
          "name": "San Francisco, California",
          "description": ["HYPERLOCAL_SOURCES_INCLUDE_QUERY"]
        }
      ],
      "title": "Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said",
      "link": "https://www.sfchronicle.com/business/article/researchers-say-an-ai-powered-transcription-tool-19864411.php",
      "published_date": "2024-10-26 04:15:41",
      "domain_url": "sfchronicle.com",
      "nlp": {
        "theme": ["Tech", "Science"],
        "summary": "Whisper, an AI-powered transcription tool, is prone to making up chunks of text or entire sentences. It's being used in industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.",
        "sentiment": {
          "title": 0.8877,
          "content": -0.9958
        }
        // other NLP fields
      }
    }
    // ... additional articles
  ],
  "user_input": {
    "q": "*",
    "associated_towns": [
      {
        "name": "San Francisco, California"
      }
    ],
    "theme": "Tech",
    "lang": "en",
    "from_": "7 days ago"
  }
}
```

This response shows location-specific articles with rich metadata, including town associations, NLP analysis, and article details. To learn more about town association, see [Town association methods](/v3/local-news/overview/town-association-methods).
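Before building anything more elaborate, the NLP metadata in each article can be aggregated directly. This sketch, using a hypothetical `sample` dict shaped like the response above, tallies themes across results:

```python
from collections import Counter


def count_themes(data):
    """Tally NLP themes across all articles in a decoded response."""
    counts = Counter()
    for article in data.get("articles", []):
        counts.update(article.get("nlp", {}).get("theme", []))
    return counts


# Hypothetical stand-in for a decoded API response
sample = {
    "articles": [
        {"nlp": {"theme": ["Tech", "Science"]}},
        {"nlp": {"theme": ["Tech"]}},
    ]
}

print(count_themes(sample))
```

The same approach works for any list-valued NLP field, such as named entities.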
Let's modify our script to monitor significant tech industry movements and investments across major Bay Area tech hubs:

```python
import requests

# Configuration
API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual API key
URL = "https://local-news.newscatcherapi.com/api/search"
HEADERS = {"x-api-token": API_KEY, "Content-Type": "application/json"}
PAYLOAD = {
    "q": "expansion OR investment OR startup OR headquarters",
    "associated_towns": [
        {"name": "San Francisco, California"},
        {"name": "San Jose, California"},
        {"name": "Palo Alto, California"},
    ],
    "theme": "Tech",
    "lang": "en",
    "from_": "7 days ago",
    "clustering": True,
}

# Association method descriptions
ASSOCIATION_METHODS = {
    "HYPERLOCAL_SOURCES_EXCLUDE_QUERY": "Hyper-local source",
    "HYPERLOCAL_SOURCES_INCLUDE_QUERY": "Exact city match",
    "LOCAL_SOURCES_EXCLUDE_QUERY": "State-level source",
    "CITY_STATE_COUNTY_QUERY": "Matches 'City, State' format",
    "NEAR_CITY_STATE_QUERY": "Proximity match",
}

try:
    # Fetch articles using the POST method
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD)
    response.raise_for_status()

    # Parse and display the articles
    data = response.json()
    clusters = data.get("clusters", {})

    print(f"Found {data['clusters_count']} major tech stories")
    print(f"Total articles: {data['total_hits']}")
    print("---")

    for cluster_id, cluster in clusters.items():
        # Get the main article from each cluster
        article = cluster["articles"][0]

        # Extract key information
        title = article["title"]

        # Process all associated locations
        locations = []
        for town in article["associated_town"]:
            # Get human-readable association methods
            methods = [
                ASSOCIATION_METHODS.get(method, method)
                for method in town["description"]
            ]
            # Create a location string with validation status and methods
            validation = "AI validated" if town.get("ai_validated") else "Not validated"
            loc_str = f"{town['name']} ({validation}, {', '.join(methods)})"
            locations.append(loc_str)

        summary = article["nlp"]["summary"]
        cluster_size = cluster["cluster_size"]
        sentiment = (
            "Positive" if article["nlp"]["sentiment"]["content"] > 0 else "Negative"
        )

        # Print story details
        print(f"TRENDING STORY ({cluster_size} related articles):")
        print(f"Title: {title}")
        print("Locations:")
        for loc in locations:
            print(f"  - {loc}")
        print(f"Sentiment: {sentiment}")
        print(f"Summary: {summary}")
        print("---")
except requests.exceptions.RequestException as e:
    print(f"Failed to fetch articles: {e}")
```

Sample output:

```bash
Found 20 major tech stories
Total articles: 28
---
TRENDING STORY (3 related articles):
Title: Persona Positioned Highest for Ability to Execute in the Inaugural Gartner Magic Quadrant™ for Identity Verification
Locations:
  - San Francisco, California (AI validated, Exact city match)
Sentiment: Positive
Summary: Persona ranked first across all Use Cases: Know Your Customer (KYC), Fraud Detection, Account Recovery, and Sensitive or Regulated scenarios in the Gartner Critical Capabilities report. Persona's advanced identity verification platform stands as a critical defense that provides granular fraud and risk controls to help organizations stay ahead of digital fraud and safeguard both their business and customers.
---
```

The script shows how to:

* Track major business moves across key tech hubs.
* Monitor investment activity and expansions.
* Identify trending stories using clustering.
* Analyze market sentiment.
* Get detailed summaries of major developments.

Each clustered story provides a comprehensive view of significant tech industry movements, helping you stay informed about important developments in the Bay Area tech scene.

## What's next

Now that you've made your first calls to the Local News API, here are some next steps:

1. Learn about [advanced querying](/v3/documentation/guides-and-concepts/advanced-querying) to refine your searches.
2. Explore [town association methods](/v3/local-news/overview/town-association-methods) for better location matching.
3.
Read about [NLP features](/v3/documentation/guides-and-concepts/nlp-features) to extract insights from articles.

If you have any questions or need assistance, please get in touch with our [support team](mailto:support@newscatcherapi.com).

# Town association methods

Source: https://newscatcherinc-docs.mintlify.dev/docs/v3/local-news/overview/town-association-methods

Learn how Local News API connects articles with specific locations using five distinct association methods.

## Overview

Town association methods help you accurately connect news articles with specific locations in your searches. These methods use different matching strategies to identify location references in articles, from analyzing news sources to processing natural language patterns.

## Understanding town association

News articles can mention locations in several ways:

* City names ("San Francisco")
* Regional terms ("Bay Area")
* Local landmarks ("Golden Gate Bridge")
* Contextual references ("the city")

The Local News API uses five association methods to capture these different reference patterns.

## Available methods

### HYPERLOCAL\_SOURCES\_EXCLUDE\_QUERY

Identifies articles from dedicated local news sources that don't mention other locations.

**Query pattern**:

```
NOT ((Alabama OR Alaska OR Texas...) AND (", AL" OR ", TX"))
```

Use this method when working with:

* City-specific newspapers
* University news portals
* Local government sites
* Regional sections of larger publications

For example:

```json
{
  "name": "San Francisco, California",
  "description": ["HYPERLOCAL_SOURCES_EXCLUDE_QUERY"]
}
```

### HYPERLOCAL\_SOURCES\_INCLUDE\_QUERY

Finds exact matches of city names in article titles or content.

**Query pattern**:

```
"San Francisco"
```

Use this method for:

* Finding explicit city mentions
* Analyzing headlines
* Matching specific location names

### LOCAL\_SOURCES\_EXCLUDE\_QUERY

Searches state-level sources while excluding mentions of other states.
**Query pattern**:

```
city_name NOT ((Alabama OR Alaska OR Texas...) AND (", AL" OR ", TX"))
```

Use this method with:

* Regional news outlets
* State-wide publications
* Multi-city coverage sources

### CITY\_STATE\_COUNTY\_QUERY

Searches for standard location format patterns used in journalism.

**Query pattern**:

```
"city, state_code" OR "city, state" OR "city, county"
```

This method matches formats like:

* "San Francisco, CA"
* "San Francisco, California"
* "San Francisco, San Francisco County"

### NEAR\_CITY\_STATE\_QUERY

Finds articles where city and state names appear close to each other.

**Query pattern**:

```
NEAR("San Francisco", "California", 15)
```

This method matches phrases like:

* "New development in San Francisco draws attention across California"
* "California's tech hub San Francisco sees startup growth"

## Implementation guide

Select association methods based on your sources:

```python
PAYLOAD = {
    "associated_towns": [
        {
            "name": "San Francisco, California",
            "description": ["HYPERLOCAL_SOURCES_INCLUDE_QUERY"],
        }
    ]
}
```

Use multiple methods for better coverage:

```python
PAYLOAD = {
    "associated_towns": [
        {
            "name": "San Francisco, California",
            "description": [
                "HYPERLOCAL_SOURCES_EXCLUDE_QUERY",
                "CITY_STATE_COUNTY_QUERY",
            ],
        }
    ]
}
```

Enable AI-powered town extraction:

```python
PAYLOAD = {
    "associated_towns": [
        {
            "name": "San Francisco, California",
            "description": ["HYPERLOCAL_SOURCES_EXCLUDE_QUERY"],
        }
    ],
    "search_in_ai_associated_town": True,
}
```

This feature requires the `v3_local_news_ai_extraction_nlp` plan.

## Best practices

* Use HYPERLOCAL methods for known local sources.
* Apply NEAR queries for general news sources.
* Combine methods for better coverage.
* Start with specific methods and add broader methods if needed.
* Use AI features when available.
* Consider state context for city names.
* Account for alternative location names.
* Watch for ambiguous city names.
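To tie these fragments together, here's a sketch of a small helper that assembles a complete `/api/search` payload combining several association methods for one town. The helper name and query string are illustrative, not part of the API:

```python
def build_search_payload(town, methods, query="*", days="7 days ago"):
    """Assemble a /api/search payload combining association methods for one town.

    `town` is a "City, State" string; `methods` is a list of association
    method names such as CITY_STATE_COUNTY_QUERY.
    """
    return {
        "q": query,
        "associated_towns": [{"name": town, "description": list(methods)}],
        "lang": "en",
        "from_": days,
    }


PAYLOAD = build_search_payload(
    "San Francisco, California",
    ["HYPERLOCAL_SOURCES_EXCLUDE_QUERY", "CITY_STATE_COUNTY_QUERY"],
    query="expansion OR investment",
)
```

The resulting payload can then be sent with `requests.post` exactly as in the quickstart examples.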
## See also

* [Quickstart guide](/v3/local-news/overview/quickstart)
* [Local News API subscription plans](/v3/local-news/overview/local-news-api-subscription-plans)
* [Endpoints](/v3/local-news/endpoints)