Group similar articles together to reduce noise and gain insights

Clustering is configured with three parameters:

- `clustering_enabled` (boolean): Set to `true` to enable clustering.
- `clustering_threshold` (float): Determines how similar articles need to be to end up in the same cluster. Values range from 0 to 1, with higher values resulting in clusters with more similar articles. The default value is 0.6.
- `clustering_variable` (string): Chooses which part of the article to use for clustering. Options are `content` (default), `title`, or `summary`.

Clustering also depends on the `page_size` parameter.
Clustering operates on one page of results at a time, which affects how articles are
grouped. To ensure the most effective clustering:

- Set `page_size` to a value greater than your expected `total_hits`.
- Set `page_size` to at least 150. This prevents related articles from being split across different pages and, thus, different clusters.
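
As a minimal sketch of the guidance above, the helper below assembles clustering parameters and picks a `page_size` that is both above the expected `total_hits` and never below 150. The function name `build_clustering_params`, the `q` search parameter, and the sample query are assumptions made for illustration. Any upper limit the API may place on `page_size` is not modeled here.

```python
from typing import Any, Dict

VALID_VARIABLES = {"content", "title", "summary"}
MIN_PAGE_SIZE = 150  # lower bound recommended for clustering


def build_clustering_params(
    query: str,
    expected_total_hits: int,
    threshold: float = 0.6,
    variable: str = "content",
) -> Dict[str, Any]:
    """Return query parameters that enable clustering for one search."""
    if not 0 <= threshold <= 1:
        raise ValueError("clustering_threshold must be between 0 and 1")
    if variable not in VALID_VARIABLES:
        raise ValueError(
            f"clustering_variable must be one of {sorted(VALID_VARIABLES)}"
        )
    # Exceed the expected hit count, but never go below 150, so that
    # related articles land on the same page and can share a cluster.
    page_size = max(MIN_PAGE_SIZE, expected_total_hits + 1)
    return {
        "q": query,  # hypothetical name for the search-query parameter
        "clustering_enabled": True,
        "clustering_threshold": threshold,
        "clustering_variable": variable,
        "page_size": page_size,
    }


params = build_clustering_params("electric vehicles", expected_total_hits=90)
print(params["page_size"])  # 150
```

Validating the threshold range and the variable name locally catches typos before a request is sent.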

The response includes the following clustering fields:

- `clusters_count`: The total number of clusters found
- `clusters`: An array of cluster objects, each containing:
  - `cluster_id`: A unique identifier for the cluster
  - `cluster_size`: The number of articles in the cluster
  - `articles`: An array of the articles in the cluster

Feature | Clustering | Deduplication |
---|---|---|
Purpose | Groups similar articles | Removes nearly identical articles |
Content | Retains all articles | Removes duplicates |
Similarity Threshold | Generally lower, allowing broader groups | Higher, identifying near-exact matches |
Output | Groups of related articles | Set of unique articles |
Use Case | Analyzing related content, tracking trends | Eliminating redundancy, ensuring uniqueness |
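
Because clustering retains all articles and only groups them, a client usually walks the `clusters` array rather than a flat list. The sketch below shows one way to summarize a response using the `clusters_count`, `clusters`, `cluster_id`, `cluster_size`, and `articles` fields. The sample data is made up, and the `title` field on each article, the string form of `cluster_id`, and the helper name `summarize_clusters` are assumptions.

```python
from typing import Any, Dict, List, Tuple


def summarize_clusters(response: Dict[str, Any]) -> List[Tuple[str, int, str]]:
    """Return (cluster_id, cluster_size, first_title) rows, largest first."""
    rows = []
    for cluster in response.get("clusters", []):
        articles = cluster.get("articles", [])
        # Use the first article's title as a label for the cluster.
        first_title = articles[0].get("title", "") if articles else ""
        rows.append((cluster["cluster_id"], cluster["cluster_size"], first_title))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up response for illustration only.
sample_response = {
    "clusters_count": 2,
    "clusters": [
        {
            "cluster_id": "a",
            "cluster_size": 1,
            "articles": [{"title": "Rates held steady"}],
        },
        {
            "cluster_id": "b",
            "cluster_size": 2,
            "articles": [
                {"title": "New EV battery announced"},
                {"title": "Automaker unveils battery"},
            ],
        },
    ],
}

summary = summarize_clusters(sample_response)
total_articles = sum(size for _, size, _ in summary)
print(summary[0])      # ('b', 2, 'New EV battery announced')
print(total_articles)  # 3
```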