relevanceai.api.endpoints.centroids

Module Contents

Classes

CentroidsClient

Base class for all relevanceai client utilities

class relevanceai.api.endpoints.centroids.CentroidsClient(project, api_key)

Bases: relevanceai.base._Base

Base class for all relevanceai client utilities

docs_closest_to_center
docs_furthest_from_center
list(self, dataset_id: str, vector_fields: List, alias: str = 'default', page_size: int = 5, cursor: str = None, include_vector: bool = False, base_url='https://gateway-api-aueast.relevance.ai/latest')

Retrieve the cluster centroids.

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

  • page_size (int) – Size of each page of results

  • cursor (string) – Cursor to paginate the document retrieval

  • include_vector (bool) – Include vectors in the search results
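
The `page_size` and `cursor` parameters suggest cursor-based pagination. The sketch below shows one way to drain every page. It assumes, without confirmation from this reference, that each response is a dict holding the results under a `"documents"` key and the next cursor under `"cursor"`, and that an empty page marks the end. `iter_centroids` and `StubCentroids` are hypothetical names; the stub only exists so the sketch runs without network access, and in practice `client` would be a `CentroidsClient`.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_centroids(client: Any, dataset_id: str, vector_fields: List[str],
                   alias: str = "default", page_size: int = 5) -> Iterator[Dict[str, Any]]:
    """Yield every centroid by following the cursor until a page comes back empty."""
    cursor: Optional[str] = None
    while True:
        response = client.list(dataset_id, vector_fields, alias=alias,
                               page_size=page_size, cursor=cursor)
        # ASSUMPTION: the "documents" and "cursor" keys are illustrative only.
        documents = response.get("documents", [])
        if not documents:
            return
        yield from documents
        cursor = response.get("cursor")
        if cursor is None:
            return


class StubCentroids:
    """Hypothetical in-memory stand-in for CentroidsClient.list (no network)."""

    def __init__(self, centroids: List[Dict[str, Any]]):
        self._centroids = centroids

    def list(self, dataset_id, vector_fields, alias="default", page_size=5,
             cursor=None, include_vector=False):
        # The stub encodes the next offset as the cursor string.
        start = int(cursor) if cursor else 0
        page = self._centroids[start:start + page_size]
        return {"documents": page, "cursor": str(start + page_size)}


stub = StubCentroids([{"_id": f"cluster-{i}"} for i in range(12)])
all_ids = [c["_id"] for c in iter_centroids(stub, "my_dataset", ["my_vector_"])]
```

Writing the loop as a generator keeps memory use flat for large centroid sets and lets the caller stop early.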

get(self, dataset_id: str, cluster_ids: List, vector_fields: List, alias: str = 'default', page_size: int = 5, cursor: str = None)

Retrieve the cluster centroids by IDs

Parameters
  • dataset_id (string) – Unique name of dataset

  • cluster_ids (list) – List of cluster IDs

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

  • page_size (int) – Size of each page of results

  • cursor (string) – Cursor to paginate the document retrieval

insert(self, dataset_id: str, cluster_centers: List, vector_fields: List, alias: str = 'default')

Insert your own cluster centroids so they can be used in approximate search settings and cluster aggregations.

Parameters
  • dataset_id (string) – Unique name of dataset

  • cluster_centers (list) – Cluster centers with the key being the index number

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster
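
This reference does not spell out the exact shape of `cluster_centers`. As a rough sketch only, the helper below assumes each centre is a small document with an `_id` derived from its index and the vector stored under `centroid_vector_` (the default field name that appears in `list_closest_to_center`). `build_cluster_centers` and the `cluster-` prefix are hypothetical; check the payload format against the service before relying on it.

```python
from typing import Dict, List, Sequence, Union

CenterDoc = Dict[str, Union[str, List[float]]]


def build_cluster_centers(vectors: Sequence[Sequence[float]],
                          vector_field: str = "centroid_vector_",
                          id_prefix: str = "cluster-") -> List[CenterDoc]:
    """Turn raw centroid vectors into one document per centre, keyed by index."""
    centers: List[CenterDoc] = []
    for index, vector in enumerate(vectors):
        centers.append({
            "_id": f"{id_prefix}{index}",               # ASSUMPTION: ID format
            vector_field: [float(x) for x in vector],   # ASSUMPTION: field name
        })
    return centers


cluster_centers = build_cluster_centers([[0.0, 1.0], [1.0, 0.0]])

# With a real client the payload would then be passed along these lines:
# client.insert("my_dataset", cluster_centers, ["my_vector_"], alias="default")
```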

documents(self, dataset_id: str, cluster_ids: List, vector_fields: List, alias: str = 'default', page_size: int = 5, cursor: str = None, page: int = 1, include_vector: bool = False, similarity_metric: str = 'cosine')

Retrieve the documents that belong to the given clusters, by cluster IDs

Parameters
  • dataset_id (string) – Unique name of dataset

  • cluster_ids (list) – List of cluster IDs

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

  • page_size (int) – Size of each page of results

  • cursor (string) – Cursor to paginate the document retrieval

  • page (int) – Page of the results

  • include_vector (bool) – Include vectors in the search results

  • similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
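
For orientation, the four `similarity_metric` names can be read as the standard vector measures: cosine similarity, L1 (Manhattan) distance, L2 (Euclidean) distance and dot product. The sketch below computes each locally in plain Python. It illustrates the usual definitions only; how the service converts distances into scores is not stated here and may differ.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product ('dp')."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product of the two vectors after normalisation."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l1(a: Vector, b: Vector) -> float:
    """L1 (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """L2 (Euclidean) distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine, "l1": l1, "l2": l2, "dp": dot,
}

a, b = [3.0, 4.0], [4.0, 3.0]
scores = {name: fn(a, b) for name, fn in METRICS.items()}
```

Note that cosine and dot product grow with similarity while L1 and L2 grow with dissimilarity, so "closest" means the largest value for the former and the smallest for the latter.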

metadata(self, dataset_id: str, vector_fields: List, alias: str = 'default', metadata: Optional[Dict[str, Any]] = None)

If metadata is None, retrieves metadata about a dataset, notably its description, data source, etc. Otherwise, stores the given metadata about your cluster.

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

  • metadata (Optional[dict]) – If None, it will retrieve the metadata, otherwise it will overwrite the metadata of the cluster
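
Because passing `metadata` overwrites what is stored, a read-merge-write pattern may be useful when only some keys should change. The sketch below assumes the read call returns the stored metadata as a plain dict, which this reference does not confirm. `merge_centroid_metadata` and `StubMetadataClient` are hypothetical names; the stub mimics the read/overwrite behaviour described above so the example runs offline.

```python
from typing import Any, Dict, List, Optional


def merge_centroid_metadata(client: Any, dataset_id: str, vector_fields: List[str],
                            changes: Dict[str, Any], alias: str = "default") -> Dict[str, Any]:
    """Read the current metadata, apply `changes` on top, and write the result back."""
    # ASSUMPTION: the read call returns a plain dict (or nothing when unset).
    current = client.metadata(dataset_id, vector_fields, alias=alias) or {}
    merged = {**current, **changes}
    client.metadata(dataset_id, vector_fields, alias=alias, metadata=merged)
    return merged


class StubMetadataClient:
    """Hypothetical stand-in: None reads, anything else overwrites."""

    def __init__(self) -> None:
        self._store: Dict[tuple, Dict[str, Any]] = {}

    def metadata(self, dataset_id, vector_fields, alias="default",
                 metadata: Optional[Dict[str, Any]] = None):
        key = (dataset_id, tuple(vector_fields), alias)
        if metadata is None:
            return dict(self._store.get(key, {}))
        self._store[key] = dict(metadata)
        return None


stub = StubMetadataClient()
stub.metadata("my_dataset", ["my_vector_"], metadata={"description": "k-means, k=8"})
merged = merge_centroid_metadata(stub, "my_dataset", ["my_vector_"],
                                 {"data_source": "crm_export"})
```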

list_closest_to_center(self, dataset_id: str, vector_fields: List, cluster_ids: List = [], alias: str = 'default', centroid_vector_fields: List = ['centroid_vector_'], select_fields: List = [], approx: int = 0, sum_fields: bool = True, page_size: int = 1, page: int = 1, similarity_metric: str = 'cosine', filters: List = [], facets: List = [], min_score: int = 0, include_vector: bool = False, include_count: bool = True, include_facets: bool = False)

List of documents closest to the centre.

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • cluster_ids (list) – Any of the cluster ids

  • alias (string) – Alias is used to name a cluster

  • centroid_vector_fields (list) – Vector fields stored

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields

  • approx (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially the less accurate

  • sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters (list) – Query for filtering the search results

  • facets (list) – Fields to include in the facets, if [] then all

  • min_score (int) – Minimum score for similarity metric

  • include_vector (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

  • include_facets (bool) – Include facets in the search results

list_furthest_from_center(self, dataset_id: str, vector_fields: str, cluster_ids: List = [], alias: str = 'default', select_fields: List = [], approx: int = 0, sum_fields: bool = True, page_size: int = 1, page: int = 1, similarity_metric: str = 'cosine', filters: List = [], facets: List = [], min_score: int = 0, include_vector: bool = False, include_count: bool = True, include_facets: bool = False)

List of documents furthest from the centre.

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • cluster_ids (list) – Any of the cluster ids

  • alias (string) – Alias is used to name a cluster

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields

  • approx (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially the less accurate

  • sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters (list) – Query for filtering the search results

  • facets (list) – Fields to include in the facets, if [] then all

  • min_score (int) – Minimum score for similarity metric

  • include_vector (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

  • include_facets (bool) – Include facets in the search results
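
To make "closest to" and "furthest from" the centre concrete, the following local sketch ranks a handful of documents against a centroid by cosine similarity, then applies `min_score`, `page` and `page_size` in the way their descriptions suggest. It is an illustration under assumptions, not the service's implementation: `rank_by_center` and the `_score` key are hypothetical, and the server may filter and paginate differently.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_center(documents: List[Dict[str, Any]], centroid: Sequence[float],
                   vector_field: str, furthest: bool = False,
                   min_score: float = 0.0, page_size: int = 1,
                   page: int = 1) -> List[Dict[str, Any]]:
    """Score each document against the centroid and return one page of results."""
    scored = []
    for doc in documents:
        score = cosine(doc[vector_field], centroid)
        if score >= min_score:
            scored.append({**doc, "_score": score})  # "_score" is illustrative
    # Closest first means highest similarity first; furthest reverses the order.
    scored.sort(key=lambda d: d["_score"], reverse=not furthest)
    start = (page - 1) * page_size
    return scored[start:start + page_size]


docs = [
    {"_id": "a", "v": [1.0, 0.0]},
    {"_id": "b", "v": [1.0, 1.0]},
    {"_id": "c", "v": [0.0, 1.0]},
]
centroid = [1.0, 0.0]

closest = rank_by_center(docs, centroid, "v")
second_page = rank_by_center(docs, centroid, "v", page=2)
furthest = rank_by_center(docs, centroid, "v", furthest=True)
```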

delete(self, dataset_id: str, vector_fields: List, alias: str = 'default')

Delete centroids by dataset ID, vector field and alias

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

update(self, dataset_id: str, vector_fields: List, id: str, update: dict = {}, alias: str = 'default')

Update a centroid by dataset ID, vector field, alias and centroid ID

Parameters
  • dataset_id (string) – Unique name of dataset

  • vector_fields (list) – The vector field where a clustering task was run.

  • alias (string) – Alias is used to name a cluster

  • id (string) – The centroid ID

  • update (dict) – The update to be applied to the document