relevanceai.api.endpoints.recommend

Recommend services.

Module Contents

Classes

RecommendClient

Base class for all relevanceai client utilities

class relevanceai.api.endpoints.recommend.RecommendClient(project, api_key)

Bases: relevanceai.base._Base

Base class for all relevanceai client utilities

vector(self, dataset_id: str, positive_document_ids: dict = {}, negative_document_ids: dict = {}, vector_fields=[], approximation_depth: int = 0, vector_operation: str = 'sum', sum_fields: bool = True, page_size: int = 20, page: int = 1, similarity_metric: str = 'cosine', facets: list = [], filters: list = [], min_score: float = 0, select_fields: list = [], include_vector: bool = False, include_count: bool = True, asc: bool = False, keep_search_history: bool = False, hundred_scale: bool = False)

Vector-search-based recommendations work by extracting the vectors of the specified document IDs, performing vector operations on them, and then searching the dataset with the resulting vector. This supports plain recommendations as well as personalized and weighted recommendations.

Here are several scenarios and the queries they produce:

Recommendations personalized by a single liked product:

>>> positive_document_ids=['A']

-> Document ID A Vector = Search Query

Recommendations personalized by multiple liked products:

>>> positive_document_ids=['A', 'B']

-> Document ID A Vector + Document ID B Vector = Search Query

Recommendations personalized by multiple liked and disliked products:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']

-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query

Recommendations personalized by multiple liked and disliked products, with weights:

>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}

-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query
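The weighted scenarios above reduce to simple vector arithmetic, sketched below in plain Python. This is a minimal local illustration, not the service's implementation: the in-memory `vectors` dict, the 2-d vectors, and the function names `as_weights`, `weighted_sum`, and `weighted_query` are hypothetical, and a plain list of IDs is assumed to mean a weight of 1.0 per ID.

```python
from typing import Dict, List, Sequence, Union

Vector = List[float]
IdSpec = Union[Sequence[str], Dict[str, float]]


def as_weights(ids: IdSpec) -> Dict[str, float]:
    """Turn a list of IDs (assumed weight 1.0 each) or an {id: weight} dict into a dict."""
    if isinstance(ids, dict):
        return {doc_id: float(w) for doc_id, w in ids.items()}
    return {doc_id: 1.0 for doc_id in ids}


def weighted_sum(vectors: Dict[str, Vector], ids: IdSpec, dim: int) -> Vector:
    """Add up the vectors of the given IDs, each multiplied by its weight."""
    total = [0.0] * dim
    for doc_id, weight in as_weights(ids).items():
        for i, value in enumerate(vectors[doc_id]):
            total[i] += weight * value
    return total


def weighted_query(vectors: Dict[str, Vector], positive_ids: IdSpec,
                   negative_ids: IdSpec = ()) -> Vector:
    """(weighted sum of liked vectors) - (weighted sum of disliked vectors)."""
    dim = len(next(iter(vectors.values())))
    positive = weighted_sum(vectors, positive_ids, dim)
    negative = weighted_sum(vectors, negative_ids, dim)
    return [p - n for p, n in zip(positive, negative)]


# Hypothetical 2-d vectors standing in for the stored document vectors.
vectors = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [1.0, 1.0], "D": [0.5, 0.5]}
query = weighted_query(vectors, {"A": 0.5, "B": 1}, {"C": 0.6, "D": 0.4})
print(query)  # approximately [-0.3, 1.2]
```

Passing `["A", "B"]` instead of a dict gives the unweighted "A + B" case from the earlier scenarios.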

You can change the operator between vectors with vector_operation:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'

-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query
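A sketch of how each vector_operation choice might combine vectors: the operation is applied element-wise within the liked group and within the disliked group, and the disliked result is subtracted from the liked result, as in the formula above. The reducer table, the left-to-right order assumed for 'divide', and the function names `combine` and `operation_query` are assumptions for illustration, not the service's code.

```python
import math
from functools import reduce
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Element-wise reducers keyed by vector_operation name (assumed semantics).
# "divide" folds left to right, i.e. A / B / ...; that order is an assumption.
OPERATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": math.fsum,
    "mean": lambda xs: math.fsum(xs) / len(xs),
    "min": min,
    "max": max,
    "multiply": math.prod,
    "divide": lambda xs: reduce(lambda a, b: a / b, xs),
}


def combine(vectors: Sequence[Vector], operation: str) -> Vector:
    """Apply the operation dimension by dimension across a group of vectors."""
    reducer = OPERATIONS[operation]
    return [reducer(column) for column in zip(*vectors)]


def operation_query(positive: Sequence[Vector], negative: Sequence[Vector] = (),
                    operation: str = "sum") -> Vector:
    """combine(positive) - combine(negative), mirroring the formula above."""
    query = combine(positive, operation)
    if negative:
        query = [q - n for q, n in zip(query, combine(negative, operation))]
    return query


A, B, C, D = [1.0, 2.0], [3.0, 4.0], [1.0, 1.0], [2.0, 0.5]
print(operation_query([A, B], [C, D], operation="multiply"))  # [1.0, 7.5]
```

Keeping the operations in a lookup table makes it easy to see that only the within-group reduction changes; the subtraction between the two groups stays the same for every operation.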

Parameters
  • dataset_id (string) – Unique name of the dataset

  • positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and included in the operation. Either a dict mapping IDs to weights or a list of IDs (weighted equally), as in the scenarios above.

  • negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and subtracted in the operation. Either a dict mapping IDs to weights or a list of IDs (weighted equally), as in the scenarios above.

  • vector_fields (list or dict) – The vector fields to search in. Either a list of strings, which are weighted equally (e.g. ['check_vector_', 'yellow_vector_']), or a dictionary mapping each field to a float weight (e.g. {'check_vector_': 0.2, 'yellow_vector_': 0.5})

  • approximation_depth (int) – Depth of the approximate search used to speed up searching. The higher the number, the faster the search, but the results may be less accurate.

  • vector_operation (string) – Aggregation applied to the vectors when using positive and negative document IDs; choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']

  • sum_fields (bool) – Whether to sum the similarity scores from multiple vector fields into a single score or keep them separate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity metric; choose from ['cosine', 'l1', 'l2', 'dp']

  • facets (list) – Fields to include in the facets; if [], all fields are included

  • filters (list) – Query for filtering the search results

  • min_score (float) – Minimum similarity score for results

  • select_fields (list) – Fields to include in the search results; an empty list means all fields.

  • include_vector (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

  • asc (bool) – Whether to sort results in ascending order instead of descending order

  • keep_search_history (bool) – Whether to store the search history in VecDB. This will increase storage costs over time.

  • hundred_scale (bool) – Whether to multiply the similarity score by 100
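The scoring and paging parameters above can be mirrored locally to see how they interact. This sketch rests on stated assumptions, not on the VecDB implementation: 'dp' is taken to mean dot product, l1/l2 distances are turned into similarities with 1 / (1 + distance), a list of vector_fields gives each field weight 1.0, hundred_scale multiplies scores by 100 before min_score is applied, and pages are 1-based. The functions `similarity`, `score`, and `rank` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Vector = List[float]


def similarity(a: Vector, b: Vector, metric: str = "cosine") -> float:
    """Score two vectors so that a higher value always means more similar."""
    if metric == "dp":  # assumed: dot product
        return sum(x * y for x, y in zip(a, b))
    if metric == "cosine":
        norms = math.hypot(*a) * math.hypot(*b)
        return sum(x * y for x, y in zip(a, b)) / norms if norms else 0.0
    if metric == "l1":  # assumed: 1 / (1 + Manhattan distance)
        return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))
    if metric == "l2":  # assumed: 1 / (1 + Euclidean distance)
        return 1.0 / (1.0 + math.dist(a, b))
    raise ValueError(f"unknown similarity_metric: {metric!r}")


def score(query: Dict[str, Vector], doc: Dict[str, Vector],
          vector_fields: Union[Sequence[str], Dict[str, float]],
          similarity_metric: str = "cosine", sum_fields: bool = True,
          hundred_scale: bool = False):
    """Weighted per-field similarity, summed into one score or kept per field."""
    if isinstance(vector_fields, dict):
        weights = dict(vector_fields)
    else:
        weights = {field: 1.0 for field in vector_fields}
    scale = 100.0 if hundred_scale else 1.0
    per_field = {f: scale * w * similarity(query[f], doc[f], similarity_metric)
                 for f, w in weights.items()}
    return sum(per_field.values()) if sum_fields else per_field


def rank(scored: Sequence[Tuple[str, float]], min_score: float = 0,
         asc: bool = False, page: int = 1,
         page_size: int = 20) -> List[Tuple[str, float]]:
    """Drop results below min_score, sort by score, and return one 1-based page."""
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=not asc)
    start = (page - 1) * page_size
    return kept[start:start + page_size]


# Hypothetical per-field vectors for a query and one document.
query = {"check_vector_": [1.0, 0.0], "yellow_vector_": [0.0, 1.0]}
doc = {"check_vector_": [1.0, 0.0], "yellow_vector_": [1.0, 0.0]}
print(score(query, doc, {"check_vector_": 0.2, "yellow_vector_": 0.5}))  # 0.2
```

With sum_fields=False the same call returns a dict of per-field scores instead of their sum.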

diversity(self, dataset_id: str, cluster_vector_field: str, n_clusters: int, positive_document_ids: dict = {}, negative_document_ids: dict = {}, vector_fields=[], approximation_depth: int = 0, vector_operation: str = 'sum', sum_fields: bool = True, page_size: int = 20, page: int = 1, similarity_metric: str = 'cosine', facets: list = [], filters: list = [], min_score: float = 0, select_fields: list = [], include_vector: bool = False, include_count: bool = True, asc: bool = False, keep_search_history: bool = False, hundred_scale: bool = False, search_history_id: str = None, n_init: int = 5, n_iter: int = 10, return_as_clusters: bool = False)

Vector-search-based recommendations work by extracting the vectors of the specified document IDs, performing vector operations on them, and then searching the dataset with the resulting vector. This supports plain recommendations as well as personalized and weighted recommendations.

Diversity recommendation increases the variety of the recommendations through clustering: the search results are clustered and the top k items in each cluster are selected. The main clustering parameters are cluster_vector_field, the vector field to cluster on, and n_clusters, the number of clusters.
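The diversity step described above can be sketched as: cluster the search results on a vector field with k-means (n_init random restarts of n_iter iterations each, keeping the run with the lowest inertia), then keep the highest-scoring items in each cluster. This is a plain-Python illustration, not the service's algorithm; the result tuple layout, the top_k parameter (standing in for "top k"), and the functions `nearest`, `kmeans`, and `diversify` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Result = Tuple[str, float, Vector]  # hypothetical: (document ID, search score, cluster vector)


def nearest(point: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: Sequence[Vector], n_clusters: int, n_init: int = 5,
           n_iter: int = 10, seed: int = 0) -> List[int]:
    """Lloyd's k-means with n_init random restarts; keeps the lowest-inertia run."""
    rng = random.Random(seed)
    best_labels: List[int] = []
    best_inertia = math.inf
    for _ in range(n_init):
        # Seed centroids with n_clusters distinct points chosen at random.
        centroids = [list(p) for p in rng.sample(list(points), n_clusters)]
        for _ in range(n_iter):
            labels = [nearest(p, centroids) for p in points]
            for c in range(n_clusters):
                members = [p for p, label in zip(points, labels) if label == c]
                if members:  # an empty cluster keeps its previous centroid
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        labels = [nearest(p, centroids) for p in points]
        inertia = sum(math.dist(p, centroids[label]) ** 2
                      for p, label in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def diversify(results: Sequence[Result], n_clusters: int, top_k: int = 1,
              n_init: int = 5, n_iter: int = 10, return_as_clusters: bool = False):
    """Cluster results on their vectors and keep the top_k best-scoring per cluster."""
    labels = kmeans([vec for _, _, vec in results], n_clusters, n_init, n_iter)
    clusters: Dict[int, List[Tuple[str, float]]] = {}
    for (doc_id, doc_score, _), label in zip(results, labels):
        clusters.setdefault(label, []).append((doc_id, doc_score))
    for members in clusters.values():
        members.sort(key=lambda item: item[1], reverse=True)
        del members[top_k:]  # keep only the best top_k in this cluster
    if return_as_clusters:
        return clusters
    flat = [item for members in clusters.values() for item in members]
    return sorted(flat, key=lambda item: item[1], reverse=True)


results = [
    ("a", 0.9, [0.0, 0.0]), ("b", 0.8, [0.1, 0.0]),
    ("c", 0.7, [10.0, 10.0]), ("d", 0.6, [10.1, 10.0]),
]
print(diversify(results, n_clusters=2))  # [('a', 0.9), ('c', 0.7)]
```

Without clustering, the top two results would be the near-duplicates "a" and "b"; selecting per cluster surfaces "c" from the other group instead.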

Here are several scenarios and the queries they produce:

Recommendations personalized by a single liked product:

>>> positive_document_ids=['A']

-> Document ID A Vector = Search Query

Recommendations personalized by multiple liked products:

>>> positive_document_ids=['A', 'B']

-> Document ID A Vector + Document ID B Vector = Search Query

Recommendations personalized by multiple liked and disliked products:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']

-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query

Recommendations personalized by multiple liked and disliked products, with weights:

>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}

-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query

You can change the operator between vectors with vector_operation:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'

-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query

Parameters
  • dataset_id (string) – Unique name of the dataset

  • cluster_vector_field (str) – The vector field to cluster on.

  • n_clusters (int) – Number of clusters.

  • positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and included in the operation. Either a dict mapping IDs to weights or a list of IDs (weighted equally), as in the scenarios above.

  • negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and subtracted in the operation. Either a dict mapping IDs to weights or a list of IDs (weighted equally), as in the scenarios above.

  • vector_fields (list or dict) – The vector fields to search in. Either a list of strings, which are weighted equally (e.g. ['check_vector_', 'yellow_vector_']), or a dictionary mapping each field to a float weight (e.g. {'check_vector_': 0.2, 'yellow_vector_': 0.5})

  • approximation_depth (int) – Depth of the approximate search used to speed up searching. The higher the number, the faster the search, but the results may be less accurate.

  • vector_operation (string) – Aggregation applied to the vectors when using positive and negative document IDs; choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']

  • sum_fields (bool) – Whether to sum the similarity scores from multiple vector fields into a single score or keep them separate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity metric; choose from ['cosine', 'l1', 'l2', 'dp']

  • facets (list) – Fields to include in the facets; if [], all fields are included

  • filters (list) – Query for filtering the search results

  • min_score (float) – Minimum similarity score for results

  • select_fields (list) – Fields to include in the search results; an empty list means all fields.

  • include_vector (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

  • asc (bool) – Whether to sort results in ascending order instead of descending order

  • keep_search_history (bool) – Whether to store the search history in VecDB. This will increase storage costs over time.

  • hundred_scale (bool) – Whether to multiply the similarity score by 100

  • search_history_id (str) – Search history ID, only used for storing search histories.

  • n_init (int) – Number of times the clustering is run with different centroid seeds

  • n_iter (int) – Number of iterations in each run

  • return_as_clusters (bool) – If True, return the results grouped by cluster instead of as a flat list