Core Endpoints
All admin-related tasks.
- class relevanceai.api.endpoints.admin.AdminClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- copy_foreign_dataset(dataset_id, source_dataset_id, source_project, source_api_key, project=None, api_key=None)
Copy a dataset from another user’s projects into your project.
Example
>>> client = Client()
>>> client.admin.copy_foreign_dataset(
...     dataset_id="research",
...     source_dataset_id="...",
...     source_project="...",
...     source_api_key="..."
... )
- Parameters
dataset_id (str) – The dataset to copy
source_dataset_id (str) – The original dataset
source_project (str) – The original project to copy from
source_api_key (str) – The original API key of the project
project (Optional[str]) – The original project
api_key (Optional[str]) – The original API key
- project: str
- request_read_api_key(read_username)
Creates a read-only key for your project. Make sure to save the API key somewhere safe. When doing a search, the admin username should still be used.
- Parameters
read_username (str) – Read-only project
- send_dataset(dataset_id, receiver_project, receiver_api_key)
Send an individual a dataset.
Example
>>> client = Client()
>>> client.admin.send_dataset(
...     dataset_id="research",
...     receiver_project="...",
...     receiver_api_key="..."
... )
- Parameters
dataset_id (str) – The name of the dataset
receiver_project (str) – The project name that will receive the dataset
receiver_api_key (str) – The project API key that will receive the dataset
- class relevanceai.api.endpoints.aggregate.AggregateClient(project, api_key)
Bases: relevanceai.base._Base
Aggregate service
- aggregate(dataset_id, metrics=[], groupby=[], filters=[], page_size=20, page=1, asc=False, flatten=True, alias='default')
Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a json body that follows the schema of:
>>> {
>>>     "groupby" : [
>>>         {"name": <alias>, "field": <field in the collection>, "agg": "category"},
>>>         {"name": <alias>, "field": <another groupby field in the collection>, "agg": "numeric"}
>>>     ],
>>>     "metrics" : [
>>>         {"name": <alias>, "field": <numeric field in the collection>, "agg": "avg"},
>>>         {"name": <alias>, "field": <another numeric field in the collection>, "agg": "max"}
>>>     ]
>>> }
For example, one can use the following aggregations to group score based on region and player name.
>>> {
>>>     "groupby" : [
>>>         {"name": "region", "field": "player_region", "agg": "category"},
>>>         {"name": "player_name", "field": "name", "agg": "category"}
>>>     ],
>>>     "metrics" : [
>>>         {"name": "average_score", "field": "final_score", "agg": "avg"},
>>>         {"name": "max_score", "field": "final_score", "agg": "max"},
>>>         {"name": "total_score", "field": "final_score", "agg": "sum"},
>>>         {"name": "average_deaths", "field": "final_deaths", "agg": "avg"},
>>>         {"name": "highest_deaths", "field": "final_deaths", "agg": "max"}
>>>     ]
>>> }
“groupby” is the fields you want to split the data into. These are the available groupby types:
category : groupby a field that is a category
numeric: groupby a field that is a numeric
“metrics” is the fields and metrics you want to calculate in each of those groups. Every aggregation includes a frequency metric. These are the available metric types:
“avg”, “max”, “min”, “sum”, “cardinality”
The response returned has the following in descending order.
If you want to return documents, specify a “group_size” parameter and a “select_fields” parameter if you want to limit the specific fields chosen. This looks as such:
>>> {
>>>     'groupby':[
>>>         {'name':'Manufacturer','field':'manufacturer','agg':'category',
>>>          'group_size': 10, 'select_fields': ["name"]},
>>>     ],
>>>     'metrics':[
>>>         {'name':'Price Average','field':'price','agg':'avg'},
>>>     ],
>>> }
>>>
>>> {"title": {"title": "books", "frequency": 200, "documents": [{...}, {...}]}, {"title": "books", "frequency": 100, "documents": [{...}, {...}]}}
For array-aggregations, you can add “agg”: “array” into the aggregation query.
- Parameters
dataset_id (string) – Unique name of dataset
metrics (list) – Fields and metrics you want to calculate
groupby (list) – Fields you want to split the data into
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
alias (string) – Alias used to name a vector field. Belongs in field_{alias} vector
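The groupby and metrics lists described above can be assembled in plain Python before they are passed to aggregate. The following is only a sketch: groupby_clause and metric_clause are hypothetical helpers, not part of relevanceai, and the check covers only the five metric types listed above (array aggregations are left out).

```python
# Hypothetical helpers (not part of relevanceai) that build the entries of the
# "groupby" and "metrics" lists in the shape shown in the aggregate docs.

METRIC_TYPES = ("avg", "max", "min", "sum", "cardinality")


def groupby_clause(name, field, agg="category"):
    """Return one groupby entry; `name` is the alias used in the response."""
    return {"name": name, "field": field, "agg": agg}


def metric_clause(name, field, agg):
    """Return one metric entry, rejecting metric types not listed in the docs."""
    if agg not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {agg!r}")
    return {"name": name, "field": field, "agg": agg}


# Group scores by region and player name, as in the example above.
groupby = [
    groupby_clause("region", "player_region"),
    groupby_clause("player_name", "name"),
]
metrics = [
    metric_clause("average_score", "final_score", "avg"),
    metric_clause("max_score", "final_score", "max"),
]
```

The two lists would then be supplied as the groupby and metrics arguments of aggregate, together with a dataset_id.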
- api_key: str
- config: relevanceai.config.Config
- project: str
- class relevanceai.api.endpoints.centroids.CentroidsClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- delete(dataset_id, vector_fields, alias='default')
Delete centroids by dataset ID, vector field and alias
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
- docs_closest_to_center(dataset_id, vector_fields, cluster_ids=[], alias='default', centroid_vector_fields=['centroid_vector_'], select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)
List of documents closest to the centre.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
centroid_vector_fields (list) – Vector fields stored
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results
- docs_furthest_from_center(dataset_id, vector_fields, cluster_ids=[], alias='default', select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)
List of documents furthest from the centre.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results
- documents(dataset_id, cluster_ids, vector_fields, alias='default', page_size=5, cursor=None, page=1, include_vector=False, similarity_metric='cosine')
Retrieve the cluster centroids by IDs
- Parameters
dataset_id (string) – Unique name of dataset
cluster_ids (list) – List of cluster IDs
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
page (int) – Page of the results
include_vector (bool) – Include vectors in the search results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
- get(dataset_id, cluster_ids, vector_fields, alias='default', page_size=5, cursor=None)
Retrieve the cluster centroids by IDs
- Parameters
dataset_id (string) – Unique name of dataset
cluster_ids (list) – List of cluster IDs
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
- insert(dataset_id, cluster_centers, vector_fields, alias='default')
Insert your own cluster centroids for use in approximate search settings and cluster aggregations.
- Parameters
dataset_id (string) – Unique name of dataset
cluster_centers (list) – Cluster centers with the key being the index number
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
- list(dataset_id, vector_fields, alias='default', page_size=5, cursor=None, include_vector=False, base_url='https://gateway-api-aueast.relevance.ai/latest')
Retrieve the cluster centroid
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
include_vector (bool) – Include vectors in the search results
- list_closest_to_center(dataset_id, vector_fields, cluster_ids=[], alias='default', centroid_vector_fields=['centroid_vector_'], select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)
List of documents closest to the centre.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
centroid_vector_fields (list) – Vector fields stored
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results
- list_furthest_from_center(dataset_id, vector_fields, cluster_ids=[], alias='default', select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)
List of documents furthest from the centre.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results
- metadata(dataset_id, vector_fields, alias='default', metadata=None)
If metadata is None, retrieves metadata about a dataset, notably description, data source, etc. Otherwise, you can store the metadata about your cluster here.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
metadata (Optional[dict]) – If None, it will retrieve the metadata, otherwise it will overwrite the metadata of the cluster
- project: str
- update(dataset_id, vector_fields, id, update={}, alias='default')
Update a centroid by dataset ID, vector field and alias
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
id (string) – The centroid ID
update (dict) – The update to be applied to the document
API Client
- class relevanceai.api.endpoints.client.APIClient(project, api_key)
Bases: relevanceai.base._Base
API Client
- api_key: str
- config: relevanceai.config.Config
- project: str
- relevanceai.api.endpoints.client.str2bool(v)
- class relevanceai.api.endpoints.cluster.ClusterClient(project, api_key)
Bases: relevanceai.base._Base
- aggregate(dataset_id, vector_fields, metrics=[], groupby=[], filters=[], page_size=20, page=1, asc=False, flatten=True, alias='default')
Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them. It can only be used after a vector field has been clustered.
For more information about aggregations check out services.aggregate.aggregate.
- Parameters
dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field that was clustered on
metrics (list) – Fields and metrics you want to calculate
groupby (list) – Fields you want to split the data into
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
- api_key: str
- config: relevanceai.config.Config
- facets(dataset_id, facets_fields=[], page_size=20, page=1, asc=False, date_interval='monthly')
Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered.
- Parameters
dataset_id (string) – Unique name of dataset
facets_fields (list) – Fields to include in the facets, if [] then all
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
date_interval (string) – Interval for date facets
- project: str
All Dataset related functions
- class relevanceai.api.endpoints.datasets.DatasetsClient(project, api_key)
Bases: relevanceai.base._Base
All dataset-related functions
- api_key: str
- bulk_insert(dataset_id, documents, insert_date=True, overwrite=True, update_schema=True, field_transformers=[], return_documents=False)
Documentation can be found here: https://ingest-api-dev-aueast.relevance.ai/latest/documentation#operation/InsertEncode
When inserting the document you can optionally specify your own id for a document by using the field name “_id”, if not specified a random id is assigned.
When inserting or specifying vectors in a document use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”.
When inserting or specifying chunks in a document use the suffix (ends with) “_chunk_” for the field name. e.g. “products_chunk_”.
When inserting or specifying chunk vectors in a document’s chunks use the suffix (ends with) “_chunkvector_” for the field name. e.g. “products_chunk_.product_description_chunkvector_”.
Try to keep each batch of documents to insert under 200 MB to avoid the insert timing out.
- Parameters
dataset_id (string) – Unique name of dataset
documents (list) – A list of documents. Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
overwrite (bool) – Whether to overwrite document if it exists.
update_schema (bool) – Whether the api should check the documents for vector datatype to update the schema.
include_inserted_ids (bool) – Include the inserted IDs in the response
field_transformers (list) –
An example field_transformers object:
>>> {
>>>     "field": "string",
>>>     "output_field": "string",
>>>     "remove_html": true,
>>>     "split_sentences": true
>>> }
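One way to respect the batch size advice for bulk_insert is to split the documents by their approximate serialised size before inserting. The sketch below is an illustration only: batch_by_size is a hypothetical helper, not part of relevanceai, and it estimates size from the JSON encoding of each document, which may differ from what the API actually measures.

```python
import json


def batch_by_size(documents, max_bytes):
    """Yield lists of documents whose summed JSON size is at most max_bytes.

    A single document larger than max_bytes is yielded in a batch of its own.
    """
    batch, batch_bytes = [], 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch if adding this document would exceed the limit.
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Small demonstration with a tiny limit so that several batches are produced.
docs = [{"_id": str(i), "title": "x" * 50} for i in range(10)]
batches = list(batch_by_size(docs, max_bytes=200))
```

For the limit mentioned above, max_bytes would be 200 * 1024 * 1024, and each yielded batch would be passed as the documents argument of one bulk_insert call.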
- check_missing_ids(dataset_id, ids)
Look up in bulk whether the IDs exist in the dataset; returns all the missing ones as a list.
- Parameters
dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents
- clone(old_dataset, new_dataset, schema={}, rename_fields={}, remove_fields=[], filters=[])
Clone a dataset into a new dataset. You can use this to rename fields and change data schemas. This is considered a project job.
- Parameters
old_dataset (string) – Unique name of old dataset to copy from
new_dataset (string) – Unique name of new dataset to copy to
schema (dict) – Schema for specifying the field that are vectors and its length
rename_fields (dict) – Fields to rename {‘old_field’: ‘new_field’}. Defaults to no renames
remove_fields (list) – Fields to remove [‘random_field’, ‘another_random_field’]. Defaults to no removes
filters (list) – Query for filtering the search results
- config: relevanceai.config.Config
- create(dataset_id, schema={})
A dataset can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch). A powerful and core feature of VecDB is that you can store both your metadata and vectors in the same document. When specifying the schema of a dataset and inserting your own vector use the suffix (ends with) “_vector_” for the field name, and specify the length of the vector in dataset_schema.
For example:
>>> {
>>>     "product_image_vector_": 1024,
>>>     "product_text_description_vector_" : 128
>>> }
These are the field types supported in our datasets: [“text”, “numeric”, “date”, “dict”, “chunks”, “vector”, “chunkvector”].
For example:
>>> {
>>>     "product_text_description" : "text",
>>>     "price" : "numeric",
>>>     "created_date" : "date",
>>>     "product_texts_chunk_": "chunks",
>>>     "product_text_chunkvector_" : 1024
>>> }
You don’t have to specify the schema of every single field when creating a dataset, as VecDB will automatically detect the appropriate data type for each field (vectors will be automatically identified by their “_vector_” suffix). In fact, you also don’t always have to use this endpoint to create a dataset, as /datasets/bulk_insert will infer and create the dataset and schema as you insert new documents.
Note
A dataset name/id can only contain lowercase letters, dashes, underscores and numbers.
“_id” is reserved as the key and id of a document.
Once a schema is set for a dataset it cannot be altered. If it has to be altered, use the copy dataset endpoint.
For more information about vectors check out the ‘Vectorizing’ section, services.search.vector or our blog at https://relevance.ai/blog. For more information about chunks and chunk vectors check out services.search.chunk.
- Parameters
dataset_id (string) – Unique name of dataset
schema (dict) – Schema for specifying the field that are vectors and its length
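The naming rules above can be checked locally before calling create. This is a minimal sketch under stated assumptions: is_valid_dataset_id and misnamed_vector_fields are hypothetical helpers, not part of relevanceai, and the regular expression is one reading of the note that a dataset id may contain only lowercase letters, dashes, underscores and numbers.

```python
import re

# Assumed reading of the naming note: lowercase letters, digits, "-" and "_".
_DATASET_ID = re.compile(r"[a-z0-9_-]+")

# Suffixes the docs require for fields whose schema value is a vector length.
_VECTOR_SUFFIXES = ("_vector_", "_chunkvector_")


def is_valid_dataset_id(dataset_id):
    """Return True if the whole id consists of allowed characters."""
    return _DATASET_ID.fullmatch(dataset_id) is not None


def misnamed_vector_fields(schema):
    """Return fields given an integer length but lacking a vector suffix."""
    return [
        field
        for field, value in schema.items()
        if isinstance(value, int) and not field.endswith(_VECTOR_SUFFIXES)
    ]


schema = {
    "product_image_vector_": 1024,
    "product_text_chunkvector_": 1024,
    "price": "numeric",
}
problems = misnamed_vector_fields(schema)
```

An empty problems list means every field with a vector length follows the suffix convention; the schema could then be passed to create along with a valid dataset_id.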
- delete(dataset_id, confirm=False)
Delete a dataset
- Parameters
dataset_id (string) – Unique name of dataset
- facets(dataset_id, fields=[], date_interval='monthly', page_size=5, page=1, asc=False)
Takes a high level aggregation of every field, return their unique values and frequencies. This is used to help create the filter bar for search.
- Parameters
dataset_id (string) – Unique name of dataset
fields (list) – Fields to include in the facets, if [] then all
date_interval (str) – Interval for date facets
page_size (int) – Size of facet page
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
- insert(dataset_id, document, insert_date=True, overwrite=True, update_schema=True)
Insert a single document
When inserting the document you can optionally specify your own id for a document by using the field name “_id”, if not specified a random id is assigned.
When inserting or specifying vectors in a document use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”.
When inserting or specifying chunks in a document use the suffix (ends with) “_chunk_” for the field name. e.g. “products_chunk_”.
When inserting or specifying chunk vectors in a document’s chunks use the suffix (ends with) “_chunkvector_” for the field name. e.g. “products_chunk_.product_description_chunkvector_”.
Documentation can be found here: https://ingest-api-dev-aueast.relevance.ai/latest/documentation#operation/InsertEncode
Try to keep each batch of documents to insert under 200 MB to avoid the insert timing out.
- Parameters
dataset_id (string) – Unique name of dataset
document (dict) – A document. A document is JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
overwrite (bool) – Whether to overwrite document if it exists.
update_schema (bool) – Whether the api should check the documents for vector datatype to update the schema.
- list()
List all datasets in a project that you are authorized to read/write.
- list_all(include_schema=True, include_stats=True, include_metadata=True, include_schema_stats=False, include_vector_health=False, include_active_jobs=False, dataset_ids=[], sort_by_created_at_date=False, asc=False, page_size=20, page=1)
Returns a page of datasets and in detail the dataset’s associated information that you are authorized to read/write. The information includes:
Schema - Data schema of a dataset (same as dataset.schema).
Metadata - Metadata of a dataset (same as dataset.metadata).
Stats - Statistics of number of documents and size of a dataset (same as dataset.stats).
Vector_health - Number of zero vectors stored (same as dataset.health).
Schema_stats - Fields and number of documents missing/not missing for that field (same as dataset.stats).
Active_jobs - All active jobs/tasks on the dataset.
- Parameters
include_schema (bool) – Whether to return schema
include_stats (bool) – Whether to return stats
include_metadata (bool) – Whether to return metadata
include_vector_health (bool) – Whether to return vector_health
include_schema_stats (bool) – Whether to return schema_stats
include_active_jobs (bool) – Whether to return active_jobs
dataset_ids (list) – List of dataset IDs
sort_by_created_at_date (bool) – Sort by created at date. By default shows the newest datasets. Set asc=True to get oldest dataset.
asc (bool) – Whether to sort results by ascending or descending order
page_size (int) – Size of each page of results
page (int) – Page of the results
- metadata(dataset_id)
Retrieves metadata about a dataset. Notably description, data source, etc.
- Parameters
dataset_id (string) – Unique name of dataset
- project: str
- schema(dataset_id)
Returns the schema of a dataset. Refer to datasets.create for different field types available in a VecDB schema.
- Parameters
dataset_id (string) – Unique name of dataset
- search(query, sort_by_created_at_date=False, asc=False)
Search datasets by their names with a traditional keyword search.
- Parameters
query (string) – Any string that belongs to part of a dataset.
sort_by_created_at_date (bool) – Sort by created at date. By default shows the newest datasets. Set asc=True to get oldest dataset.
asc (bool) – Whether to sort results by ascending or descending order
- task_status(dataset_id, task_id)
Check the status of an existing encoding task on the given dataset.
The required task_id was returned in the original encoding request such as datasets.vectorize.
- Parameters
dataset_id (string) – Unique name of dataset
task_id (string) – The task ID of the earlier queued vectorize task
- vectorize(dataset_id, model_id, fields=[], filters=[], refresh=False, alias='default', chunksize=20, chunk_field=None)
Queue the encoding of a dataset using the method given by model_id.
- Parameters
dataset_id (string) – Unique name of dataset
model_id (string) – Model ID to use for vectorizing (encoding).
fields (list) – Fields to vectorize (encode).
filters (list) – Filters to run against
refresh (bool) – If True, re-runs encoding on whole dataset.
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
chunksize (int) – Batch for each encoding. Change at your own risk.
chunk_field (string) – The chunk field. If the chunk field is specified, the field to be encoded should not include the chunk field.
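Because vectorize only queues the encoding and task_status reports on it later, a caller typically polls until the task finishes. The sketch below shows one generic polling loop. Everything in it is an assumption for illustration: wait_for_task is a hypothetical helper, and the shape of the status payload (a "status" key with "running" and "done" values) is invented for the demonstration, since the response format is not described here.

```python
import time


def wait_for_task(check_status, is_finished, interval=5.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call check_status() until is_finished(status) is true or time runs out.

    check_status and is_finished are supplied by the caller, so this loop
    makes no assumption about what the status payload looks like.
    """
    deadline = clock() + timeout
    while True:
        status = check_status()
        if is_finished(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)


# Simulated status sequence; the "status" key and its values are made up.
_fake_statuses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "done"},
])
result = wait_for_task(
    check_status=lambda: next(_fake_statuses),
    is_finished=lambda s: s["status"] == "done",
    interval=0.0,
)
```

In real use, check_status would wrap a task_status call with the dataset_id and the task_id returned by vectorize, and is_finished would inspect whatever the service actually returns.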
- class relevanceai.api.endpoints.documents.DocumentsClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- bulk_delete(dataset_id, ids=[])
Delete a list of documents by their IDs.
- Parameters
dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents to delete
- bulk_get(dataset_id, ids, include_vector=True, select_fields=[])
Retrieve documents by their IDs (“_id” field). This will retrieve the documents faster than a filter applied on the “_id” field.
For single id lookup version of this request use datasets.documents.get.
- Parameters
dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents in the dataset.
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
- bulk_update(dataset_id, updates, insert_date=True, return_documents=False)
Edits documents by providing a key value pair of fields you are adding or changing. Make sure to include the “_id” in the documents.
- Parameters
dataset_id (string) – Unique name of dataset
updates (list) – Updates to make to the documents. It should be specified in a format of {“field_name”: “value”}. e.g. {“item.status” : “Sold Out”}
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
include_updated_ids (bool) – Include the updated IDs in the response
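Since every entry passed to bulk_update must carry the “_id” of the document it edits, it can help to build and check the updates list up front. The sketch below is illustrative only: make_updates and entries_missing_id are hypothetical helpers, not part of relevanceai.

```python
def make_updates(ids, changes):
    """Return one update per document id, each carrying the same field changes."""
    return [{"_id": doc_id, **changes} for doc_id in ids]


def entries_missing_id(updates):
    """Return the positions of updates that lack the required "_id" field."""
    return [i for i, update in enumerate(updates) if "_id" not in update]


# Mark three documents as sold out, using the dotted field name from the docs.
updates = make_updates(["1", "2", "3"], {"item.status": "Sold Out"})
missing = entries_missing_id(updates)
```

If missing is empty, the list is in the shape described for the updates argument.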
- config: relevanceai.config.Config
- delete(dataset_id, id)
Delete a document by ID.
For deleting multiple documents refer to datasets.documents.bulk_delete
- Parameters
dataset_id (string) – Unique name of dataset
id (string) – ID of document to delete
- delete_fields(dataset_id, id, fields)
Delete fields in a document in a dataset by its id
- Parameters
dataset_id (string) – Unique name of dataset
id (string) – ID of a document in a dataset
fields (list) – List of fields to delete in a document
- delete_where(dataset_id, filters)
Delete documents by filters.
For more information about filters refer to datasets.documents.get_where.
- Parameters
dataset_id (string) – Unique name of dataset
filters (list) – Query for filtering the search results
- get(dataset_id, id, include_vector=True)
Retrieve a document by its ID (“_id” field). This will retrieve the document faster than a filter applied on the “_id” field.
- Parameters
dataset_id (string) – Unique name of dataset
id (string) – ID of a document in a dataset.
include_vector (bool) – Include vectors in the search results
- get_where(dataset_id, filters=[], cursor=None, page_size=20, sort=[], select_fields=[], include_vector=True, random_state=0, is_random=False)
Retrieve documents with filters. Cursor is provided to retrieve even more documents. Loop through it to retrieve all documents in the database. Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.
The filters query is a json body that follows the schema of:
>>> [
>>>     {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
>>>     {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
>>> ]
These are the available filter_type types: [“contains”, “category”, “categories”, “exists”, “date”, “numeric”, “ids”]
“contains”: for filtering documents that contain a string
>>> {'field' : 'item_brand', 'filter_type' : 'contains', "condition":"==", "condition_value": "samsu"}
“exact_match”/“category”: for filtering documents that match a string or list of strings exactly.
>>> {'field' : 'item_brand', 'filter_type' : 'category', "condition":"==", "condition_value": "samsung"}
“categories”: for filtering documents that contain any of the categories from a list of categories.
>>> {'field' : 'item_category_tags', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
“exists”: for filtering documents that contain a field.
>>> {'field' : 'purchased', 'filter_type' : 'exists', "condition":"==", "condition_value":" "}
If you are looking to filter for documents where a field doesn’t exist, run this:
>>> {'field' : 'purchased', 'filter_type' : 'exists', "condition":"!=", "condition_value":" "}
“date”: for filtering date by date range.
>>> {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
“numeric”: for filtering by numeric range.
>>> {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}
“ids”: for filtering by document ids.
>>> {'field' : 'ids', 'filter_type' : 'ids', "condition":"==", "condition_value":["1", "10"]}
These are the available conditions:
>>> "==", "!=", ">=", ">", "<", "<="
If you are looking to combine your filters with multiple ORs, simply add the following inside the query {“strict”:”must_or”}.
- Parameters
dataset_id (string) – Unique name of dataset
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
cursor (string) – Cursor to paginate the document retrieval
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
sort (list) – Fields to sort by. For each field, sort by descending or ascending. If you are using descending by datetime, it will get the most recent ones.
filters (list) – Query for filtering the search results
is_random (bool) – If True, retrieves documents randomly. Cannot be used with cursor.
random_state (int) – Random Seed for retrieving random documents.
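As a rough illustration of the filter format described above, the following sketch builds a filters list in plain Python. The make_filter helper is hypothetical and not part of the relevanceai package; only the dictionary keys and the list of conditions come from the examples above.

```python
ALLOWED_CONDITIONS = {"==", "!=", ">=", ">", "<", "<="}


def make_filter(field, filter_type, condition, condition_value):
    """Build one filter dictionary (hypothetical helper, not part of relevanceai)."""
    if condition not in ALLOWED_CONDITIONS:
        raise ValueError(f"Unsupported condition: {condition!r}")
    return {
        "field": field,
        "filter_type": filter_type,
        "condition": condition,
        "condition_value": condition_value,
    }


# Field names are taken from the examples above; the values are illustrative.
filters = [
    make_filter("item_category_tags", "categories", "==", ["tv", "smart"]),
    make_filter("price", "numeric", ">=", 90),
    make_filter("insert_date_", "date", ">=", "2020-01-01"),
]
```

The resulting list is what would be passed as the filters argument; validating the condition up front catches typos before a request is sent.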
- list(dataset_id, select_fields=[], cursor=None, page_size=20, include_vector=True, random_state=0)
Retrieve documents from a specified dataset. A cursor is provided so that you can retrieve further documents; loop with it to retrieve all documents in the dataset.
- Parameters
dataset_id (string) – Unique name of dataset
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
include_vector (bool) – Include vectors in the search results
random_state (int) – Random Seed for retrieving random documents.
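The cursor loop described for list can be sketched as follows. The response shape (a dictionary with "documents" and "cursor" keys) and the fetch_page callable are assumptions made for illustration, so check the actual response of your client before relying on them.

```python
def iterate_all_documents(fetch_page, page_size=20):
    """Yield every document by following the cursor until no more pages remain.

    fetch_page(cursor, page_size) is a hypothetical callable standing in for
    the real list call; it must return {"documents": [...], "cursor": ...}.
    """
    cursor = None
    while True:
        response = fetch_page(cursor, page_size)
        documents = response.get("documents", [])
        if not documents:
            break
        yield from documents
        cursor = response.get("cursor")
        if cursor is None:
            break


# In-memory stand-in for the API so the sketch runs without a network call.
_FAKE_DATASET = [{"_id": str(i)} for i in range(45)]


def fake_fetch_page(cursor, page_size):
    start = int(cursor) if cursor is not None else 0
    end = start + page_size
    next_cursor = str(end) if end < len(_FAKE_DATASET) else None
    return {"documents": _FAKE_DATASET[start:end], "cursor": next_cursor}


all_docs = list(iterate_all_documents(fake_fetch_page))
```

Writing the loop as a generator keeps memory use flat when a dataset is large, since pages are consumed as they arrive.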
- paginate(dataset_id, page=1, page_size=20, include_vector=True, select_fields=[])
Retrieve documents with filters and support for pagination.
For more information about filters check out datasets.documents.get_where.
- Parameters
dataset_id (string) – Unique name of dataset
page (int) – Page of the results
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
- project: str
- update(dataset_id, update, insert_date=True)
Edits documents by providing key-value pairs of the fields you are adding or changing. Make sure to include the “_id” in the documents.
To update multiple documents, refer to datasets.documents.bulk_update.
- Parameters
dataset_id (string) – Unique name of dataset
update (list) – A dictionary to edit and add fields to a document. It should be specified in a format of {“field_name”: “value”}. e.g. {“item.status” : “Sold Out”}
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
- update_where(dataset_id, update, filters=[])
Updates documents by filters. The update is applied to every document returned by the filters.
For more information about filters refer to datasets.documents.get_where.
- Parameters
dataset_id (string) – Unique name of dataset
update (list) – A dictionary to edit and add fields to a document. It should be specified in a format of {“field_name”: “value”}. e.g. {“item.status” : “Sold Out”}
filters (list) – Query for filtering the search results
- class relevanceai.api.endpoints.encoders.EncodersClient(project, api_key)
Bases:
relevanceai.base._Base- api_key: str
- config: relevanceai.config.Config
- image(image)
Encode an image
- Parameters
image (string) – URL of image to encode
- imagetext(image)
Encode an image to make searchable with text
- Parameters
image (string) – URL of image to encode
- multi_text(text)
Encode multilingual text
- Parameters
text (string) – Text to encode
- project: str
- text(text)
Encode text
- Parameters
text (string) – Text to encode
- textimage(text)
Encode text to make searchable with images
- Parameters
text (string) – Text to encode
All Dataset related functions
- class relevanceai.api.endpoints.monitor.MonitorClient(project, api_key)
Bases:
relevanceai.base._Base- api_key: str
- config: relevanceai.config.Config
- health(dataset_id)
Gives you a summary of the health of your vectors, e.g. how many documents are missing vectors and how many documents have zero vectors.
- Parameters
dataset_id (string) – Unique name of dataset
- project: str
- stats(dataset_id)
All operations related to monitoring
- Parameters
dataset_id (string) – Unique name of dataset
- usage(dataset_id, filters=[], page_size=20, page=1, asc=False, flatten=True, log_ids=[])
Aggregate the logs for a dataset.
The response returned has the following fields:
>>> [{'frequency': 958, 'insert_date': 1630159200000},...]
- Parameters
dataset_id (string) – Unique name of dataset
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
log_ids (list) – The log dataset IDs to aggregate with - one or more of logs, logs-write, logs-search, logs-task or js-logs
Prediction services
- class relevanceai.api.endpoints.prediction.PredictionClient(project, api_key)
Bases:
relevanceai.base._Base- KNN(dataset_id, vector, vector_field, target_field, k=5, weighting=True, impute_value=0, predict_operation='most_frequent', include_search_results=True)
Predict using KNN regression.
- Parameters
dataset_id (string) – Unique name of dataset
vector (list) – Vector, a list/array of floats that represents a piece of data.
vector_field (string) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. [’check_vector_’, ‘yellow_vector_’]) or it is a dictionary mapping field to float where the weighting is explicitly specified (e.g. {’check_vector_’: 0.2, ‘yellow_vector_’: 0.5})
target_field (string) – The field to perform regression on.
k (int) – The number of results for KNN.
weighting (bool/list) – The weighting for each prediction
impute_value (int) – The value used to fill in the document when the data is missing.
predict_operation (string) – How to predict using the vectors. One of most_frequent or sum_scores
include_search_results (bool) – Whether to include search results.
- KNN_from_results(field, results, impute_value=0, predict_operation='most_frequent')
Predict using KNN regression from search results
- Parameters
field (string) – Field in results to use for the prediction. Can be multiplied with weighting.
results (dict) – List of results in a dictionary
weighting (bool/list) – The weighting for each prediction
impute_value (int) – The value used to fill in the document when the data is missing.
predict_operation (string) – How to predict using the vectors. One of most_frequent or sum_scores
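One plausible reading of the two predict_operation values is sketched below on plain Python data. The result format (a list of dictionaries with a "score" key) and the exact semantics of most_frequent and sum_scores are assumptions for illustration, not confirmed behaviour of the service.

```python
from collections import Counter, defaultdict


def knn_predict_from_results(results, field, impute_value=0,
                             predict_operation="most_frequent"):
    """Predict a label from neighbour documents (illustrative sketch only)."""
    if not results:
        return impute_value
    # Missing target values are replaced by impute_value.
    labels = [doc.get(field, impute_value) for doc in results]
    if predict_operation == "most_frequent":
        # Majority vote among the neighbours.
        return Counter(labels).most_common(1)[0][0]
    if predict_operation == "sum_scores":
        # Add up the (assumed) similarity scores per label, pick the largest.
        totals = defaultdict(float)
        for doc, label in zip(results, labels):
            totals[label] += doc.get("score", 0.0)
        return max(totals, key=totals.get)
    raise ValueError(f"Unknown predict_operation: {predict_operation!r}")


neighbours = [
    {"category": "tv", "score": 0.9},
    {"category": "radio", "score": 0.5},
    {"category": "radio", "score": 0.3},
    {"score": 0.1},  # target field missing, so impute_value is used
]
by_vote = knn_predict_from_results(neighbours, "category")
by_score = knn_predict_from_results(neighbours, "category",
                                    predict_operation="sum_scores")
```

With this sample data the two operations disagree: the vote favours the label that appears most often, while the score sum favours the single closest neighbour.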
- api_key: str
- config: relevanceai.config.Config
- project: str
Recommend services.
- class relevanceai.api.endpoints.recommend.RecommendClient(project, api_key)
Bases:
relevanceai.base._Base- api_key: str
- config: relevanceai.config.Config
- diversity(dataset_id, cluster_vector_field, n_clusters, positive_document_ids={}, negative_document_ids={}, vector_fields=[], approximation_depth=0, vector_operation='sum', sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, n_init=5, n_iter=10, return_as_clusters=False)
Vector-search-based recommendations are made by extracting the vectors of the specified document IDs, performing vector operations on them, and then searching the dataset with the resulting vector. This allows not only recommendations but also personalized and weighted recommendations.
Diversity recommendation increases the variety within the recommendations via clustering. Search results are clustered and the top k items in each cluster are selected. The main clustering parameters are cluster_vector_field and n_clusters, the vector field on which to perform clustering and number of clusters respectively.
Here are a couple of different scenarios and what the queries would look like for those:
Recommendations Personalized by single liked product:
>>> positive_document_ids=['A']
-> Document ID A Vector = Search Query
Recommendations Personalized by multiple liked products:
>>> positive_document_ids=['A', 'B']
-> Document ID A Vector + Document ID B Vector = Search Query
Recommendations Personalized by multiple liked and disliked products:
>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']
-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query
Recommendations Personalized by multiple liked and disliked products with weights:
>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}
-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query
You can change the operator between vectors with vector_operation, e.g.:
>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'
-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query
- Parameters
dataset_id (string) – Unique name of dataset
cluster_vector_field (str) – The field to cluster on.
n_clusters (int) – Number of clusters to be specified.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
vector_fields (list) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. ['check_vector_', 'yellow_vector_']) or a dictionary mapping field to float where the weighting is explicitly specified (e.g. {'check_vector_': 0.2, 'yellow_vector_': 0.5})
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (str) – Search history ID, only used for storing search histories.
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
return_as_clusters (bool) – If True, return as clusters as opposed to results list
- project: str
- vector(dataset_id, positive_document_ids={}, negative_document_ids={}, vector_fields=[], approximation_depth=0, vector_operation='sum', sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False)
Vector-search-based recommendations are made by extracting the vectors of the specified document IDs, performing vector operations on them, and then searching the dataset with the resulting vector. This allows not only recommendations but also personalized and weighted recommendations.
Here are a couple of different scenarios and what the queries would look like for those:
Recommendations Personalized by single liked product:
>>> positive_document_ids=['A']
-> Document ID A Vector = Search Query
Recommendations Personalized by multiple liked products:
>>> positive_document_ids=['A', 'B']
-> Document ID A Vector + Document ID B Vector = Search Query
Recommendations Personalized by multiple liked and disliked products:
>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']
-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query
Recommendations Personalized by multiple liked and disliked products with weights:
>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}
-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query
You can change the operator between vectors with vector_operation, e.g.:
>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'
-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query
- Parameters
dataset_id (string) – Unique name of dataset
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
vector_fields (list) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. ['check_vector_', 'yellow_vector_']) or a dictionary mapping field to float where the weighting is explicitly specified (e.g. {'check_vector_': 0.2, 'yellow_vector_': 0.5})
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
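The vector arithmetic in the scenarios above can be sketched in plain Python. combine_vectors is a hypothetical helper that covers only the 'sum' operation with optional weights; the vectors are made up, and the service may differ in details such as normalisation.

```python
def combine_vectors(vectors, positive_document_ids, negative_document_ids=None):
    """Build a query vector as (weighted sum of liked) - (weighted sum of disliked).

    vectors maps document ID -> vector. The ID arguments may be a list
    (every weight is 1) or a dict mapping ID -> weight, mirroring the
    examples above. Illustrative sketch, not the service implementation.
    """
    def as_weights(ids):
        if not ids:
            return {}
        if isinstance(ids, dict):
            return dict(ids)
        return {doc_id: 1.0 for doc_id in ids}

    positive = as_weights(positive_document_ids)
    negative = as_weights(negative_document_ids)
    dimension = len(next(iter(vectors.values())))
    query = [0.0] * dimension
    for doc_id, weight in positive.items():
        for i, value in enumerate(vectors[doc_id]):
            query[i] += weight * value
    for doc_id, weight in negative.items():
        for i, value in enumerate(vectors[doc_id]):
            query[i] -= weight * value
    return query


# Made-up two-dimensional vectors for four documents.
vectors = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [1.0, 1.0], "D": [2.0, 0.0]}
liked_only = combine_vectors(vectors, ["A", "B"])
weighted = combine_vectors(vectors, {"A": 0.5, "B": 1}, {"C": 0.5, "D": 0.25})
```

The resulting list plays the role of the "Search Query" vector in the scenarios: it is what the dataset would then be searched with.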
- class relevanceai.api.endpoints.search.SearchClient(project, api_key)
Bases:
relevanceai.base._Base- advanced_chunk(dataset_ids, chunk_search_query, min_score=None, page_size=20, include_vector=False, select_fields=[], query=None)
A more advanced chunk search to be able to combine vector search and chunk search in many different ways.
Example 1 (Hybrid chunk search):
>>> chunk_query = {
>>>     "chunk" : "some.test",
>>>     "queries" : [
>>>         {"vector" : vec1, "fields": {"some.test.some_chunkvector_":1},
>>>          "traditional_query" : {"text":"python", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>          "metric" : "cosine"},
>>>         {"vector" : vec, "fields": ["some.test.tt.some_other_chunkvector_"],
>>>          "traditional_query" : {"text":"jumble", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>          "metric" : "cosine"},
>>>     ]
>>> }
Example 2 (combines normal vector search with chunk search):
>>> chunk_query = {
>>>     "queries" : [
>>>         {
>>>             "queries": [
>>>                 {
>>>                     "vector": vec1,
>>>                     "fields": {
>>>                         "some.test.some_chunkvector_": 0.9
>>>                     },
>>>                     "traditional_query": {
>>>                         "text": "python",
>>>                         "fields": [
>>>                             "some.test.test_words"
>>>                         ],
>>>                         "traditional_weight": 0.3
>>>                     },
>>>                     "metric": "cosine"
>>>                 }
>>>             ],
>>>             "chunk": "some.test",
>>>         },
>>>         {
>>>             "vector" : vec,
>>>             "fields": {
>>>                 ".some_vector_" : 0.1},
>>>             "metric" : "cosine"
>>>         },
>>>     ]
>>> }
- Parameters
dataset_ids (list) – Unique names of the datasets to search
chunk_search_query (list) – Advanced chunk query
min_score (int) – Minimum score for similarity metric
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
query (string) – What to store as the query name in the dashboard
- advanced_multistep_chunk(dataset_ids, first_step_query, first_step_text, first_step_fields, chunk_search_query, first_step_edit_distance=- 1, first_step_ignore_space=True, first_step_traditional_weight=0.075, first_step_approximation_depth=0, first_step_sum_fields=True, first_step_filters=[], first_step_page_size=50, include_count=True, min_score=0, page_size=20, include_vector=False, select_fields=[], query=None)
Performs a vector hybrid search and then an advanced chunk search. Chunk search allows one to search through chunks inside a document. The major difference between chunk search and normal search in Vector AI is that it relies on the chunkvector field: it searches with multiple chunkvectors for the most similar documents. Chunk search also supports filtering, to search only through filtered results, and facets, to get an overview of the products available when a minimum score is set.
Example 1 (Hybrid chunk search):
>>> chunk_query = {
>>>     "chunk" : "some.test",
>>>     "queries" : [
>>>         {"vector" : vec1, "fields": {"some.test.some_chunkvector_":1},
>>>          "traditional_query" : {"text":"python", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>          "metric" : "cosine"},
>>>         {"vector" : vec, "fields": ["some.test.tt.some_other_chunkvector_"],
>>>          "traditional_query" : {"text":"jumble", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>          "metric" : "cosine"},
>>>     ]
>>> }
Example 2 (combines normal vector search with chunk search):
>>> chunk_query = {
>>>     "queries" : [
>>>         {
>>>             "queries": [
>>>                 {
>>>                     "vector": vec1,
>>>                     "fields": {
>>>                         "some.test.some_chunkvector_": 0.9
>>>                     },
>>>                     "traditional_query": {
>>>                         "text": "python",
>>>                         "fields": [
>>>                             "some.test.test_words"
>>>                         ],
>>>                         "traditional_weight": 0.3
>>>                     },
>>>                     "metric": "cosine"
>>>                 }
>>>             ],
>>>             "chunk": "some.test",
>>>         },
>>>         {
>>>             "vector" : vec,
>>>             "fields": {
>>>                 ".some_vector_" : 0.1},
>>>             "metric" : "cosine"
>>>         },
>>>     ]
>>> }
- Parameters
dataset_ids (list) – Unique names of the datasets to search
first_step_query (list) – First step query
first_step_text (string) – Text search query (not encoded as vector)
first_step_fields (list) – Text fields to search against
chunk_search_query (list) – Advanced chunk query
first_step_edit_distance (int) – The number of single-character edits needed to turn one string into another, e.g. band vs bant is an edit distance of 1. Use -1 if you would like this to be automated.
first_step_ignore_spaces (bool) – Whether to consider cases when there is a space in the word. E.g. Go Pro vs GoPro.
first_step_traditional_weight (float) – Multiplier of traditional search score. A value of 0.025~0.075 is the ideal range
first_step_approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
first_step_sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
first_step_filters (list) – Query for filtering the search results
first_step_page_size (int) – In the first search, you are more interested in the contents
include_count (bool) – Include the total count of results in the search results
min_score (int) – Minimum score for similarity metric
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
query (string) – What to store as the query name in the dashboard
- api_key: str
- chunk(dataset_id, multivector_query, chunk_field, chunk_scoring='max', chunk_page_size=3, chunk_page=1, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=None, include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, query=None)
Chunks are data that has been divided into smaller units, e.g. a paragraph is made of many sentence chunks, a sentence is made of many word chunks, and a video is made of many image-frame chunks. By searching through chunks you can pinpoint more specifically where a match is occurring. When creating a chunk in your document use the suffixes “_chunk_” and “_chunkvector_”. An example of a document with chunks:
>>> {
>>>     "_id" : "123",
>>>     "title" : "Lorem Ipsum Article",
>>>     "description" : "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.",
>>>     "description_vector_" : [1.1, 1.2, 1.3],
>>>     "description_sentence_chunk_" : [
>>>         {"sentence_id" : 0, "sentence_chunkvector_" : [0.1, 0.2, 0.3], "sentence" : "Lorem Ipsum is simply dummy text of the printing and typesetting industry."},
>>>         {"sentence_id" : 1, "sentence_chunkvector_" : [0.4, 0.5, 0.6], "sentence" : "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."},
>>>         {"sentence_id" : 2, "sentence_chunkvector_" : [0.7, 0.8, 0.9], "sentence" : "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged."},
>>>     ]
>>> }
To combine chunk search with other searches, check out services.search.advanced_chunk.
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
chunk_field (string) – Field where the array of chunked documents is.
chunk_scoring (string) – Scoring method for ranking document chunks.
chunk_page_size (int) – Size of each page of chunk results
chunk_page (int) – Page of the chunk results
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
query (string) – What to store as the query name in the dashboard
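To make chunk scoring concrete, here is a minimal sketch that scores a document shaped like the example above against a query vector and keeps the best-matching chunk. The use of cosine similarity and the reading of chunk_scoring='max' as "the highest chunk score wins" are assumptions for illustration; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_chunk(document, chunk_field, chunkvector_field, query_vector):
    """Return (score, chunk) for the chunk most similar to query_vector."""
    scored = [
        (cosine_similarity(chunk[chunkvector_field], query_vector), chunk)
        for chunk in document[chunk_field]
    ]
    return max(scored, key=lambda pair: pair[0])


document = {
    "_id": "123",
    "description_sentence_chunk_": [
        {"sentence_id": 0, "sentence_chunkvector_": [0.1, 0.2, 0.3]},
        {"sentence_id": 1, "sentence_chunkvector_": [0.4, 0.5, 0.6]},
        {"sentence_id": 2, "sentence_chunkvector_": [0.7, 0.8, 0.9]},
    ],
}
score, chunk = best_chunk(
    document, "description_sentence_chunk_", "sentence_chunkvector_", [0.1, 0.2, 0.3]
)
```

Returning the chunk alongside its score is what lets a chunk search point at the sentence that matched rather than only at the parent document.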
- config: relevanceai.config.Config
- diversity(dataset_id, cluster_vector_field, n_clusters, multivector_query, positive_document_ids={}, negative_document_ids={}, vector_operation='sum', approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, n_init=5, n_iter=10, return_as_clusters=False, query=None)
This will first perform an advanced search and then cluster the top X (page_size) search results. Suppose the clustering gives:
>>> Cluster 0: [A, B, C]
>>> Cluster 1: [D, E]
>>> Cluster 2: [F, G]
>>> Cluster 3: [H, I]
(Note, each cluster is ordered by highest to lowest search score.)
The results are then taken from the clusters in batches, one item per cluster per batch:
>>> results_batch_1: [A, H, F, D] (ordered by highest search score)
>>> results_batch_2: [G, E, B, I] (ordered by highest search score)
>>> results_batch_3: [C]
This then returns the final results:
>>> results: [A, H, F, D, G, E, B, I, C]
- Parameters
dataset_id (string) – Unique name of dataset
cluster_vector_field (str) – The field to cluster on.
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (str) – Search history ID, only used for storing search histories.
n_clusters (int) – Number of clusters to be specified.
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
return_as_clusters (bool) – If True, return as clusters as opposed to results list
query (string) – What to store as the query name in the dashboard
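The batching described above (take the next item from each cluster, order each batch by search score, then concatenate) can be sketched as follows. The scores are invented so that the output reproduces the example ordering; interleave_clusters is a hypothetical helper, not a function of the client.

```python
from itertools import zip_longest


def interleave_clusters(clusters, scores):
    """Flatten clusters into one result list, one item per cluster per batch.

    clusters: list of lists, each already ordered from highest to lowest score.
    scores:   dict mapping item -> search score.
    """
    results = []
    # zip_longest pads exhausted clusters with None so later batches still form.
    for batch in zip_longest(*clusters):
        present = [item for item in batch if item is not None]
        present.sort(key=lambda item: scores[item], reverse=True)
        results.extend(present)
    return results


clusters = [["A", "B", "C"], ["D", "E"], ["F", "G"], ["H", "I"]]
# Invented scores, chosen to be consistent with the ordering in the example.
scores = {"A": 0.99, "H": 0.95, "F": 0.90, "D": 0.85, "G": 0.80,
          "E": 0.75, "B": 0.70, "I": 0.65, "C": 0.60}
results = interleave_clusters(clusters, scores)
```

Interleaving this way guarantees that the first page of results contains one item from every cluster before any cluster contributes a second item.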
- hybrid(dataset_id, multivector_query, text, fields, edit_distance=- 1, ignore_spaces=True, traditional_weight=0.075, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None)
Combines traditional keyword faceted search with semantic vector search, so that results benefit from both exact keyword matches and semantic similarity.
For information on how to use vector search check out services.search.vector.
For information on how to use traditional keyword faceted search check out services.search.traditional.
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors of these documents are retrieved and used in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from ['mean', 'sum', 'min', 'max', 'divide', 'multiply']
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (float) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (string) – Search history ID, only used for storing search histories.
edit_distance (int) – The number of single-character edits needed to turn one string into another, e.g. band vs bant is an edit distance of 1. Use -1 if you would like this to be automated.
ignore_spaces (bool) – Whether to consider cases when there is a space in the word. E.g. Go Pro vs GoPro.
traditional_weight (float) – Multiplier of traditional search score. A value of 0.025~0.075 is the ideal range
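For readers unfamiliar with edit distance, here is a standard Levenshtein implementation in plain Python. It illustrates the concept behind the edit_distance parameter only; how the service computes or automates the value internally is not covered here.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum single-character insertions,
    deletions or substitutions needed to turn source into target."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from source
                current[j - 1] + 1,                   # insert into source
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


band_vs_bant = edit_distance("band", "bant")
gopro_vs_go_pro = edit_distance("gopro", "go pro")
```

Both sample pairs differ by a single edit, which is why a small edit_distance is enough to tolerate common typos and spacing variants.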
- make_suggestion()
- multistep_chunk(dataset_id, multivector_query, first_step_multivector_query, chunk_field, chunk_scoring='max', chunk_page_size=3, chunk_page=1, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=None, include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, first_step_page=1, first_step_page_size=20, query=None)
Multistep chunk search involves a vector search followed by chunk search, used to accelerate chunk searches or to identify context before delving into relevant chunks. e.g. Search against the paragraph vector first then sentence chunkvector after.
For more information about chunk search check out services.search.chunk.
For more information about vector search check out services.search.vector
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
chunk_field (string) – Field where the array of chunked documents is.
chunk_scoring (string) – Scoring method for ranking document chunks.
chunk_page_size (int) – Size of each page of chunk results
chunk_page (int) – Page of the chunk results
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
first_step_multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
first_step_page (int) – Page of the results
first_step_page_size (int) – Size of each page of results
query (string) – What to store as the query name in the dashboard
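The two-step flow described for multistep_chunk can be sketched locally as follows. This is only an illustration of the idea, not the service's implementation: the helper names, the `doc_vector_` and `chunk_vector_` field names, and the sample documents are all assumptions made for the example. It shortlists documents by a document-level vector first, then scores chunks only inside the shortlist, using the best chunk as the document score (as with chunk_scoring='max').

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multistep_chunk_sketch(docs, query, first_step_page_size=2, chunk_page_size=1):
    """Hypothetical helper: vector search first, then chunk search on the shortlist."""
    # Step 1: rank whole documents by their document-level vector.
    shortlist = sorted(
        docs, key=lambda d: cosine(query, d["doc_vector_"]), reverse=True
    )[:first_step_page_size]

    results = []
    for doc in shortlist:
        # Step 2: score chunks, but only inside the shortlisted documents.
        scored = sorted(
            ((cosine(query, c["chunk_vector_"]), c["text"]) for c in doc["chunks"]),
            reverse=True,
        )[:chunk_page_size]
        if not scored:
            continue
        # 'max' scoring: the document score is the score of its best chunk.
        results.append(
            {"_id": doc["_id"], "score": scored[0][0], "chunks": [t for _, t in scored]}
        )
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Made-up sample data for illustration only.
docs = [
    {"_id": "a", "doc_vector_": [1.0, 0.0], "chunks": [
        {"text": "a1", "chunk_vector_": [1.0, 0.0]},
        {"text": "a2", "chunk_vector_": [0.0, 1.0]}]},
    {"_id": "b", "doc_vector_": [0.8, 0.6], "chunks": [
        {"text": "b1", "chunk_vector_": [0.6, 0.8]},
        {"text": "b2", "chunk_vector_": [0.9, 0.1]}]},
    {"_id": "c", "doc_vector_": [0.0, 1.0], "chunks": [
        {"text": "c1", "chunk_vector_": [0.0, 1.0]}]},
]
results = multistep_chunk_sketch(docs, query=[1.0, 0.0])
```

Document "c" never has its chunks scored because it is dropped in the first step, which is where the speed-up comes from.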
- project: str
- semantic(dataset_id, multivector_query, fields, text, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False)
A more automated hybrid search that adjusts some of the key parameters automatically to give good results out of the box.
For information on how to configure semantic search check out services.search.hybrid.
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. This will retrieve the vectors from the document IDs and consider them in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. This will retrieve the vectors from the document IDs and consider them in the operation.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search, but the results are potentially less accurate.
sum_fields (bool) – Whether to sum the similarity search scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
- traditional(dataset_id, text, fields=[], edit_distance=-1, ignore_spaces=True, page_size=29, page=1, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, search_history_id=None)
Traditional Faceted Keyword Search with edit distance/fuzzy matching.
For information on how to apply facets/filters check out datasets.documents.get_where.
For information on how to construct the facets section for your search bar check out datasets.facets.
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
edit_distance (int) – The number of single-letter edits needed to turn one string into another, e.g. band vs bant has an edit distance of 1. Use -1 if you would like this to be automated.
ignore_spaces (bool) – Whether to ignore spaces within a word when matching, e.g. so that Go Pro matches GoPro.
page_size (int) – Size of each page of results
page (int) – Page of the results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
search_history_id (string) – Search history ID, only used for storing search histories.
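To make the edit_distance and ignore_spaces parameters concrete, here is a minimal local sketch. The documentation does not say which algorithm the service uses, so the Levenshtein distance below is an assumption, and `edit_distance` and `fuzzy_match` are hypothetical helper names that are not part of the client.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query: str, candidate: str, max_distance: int = 1,
                ignore_spaces: bool = True) -> bool:
    """Hypothetical matcher: True if the strings are within max_distance edits."""
    if ignore_spaces:
        query = query.replace(" ", "")
        candidate = candidate.replace(" ", "")
    return edit_distance(query.lower(), candidate.lower()) <= max_distance
```

With these helpers, `edit_distance("band", "bant")` is 1, and `fuzzy_match("Go Pro", "GoPro", max_distance=0)` only succeeds when spaces are ignored.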
- vector(dataset_id, multivector_query, positive_document_ids={}, negative_document_ids={}, vector_operation='sum', approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, query=None)
Allows you to leverage vector similarity search to create a semantic search engine. Powerful features of VecDB vector search:
Multivector search that allows you to search with multiple vectors and give each vector a different weight, e.g. search with a product image vector and a text description vector to find the most similar products by what they look like and what they are described to do. You can also weight each vector field in the search, e.g. image_vector_ weighted at 100% and description_vector_ at 50%.
An example of a simple multivector query:
>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": ["name_vector_"], "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": ["image_vector_"], "alias":"image"},
>>> ]
An example of a weighted multivector query:
>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": {"name_vector_": 0.6}, "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": {"image_vector_": 0.4}, "alias":"image"},
>>> ]
An example of a weighted multivector query with multiple fields for each vector:
>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": {"name_vector_": 0.6, "description_vector_": 0.3}, "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": {"image_vector_": 0.4}, "alias":"image"},
>>> ]
Utilise faceted search with vector search. For information on how to apply facets/filters check out datasets.documents.get_where
Sum Fields option to adjust whether you want multiple vectors to be combined in the scoring or compared in the scoring. e.g. image_vector_ + text_vector_ or image_vector_ vs text_vector_.
When sum_fields=True:
Multi-vector search allows you to obtain search scores by taking the sum of these scores.
TextSearchScore + ImageSearchScore = SearchScore
We then rank by the new SearchScore, so for searching 1000 documents there will be 1000 search scores and results
When sum_fields=False:
Multi-vector search without summing the scores. Each score is instead ranked on its own in the comparison.
TextSearchScore = SearchScore1
ImageSearchScore = SearchScore2
We then rank by the 2 new SearchScores, so for searching 1000 documents there should be 2000 search scores and results.
Personalization with positive and negative document ids.
For more information about the positive and negative document ids to personalize check out services.recommend.vector
For even more advanced configuration and customisation of vector search, reach out to us at dev@relevance.ai and learn about our new advanced_vector_search.
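The difference between sum_fields=True and sum_fields=False described above can be illustrated with a small local sketch. The `rank` helper and the sample scores are assumptions made for illustration. The real scoring happens inside the service and may differ in detail.

```python
def rank(text_scores, image_scores, sum_fields=True):
    """Hypothetical helper: rank documents from two per-vector score tables.

    Returns (score, document id, source) tuples, best first.
    """
    if sum_fields:
        # One combined score per document: TextSearchScore + ImageSearchScore.
        doc_ids = set(text_scores) | set(image_scores)
        entries = [
            (text_scores.get(d, 0.0) + image_scores.get(d, 0.0), d, "text+image")
            for d in doc_ids
        ]
    else:
        # Every per-vector score competes on its own, so each document
        # can appear once per vector field.
        entries = [(s, d, "text") for d, s in text_scores.items()]
        entries += [(s, d, "image") for d, s in image_scores.items()]
    return sorted(entries, reverse=True)


# Made-up scores for two documents.
text_scores = {"a": 0.9, "b": 0.2}
image_scores = {"a": 0.1, "b": 0.7}

summed = rank(text_scores, image_scores, sum_fields=True)     # 2 entries
separate = rank(text_scores, image_scores, sum_fields=False)  # 4 entries
```

With two documents and two vector fields, the summed ranking has two entries and the separate ranking has four, mirroring the 1000 versus 2000 counts mentioned above.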
- Parameters
dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. This will retrieve the vectors from the document IDs and consider them in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. This will retrieve the vectors from the document IDs and consider them in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search, but the results are potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘mulitple’]
sum_fields (bool) – Whether to sum the similarity search scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (string) – Search history ID, only used for storing search histories.
query (string) – What to store as the query name in the dashboard
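A multivector_query in the shape shown in the examples above can be assembled with a small helper. `build_multivector_query` is a hypothetical convenience function, not part of the client. It only produces the list of dicts and checks that any weights are numeric.

```python
def build_multivector_query(entries):
    """Hypothetical helper: build a multivector_query list.

    entries: iterable of (alias, vector, fields), where fields is either a
    list of vector field names or a dict mapping field name -> weight.
    """
    query = []
    for alias, vector, fields in entries:
        if isinstance(fields, dict):
            bad = [name for name, weight in fields.items()
                   if not isinstance(weight, (int, float))]
            if bad:
                raise ValueError(f"non-numeric weight for fields: {bad}")
        query.append({"vector": list(vector), "fields": fields, "alias": alias})
    return query


multivector_query = build_multivector_query([
    ("text", [0.12, 0.23, 0.34], {"name_vector_": 0.6, "description_vector_": 0.3}),
    ("image", [0.45, 0.56, 0.67], {"image_vector_": 0.4}),
])
# The resulting list is what would be passed as the multivector_query
# argument of vector(dataset_id, multivector_query, ...).
```

Keeping the weights in one place like this makes it easier to experiment with different field weightings without hand-editing nested literals.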
Services class
- class relevanceai.api.endpoints.services.ServicesClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- document_diff(doc, docs_to_compare, difference_fields=[])
Find differences between documents
- Parameters
doc (dict) – Main document to compare other documents against.
docs_to_compare (list) – Other documents to compare against the main document.
difference_fields (list) – Fields to compare. Defaults to [], which compares all fields.
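As a rough local illustration of what comparing a main document against other documents can look like, consider the sketch below. The return shape and the `document_diff_sketch` name are assumptions. The documentation does not describe what the service actually returns.

```python
def document_diff_sketch(doc, docs_to_compare, difference_fields=None):
    """Hypothetical helper: list the fields where each document differs from doc.

    An empty or missing difference_fields means "compare all fields".
    """
    diffs = []
    for other in docs_to_compare:
        fields = difference_fields or sorted(set(doc) | set(other))
        diffs.append({
            field: {"main": doc.get(field), "other": other.get(field)}
            for field in fields
            if doc.get(field) != other.get(field)
        })
    return diffs


main = {"_id": "1", "title": "Red shoe", "price": 40}
others = [
    {"_id": "2", "title": "Red shoe", "price": 45},
    {"_id": "3", "title": "Blue shoe", "price": 40},
]
all_diffs = document_diff_sketch(main, others)
price_only = document_diff_sketch(main, others, difference_fields=["price"])
```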
- project: str
Tagger services
- class relevanceai.api.endpoints.tagger.TaggerClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- diversity(data, tag_dataset_id, encoder, cluster_vector_field, n_clusters, tag_field=None, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', filters=[], min_score=0, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, include_score=False, n_init=5, n_iter=10)
Tags the data, clusters the resulting tags, and returns one tag from each cluster (starting from the closest tag).
- Parameters
data (string) – Image Url or text or any data suited for the encoder
tag_dataset_id (string) – Name of the dataset you want to tag
encoder (string) – Which encoder to use.
cluster_vector_field (str) – The field to cluster on.
n_clusters (int) – Number of clusters to be specified.
tag_field (string) – The field used to tag in a dataset. If None, automatically uses the one stated in the encoder.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search, but the results are potentially less accurate.
sum_fields (bool) – Whether to sum the similarity search scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_search_relevance (bool) – Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results
search_relevance_cutoff_aggressiveness (int) – How aggressive the search_relevance cutoff score is (the higher the value, the fewer results will be flagged as relevant)
asc (bool) – Whether to sort results by ascending or descending order
include_score (bool) – Whether to include score
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
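The selection step of diversity, returning one tag from each cluster starting from the closest tag, can be sketched as follows. The sketch assumes the tags have already been scored and assigned a cluster label. The clustering itself (controlled by n_clusters, n_init and n_iter) is not reproduced here, and the `one_tag_per_cluster` name and the input shape are hypothetical.

```python
def one_tag_per_cluster(tags):
    """Hypothetical helper: keep the best-scoring tag of each cluster.

    tags: list of dicts with 'tag', 'score' and 'cluster' keys.
    Returns the kept tags ordered from closest (highest score) to furthest.
    """
    best = {}
    for tag in sorted(tags, key=lambda t: t["score"], reverse=True):
        # The first tag seen for a cluster is its closest one; later ones are skipped.
        best.setdefault(tag["cluster"], tag)
    return list(best.values())


# Made-up scored tags with cluster labels.
tags = [
    {"tag": "sneaker", "score": 0.91, "cluster": 0},
    {"tag": "trainer", "score": 0.89, "cluster": 0},
    {"tag": "leather", "score": 0.75, "cluster": 1},
    {"tag": "running", "score": 0.80, "cluster": 2},
]
diverse = one_tag_per_cluster(tags)
```

Here "trainer" is dropped because "sneaker" already represents cluster 0, so the result covers three distinct clusters.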
- project: str
- tag(data, tag_dataset_id, encoder, tag_field=None, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', filters=[], min_score=0, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, include_score=False)
Tag documents or vectors
- Parameters
data (string) – Image Url or text or any data suited for the encoder
tag_dataset_id (string) – Name of the dataset you want to tag
encoder (string) – Which encoder to use.
tag_field (string) – The field used to tag in a dataset. If None, automatically uses the one stated in the encoder.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search, but the results are potentially less accurate.
sum_fields (bool) – Whether to sum the similarity search scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_search_relevance (bool) – Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results
search_relevance_cutoff_aggressiveness (int) – How aggressive the search_relevance cutoff score is (the higher the value, the fewer results will be flagged as relevant)
asc (bool) – Whether to sort results by ascending or descending order
include_score (bool) – Whether to include score
Tasks Module
- class relevanceai.api.endpoints.tasks.TasksClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- create(dataset_id, task_name, task_parameters)
Tasks extend VecDB with additional functionality and a more flexible way of searching.
- Parameters
dataset_id (string) – Unique name of dataset
task_name (string) – Name of task to complete
task_parameters (dict) – Parameters of task to complete
- create_cluster_task(dataset_id, vector_field, n_clusters, alias='default', refresh=False, n_iter=10, n_init=5, status_checker=True, verbose=True, time_between_ping=10)
Start a task which creates clusters for a dataset based on a vector field.
- Parameters
dataset_id (string) – Unique name of dataset
vector_field (string) – The field to cluster on.
alias (string) – Alias is used to name a cluster
n_clusters (int) – Number of clusters to be specified.
n_iter (int) – Number of iterations in each run
n_init (int) – Number of runs to run with different centroid seeds
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output
- create_encode_categories_task(dataset_id, fields, status_checker=True, verbose=True, time_between_ping=10)
Within a collection encode the specified array field in every document into vectors.
For example, an array that represents a movie’s categories:
>>> document 1 array field: {"category" : ["sci-fi", "thriller", "comedy"]}
>>> document 2 array field: {"category" : ["sci-fi", "romance", "drama"]}
>>> -> <Encode the arrays to vectors> ->
>>> | sci-fi | thriller | comedy | romance | drama |
>>> |--------|----------|--------|---------|-------|
>>> | 1      | 1        | 1      | 0       | 0     |
>>> | 1      | 0        | 0      | 1       | 1     |
>>> document 1 array vector: {"movie_categories_vector_": [1, 1, 1, 0, 0]}
>>> document 2 array vector: {"movie_categories_vector_": [1, 0, 0, 1, 1]}
- Parameters
dataset_id (string) – Unique name of dataset
fields (list) – The array fields to encode into vectors.
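The encoding in the movie-categories example above is a multi-hot encoding, which can be reproduced locally with a few lines. `encode_categories` is a hypothetical helper, and ordering the columns by first appearance is an assumption chosen so that the output matches the table in the example.

```python
def encode_categories(documents, field):
    """Hypothetical helper: multi-hot encode an array field across documents.

    Returns (vocabulary, vectors), with columns ordered by first appearance.
    """
    vocabulary = []
    for doc in documents:
        for value in doc.get(field, []):
            if value not in vocabulary:
                vocabulary.append(value)
    vectors = [
        [1 if value in doc.get(field, []) else 0 for value in vocabulary]
        for doc in documents
    ]
    return vocabulary, vectors


documents = [
    {"category": ["sci-fi", "thriller", "comedy"]},
    {"category": ["sci-fi", "romance", "drama"]},
]
vocabulary, vectors = encode_categories(documents, "category")
# vocabulary -> ['sci-fi', 'thriller', 'comedy', 'romance', 'drama']
# vectors    -> [[1, 1, 1, 0, 0], [1, 0, 0, 1, 1]]
```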
- create_encode_imagetext_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)
Start a task which encodes an image field for text representation.
- Parameters
dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output
- create_encode_text_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)
Start a task which encodes a text field.
- Parameters
dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output
- create_encode_textimage_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)
Start a task which encodes a text field for image representation.
- Parameters
dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output
- create_numeric_encoder_task(dataset_id, fields, vector_name='_vector_', status_checker=True, verbose=True, time_between_ping=10)
Within a collection encode the specified dictionary field in every document into vectors.
For example, a dictionary that represents the characteristics of a person visiting a store:
>>> document 1 field: {"person_characteristics" : {"height":180, "age":40, "weight":70}}
>>> document 2 field: {"person_characteristics" : {"age":32, "purchases":10, "visits": 24}}
>>> -> <Encode the dictionaries to vectors> ->
>>> | height | age | weight | purchases | visits |
>>> |--------|-----|--------|-----------|--------|
>>> | 180    | 40  | 70     | 0         | 0      |
>>> | 0      | 32  | 0      | 10        | 24     |
>>> document 1 dictionary vector: {"person_characteristics_vector_": [180, 40, 70, 0, 0]}
>>> document 2 dictionary vector: {"person_characteristics_vector_": [0, 32, 0, 10, 24]}
- Parameters
dataset_id (string) – Unique name of dataset
fields (list) – The numeric fields to encode into vectors.
vector_name (string) – The name of the vector field created
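The dictionary-to-vector encoding in the person_characteristics example can be reproduced locally as follows. `encode_numeric_dict` is a hypothetical helper. Filling missing keys with 0 and ordering columns by first appearance are taken from the table in the example rather than from a documented rule.

```python
def encode_numeric_dict(documents, field):
    """Hypothetical helper: turn a dictionary field into a fixed-length vector.

    Returns (columns, vectors). Keys missing from a document are encoded as 0.
    """
    columns = []
    for doc in documents:
        for key in doc.get(field, {}):
            if key not in columns:
                columns.append(key)
    vectors = [
        [doc.get(field, {}).get(column, 0) for column in columns]
        for doc in documents
    ]
    return columns, vectors


documents = [
    {"person_characteristics": {"height": 180, "age": 40, "weight": 70}},
    {"person_characteristics": {"age": 32, "purchases": 10, "visits": 24}},
]
columns, vectors = encode_numeric_dict(documents, "person_characteristics")
# columns -> ['height', 'age', 'weight', 'purchases', 'visits']
# vectors -> [[180, 40, 70, 0, 0], [0, 32, 0, 10, 24]]
```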
- list(dataset_id, show_active_only=True)
List and get a history of all the jobs and their job_id, parameters, start time, etc.
- Parameters
dataset_id (string) – Unique name of dataset
show_active_only (bool) – Whether to show active only
- project: str
- status(dataset_id, task_id)
Get the status of a collection-level job: whether it is starting, running, failed or finished.
- Parameters
dataset_id (string) – Unique name of dataset
task_id (string) – Unique name of task
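Several task methods above accept status_checker and time_between_ping, which suggests polling a task until it completes. A minimal polling loop might look like the sketch below. `wait_for_task` is a hypothetical helper, and the status strings are assumed from the description of status(). The actual response format is not documented here, so the `get_status` callable is left to the caller.

```python
import time


def wait_for_task(get_status, time_between_ping=10, max_pings=60, sleep=time.sleep):
    """Hypothetical helper: poll get_status() until the task fails or finishes.

    get_status: zero-argument callable returning a status string (assumed to be
    one of 'starting', 'running', 'failed', 'finished').
    """
    for _ in range(max_pings):
        status = get_status()
        if status in ("failed", "finished"):
            return status
        sleep(time_between_ping)
    raise TimeoutError(f"task still not done after {max_pings} pings")


# Simulated status source so the sketch runs without a server.
fake_statuses = iter(["starting", "running", "running", "finished"])
final_status = wait_for_task(lambda: next(fake_statuses), sleep=lambda seconds: None)
```

In real use, `get_status` would wrap a call to status(dataset_id, task_id) and extract whatever field carries the state.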
Wordclouds services
- class relevanceai.api.endpoints.wordclouds.WordcloudsClient(project, api_key)
Bases: relevanceai.base._Base
- api_key: str
- config: relevanceai.config.Config
- project: str
- wordclouds(dataset_id, fields, n=2, most_common=5, page_size=20, select_fields=[], include_vector=False, filters=[], additional_stopwords=[])
Get an n-gram frequency counter from the wordcloud.
- Parameters
dataset_id (string) – Unique name of dataset
fields (list) – The field on which to build NGrams
n (int) – The number of words to combine
most_common (int) – The number of most common n-gram terms to return
page_size (int) – Size of each page of results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
filters (list) – Query for filtering the search results
additional_stopwords (list) – Additional stopwords to add
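To illustrate how n, most_common and additional_stopwords interact, here is a small local n-gram counter. It is a sketch only: `ngram_counts` is a hypothetical helper, and the tokenisation (lowercasing and splitting on non-alphanumeric characters) is an assumption, since the service's own tokeniser and default stopword list are not described.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2, most_common=5, additional_stopwords=()):
    """Hypothetical helper: count n-grams across texts, most frequent first."""
    stopwords = {word.lower() for word in additional_stopwords}
    counter = Counter()
    for text in texts:
        # Lowercase, tokenise, and drop stopwords before forming n-grams.
        words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
                 if w not in stopwords]
        counter.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counter.most_common(most_common)


texts = ["Vector search is fast", "Vector search is flexible"]
top_bigram = ngram_counts(texts, n=2, most_common=1, additional_stopwords=["is"])
# top_bigram -> [('vector search', 2)]
```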