Core Endpoints

All admin-related tasks.

class relevanceai.api.endpoints.admin.AdminClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

copy_foreign_dataset(dataset_id, source_dataset_id, source_project, source_api_key, project=None, api_key=None)

Copy a dataset from another user’s projects into your project.

Example

>>> client = Client()
>>> client.admin.copy_foreign_dataset(
    dataset_id="research",
    source_dataset_id="...",
    source_project="...",
    source_api_key="..."
)

Parameters

dataset_id (str) – The dataset to copy
source_dataset_id (str) – The original dataset
source_project (str) – The original project to copy from
source_api_key (str) – The original API key of the project
project (Optional[str]) – The project to copy the dataset into, if different from the client’s project
api_key (Optional[str]) – The API key of the project to copy the dataset into

project: str

request_read_api_key(read_username)

Creates a read-only key for your project. Make sure to save the API key somewhere safe. When doing a search, the admin username should still be used.

Parameters: read_username (str) – Read-only project

send_dataset(dataset_id, receiver_project, receiver_api_key)

Send an individual a dataset.

Example

>>> client = Client()
>>> client.admin.send_dataset(
    dataset_id="research",
    receiver_project="...",
    receiver_api_key="..."
)

Parameters

dataset_id (str) – The name of the dataset
receiver_project (str) – The project name that will receive the dataset
receiver_api_key (str) – The project API key that will receive the dataset

class relevanceai.api.endpoints.aggregate.AggregateClient(project, api_key)

Bases: relevanceai.base._Base

Aggregate service

aggregate(dataset_id, metrics=[], groupby=[], filters=[], page_size=20, page=1, asc=False, flatten=True, alias='default')

Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a json body that follows the schema of:

>>> {
>>>     "groupby" : [
>>>         {"name": <alias>, "field": <field in the collection>, "agg": "category"},
>>>         {"name": <alias>, "field": <another groupby field in the collection>, "agg": "numeric"}
>>>     ],
>>>     "metrics" : [
>>>         {"name": <alias>, "field": <numeric field in the collection>, "agg": "avg"},
>>>         {"name": <alias>, "field": <another numeric field in the collection>, "agg": "max"}
>>>     ]
>>> }

For example, one can use the following aggregation to group scores by region and player name:

>>> {
>>>     "groupby" : [
>>>         {"name": "region", "field": "player_region", "agg": "category"},
>>>         {"name": "player_name", "field": "name", "agg": "category"}
>>>     ],
>>>     "metrics" : [
>>>         {"name": "average_score", "field": "final_score", "agg": "avg"},
>>>         {"name": "max_score", "field": "final_score", "agg": "max"},
>>>         {"name": "total_score", "field": "final_score", "agg": "sum"},
>>>         {"name": "average_deaths", "field": "final_deaths", "agg": "avg"},
>>>         {"name": "highest_deaths", "field": "final_deaths", "agg": "max"}
>>>     ]
>>> }

“groupby” lists the fields you want to split the data by. These are the available groupby types:

category: group by a field that is a category

numeric: group by a field that is numeric

“metrics” lists the fields and metrics you want to calculate within each group. Every aggregation also includes a frequency metric. These are the available metric types:

“avg”, “max”, “min”, “sum”, “cardinality”

The results in the response are returned in descending order unless asc is set.

If you want to return documents, specify a “group_size” parameter, plus a “select_fields” parameter if you want to limit the fields returned. For example:

>>>    {
>>>    'groupby':[
>>>        {'name':'Manufacturer','field':'manufacturer','agg':'category',
>>>        'group_size': 10, 'select_fields': ["name"]},
>>>    ],
>>>    'metrics':[
>>>        {'name':'Price Average','field':'price','agg':'avg'},
>>>    ],
>>>    }

A response that includes documents has a shape along these lines:

>>>    {"title": {"title": "books", "frequency": 200, "documents": [{...}, {...}]}, {"title": "books", "frequency": 100, "documents": [{...}, {...}]}}

For array-aggregations, you can add “agg”: “array” into the aggregation query.

Parameters

dataset_id (string) – Unique name of dataset
metrics (list) – Fields and metrics you want to calculate
groupby (list) – Fields you want to split the data into
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
alias (string) – Alias used to name a vector field. Belongs in field_{alias} vector
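
As a minimal sketch of assembling the query described above, the helpers below build the groupby and metrics lists and reject aggregation types that are not listed in this section. The names groupby_clause, metric_clause and aggregate_client are hypothetical and not part of relevanceai; “array” aggregations are not modelled. The final call appears only as a comment because it needs a live project and API key.

```python
# Hypothetical helpers (not part of relevanceai) for assembling an aggregation query.
GROUPBY_TYPES = {"category", "numeric"}
METRIC_TYPES = {"avg", "max", "min", "sum", "cardinality"}


def groupby_clause(name, field, agg="category", group_size=None, select_fields=None):
    """Build one entry of the "groupby" list."""
    if agg not in GROUPBY_TYPES:
        raise ValueError(f"unsupported groupby type: {agg!r}")
    clause = {"name": name, "field": field, "agg": agg}
    # group_size and select_fields only matter when documents should be returned.
    if group_size is not None:
        clause["group_size"] = group_size
    if select_fields is not None:
        clause["select_fields"] = list(select_fields)
    return clause


def metric_clause(name, field, agg):
    """Build one entry of the "metrics" list."""
    if agg not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {agg!r}")
    return {"name": name, "field": field, "agg": agg}


groupby = [
    groupby_clause("region", "player_region"),
    groupby_clause("player_name", "name"),
]
metrics = [
    metric_clause("average_score", "final_score", "avg"),
    metric_clause("max_score", "final_score", "max"),
]

# With an AggregateClient instance (here called aggregate_client), the lists
# map onto the documented keyword arguments:
# aggregate_client.aggregate(dataset_id="...", groupby=groupby, metrics=metrics)
```

Validating the aggregation type locally is a design choice of this sketch; it surfaces typos before a request is sent.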

api_key: str

config: relevanceai.config.Config

project: str

class relevanceai.api.endpoints.centroids.CentroidsClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

delete(dataset_id, vector_fields, alias='default')

Delete centroids by dataset ID, vector field and alias

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster

docs_closest_to_center(dataset_id, vector_fields, cluster_ids=[], alias='default', centroid_vector_fields=['centroid_vector_'], select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)

List of documents closest to the centre.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
centroid_vector_fields (list) – Vector fields stored
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into a single score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results

docs_furthest_from_center(dataset_id, vector_fields, cluster_ids=[], alias='default', select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)

List of documents furthest from the centre.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into a single score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results

documents(dataset_id, cluster_ids, vector_fields, alias='default', page_size=5, cursor=None, page=1, include_vector=False, similarity_metric='cosine')

Retrieve the cluster centroids by IDs

Parameters

dataset_id (string) – Unique name of dataset
cluster_ids (list) – List of cluster IDs
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
page (int) – Page of the results
include_vector (bool) – Include vectors in the search results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

get(dataset_id, cluster_ids, vector_fields, alias='default', page_size=5, cursor=None)

Retrieve the cluster centroids by IDs

Parameters

dataset_id (string) – Unique name of dataset
cluster_ids (list) – List of cluster IDs
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval

insert(dataset_id, cluster_centers, vector_fields, alias='default')

Insert your own cluster centroids to be used in approximate search settings and cluster aggregations.

Parameters

dataset_id (string) – Unique name of dataset
cluster_centers (list) – Cluster centers with the key being the index number
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster

list(dataset_id, vector_fields, alias='default', page_size=5, cursor=None, include_vector=False, base_url='https://gateway-api-aueast.relevance.ai/latest')

Retrieve the cluster centroids

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
include_vector (bool) – Include vectors in the search results

list_closest_to_center(dataset_id, vector_fields, cluster_ids=[], alias='default', centroid_vector_fields=['centroid_vector_'], select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)

List of documents closest to the centre.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
centroid_vector_fields (list) – Vector fields stored
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into a single score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results

list_furthest_from_center(dataset_id, vector_fields, cluster_ids=[], alias='default', select_fields=[], approx=0, sum_fields=True, page_size=1, page=1, similarity_metric='cosine', filters=[], facets=[], min_score=0, include_vector=False, include_count=True, include_facets=False)

List of documents furthest from the centre.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
cluster_ids (list) – Any of the cluster ids
alias (string) – Alias is used to name a cluster
select_fields (list) – Fields to include in the search results, empty array/list means all fields
approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate
sum_fields (bool) – Whether to sum the similarity scores of multiple vectors into a single score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
facets (list) – Fields to include in the facets, if [] then all
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
include_facets (bool) – Include facets in the search results

metadata(dataset_id, vector_fields, alias='default', metadata=None)

If metadata is None, retrieves metadata about a dataset, notably its description, data source, etc. Otherwise, stores the given metadata about your cluster.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
metadata (Optional[dict]) – If None, it will retrieve the metadata, otherwise it will overwrite the metadata of the cluster

project: str

update(dataset_id, vector_fields, id, update={}, alias='default')

Update a centroid by dataset ID, vector field, alias and centroid ID

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field where a clustering task was run.
alias (string) – Alias is used to name a cluster
id (string) – The centroid ID
update (dict) – The update to be applied to the document

API Client

class relevanceai.api.endpoints.client.APIClient(project, api_key)

Bases: relevanceai.base._Base

API Client

api_key: str

config: relevanceai.config.Config

project: str

relevanceai.api.endpoints.client.str2bool(v)

class relevanceai.api.endpoints.cluster.ClusterClient(project, api_key)

Bases: relevanceai.base._Base

aggregate(dataset_id, vector_fields, metrics=[], groupby=[], filters=[], page_size=20, page=1, asc=False, flatten=True, alias='default')

Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in it. It can only be used after a vector field has been clustered.

For more information about aggregations check out services.aggregate.aggregate.

Parameters

dataset_id (string) – Unique name of dataset
vector_fields (list) – The vector field that was clustered on
metrics (list) – Fields and metrics you want to calculate
groupby (list) – Fields you want to split the data into
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector

api_key: str

config: relevanceai.config.Config

facets(dataset_id, facets_fields=[], page_size=20, page=1, asc=False, date_interval='monthly')

Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.

It can only be used after a vector field has been clustered.

Parameters

dataset_id (string) – Unique name of dataset
facets_fields (list) – Fields to include in the facets, if [] then all
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
date_interval (string) – Interval for date facets

project: str

All Dataset related functions

class relevanceai.api.endpoints.datasets.DatasetsClient(project, api_key)

Bases: relevanceai.base._Base

All dataset-related functions

api_key: str

bulk_insert(dataset_id, documents, insert_date=True, overwrite=True, update_schema=True, field_transformers=[], return_documents=False)

Documentation can be found here: https://ingest-api-dev-aueast.relevance.ai/latest/documentation#operation/InsertEncode

When inserting the document you can optionally specify your own id for a document by using the field name “_id”; if not specified, a random id is assigned.
When inserting or specifying vectors in a document use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”.
When inserting or specifying chunks in a document use the suffix (ends with) “_chunk_” for the field name. e.g. “products_chunk_”.
When inserting or specifying chunk vectors in a document’s chunks use the suffix (ends with) “_chunkvector_” for the field name. e.g. “products_chunk_.product_description_chunkvector_”.
Try to keep each batch of documents to insert under 200 MB to avoid the insert timing out.

Parameters

dataset_id (string) – Unique name of dataset
documents (list) – A list of documents. A document is JSON-like data in which metadata and vectors are stored. To specify the id of a document use the field ‘_id’; to specify a vector field use the suffix ‘_vector_’
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
overwrite (bool) – Whether to overwrite document if it exists.
update_schema (bool) – Whether the api should check the documents for vector datatype to update the schema.
include_inserted_ids (bool) – Include the inserted IDs in the response

field_transformers (list) –

An example field_transformers object:

>>> {
>>>    "field": "string",
>>>    "output_field": "string",
>>>    "remove_html": true,
>>>    "split_sentences": true
>>> }
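
The 200 MB guidance above can be followed by splitting documents into batches before calling bulk_insert. The sketch below assumes the JSON-serialised size of each document is a reasonable proxy for the request size; batch_documents and datasets_client are hypothetical names, not part of relevanceai.

```python
import json


def batch_documents(documents, max_bytes=200 * 1024 * 1024):
    """Yield lists of documents whose combined JSON size stays under max_bytes.

    A single document larger than max_bytes is still yielded on its own.
    """
    batch, batch_size = [], 0
    for doc in documents:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Start a new batch when adding this document would exceed the limit.
        if batch and batch_size + doc_size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(doc)
        batch_size += doc_size
    if batch:
        yield batch


docs = [
    {"_id": str(i), "title": "item %d" % i, "title_vector_": [0.1, 0.2, 0.3]}
    for i in range(10)
]
# A tiny limit is used here only so that the example produces several batches.
batches = list(batch_documents(docs, max_bytes=200))

# With a DatasetsClient instance (here called datasets_client):
# for batch in batch_documents(docs):
#     datasets_client.bulk_insert(dataset_id="...", documents=batch)
```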

check_missing_ids(dataset_id, ids)

Look up in bulk whether the ids exist in the dataset, and return all the missing ones as a list.

Parameters

dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents

clone(old_dataset, new_dataset, schema={}, rename_fields={}, remove_fields=[], filters=[])

Clone a dataset into a new dataset. You can use this to rename fields and change data schemas. This is considered a project job.

Parameters

old_dataset (string) – Unique name of old dataset to copy from
new_dataset (string) – Unique name of new dataset to copy to
schema (dict) – Schema for specifying the field that are vectors and its length
rename_fields (dict) – Fields to rename {‘old_field’: ‘new_field’}. Defaults to no renames
remove_fields (list) – Fields to remove [‘random_field’, ‘another_random_field’]. Defaults to no removes
filters (list) – Query for filtering the search results

config: relevanceai.config.Config

create(dataset_id, schema={})

A dataset can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch). A powerful and core feature of VecDB is that you can store both your metadata and vectors in the same document. When specifying the schema of a dataset and inserting your own vector use the suffix (ends with) “_vector_” for the field name, and specify the length of the vector in dataset_schema.

For example:

>>>    {
>>>        "product_image_vector_": 1024,
>>>        "product_text_description_vector_" : 128
>>>    }

These are the field types supported in our datasets: [“text”, “numeric”, “date”, “dict”, “chunks”, “vector”, “chunkvector”].

For example:

>>>    {
>>>        "product_text_description" : "text",
>>>        "price" : "numeric",
>>>        "created_date" : "date",
>>>        "product_texts_chunk_": "chunks",
>>>        "product_text_chunkvector_" : 1024
>>>    }

You don’t have to specify the schema of every single field when creating a dataset, as VecDB will automatically detect the appropriate data type for each field (vectors are identified automatically by their “_vector_” suffix). In fact, you don’t always have to use this endpoint to create a dataset, as /datasets/bulk_insert will infer and create the dataset and schema as you insert new documents.

Note

A dataset name/id can only contain lowercase letters, dashes, underscores and numbers.
“_id” is reserved as the key and id of a document.
Once a schema is set for a dataset it cannot be altered. If it has to be altered, utilise the copy dataset endpoint.

For more information about vectors check out the ‘Vectorizing’ section, services.search.vector or our blog at https://relevance.ai/blog. For more information about chunks and chunk vectors check out services.search.chunk.

Parameters

dataset_id (string) – Unique name of dataset
schema (dict) – Schema for specifying the field that are vectors and its length
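
The naming rules and schema conventions above can be checked locally before calling create. This is a sketch based only on the rules and examples in this section (lowercase letters, digits, dash and underscore for dataset ids; an integer length for fields ending in “_vector_” or “_chunkvector_”; one of the listed type names otherwise). The helpers is_valid_dataset_id and check_schema are hypothetical and the service may apply further rules.

```python
import re

# Allowed characters per the note above: lowercase letters, digits, dash, underscore.
_DATASET_ID_RE = re.compile(r"[a-z0-9_-]+")
FIELD_TYPES = {"text", "numeric", "date", "dict", "chunks", "vector", "chunkvector"}


def is_valid_dataset_id(dataset_id):
    """Return True if the id uses only the documented characters."""
    return _DATASET_ID_RE.fullmatch(dataset_id) is not None


def check_schema(schema):
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    for field, spec in schema.items():
        if field.endswith(("_vector_", "_chunkvector_")):
            # Vector fields map to the vector length, as in the examples above.
            if isinstance(spec, bool) or not isinstance(spec, int) or spec <= 0:
                problems.append(f"{field}: expected a positive vector length, got {spec!r}")
        elif not isinstance(spec, str) or spec not in FIELD_TYPES:
            problems.append(f"{field}: unknown field type {spec!r}")
    return problems


schema = {
    "product_text_description": "text",
    "price": "numeric",
    "product_texts_chunk_": "chunks",
    "product_image_vector_": 1024,
    "product_text_chunkvector_": 1024,
}
problems = check_schema(schema)

# With a DatasetsClient instance (here called datasets_client):
# if is_valid_dataset_id("ecommerce-sample") and not problems:
#     datasets_client.create(dataset_id="ecommerce-sample", schema=schema)
```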

delete(dataset_id, confirm=False)

Delete a dataset

Parameters: dataset_id (string) – Unique name of dataset

facets(dataset_id, fields=[], date_interval='monthly', page_size=5, page=1, asc=False)

Takes a high level aggregation of every field, return their unique values and frequencies. This is used to help create the filter bar for search.

Parameters

dataset_id (string) – Unique name of dataset
fields (list) – Fields to include in the facets, if [] then all
date_interval (str) – Interval for date facets
page_size (int) – Size of facet page
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order

insert(dataset_id, document, insert_date=True, overwrite=True, update_schema=True)

Insert a single document

When inserting the document you can optionally specify your own id for a document by using the field name “_id”, if not specified a random id is assigned.
When inserting or specifying vectors in a document use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”.
When inserting or specifying chunks in a document use the suffix (ends with) “_chunk_” for the field name. e.g. “products_chunk_”.
When inserting or specifying chunk vectors in a document’s chunks use the suffix (ends with) “_chunkvector_” for the field name. e.g. “products_chunk_.product_description_chunkvector_”.

Documentation can be found here: https://ingest-api-dev-aueast.relevance.ai/latest/documentation#operation/InsertEncode

Try to keep each batch of documents to insert under 200mb to avoid the insert timing out.

Parameters

dataset_id (string) – Unique name of dataset
document (dict) – A document. A document is JSON-like data in which metadata and vectors are stored. To specify the id of the document use the field ‘_id’; to specify a vector field use the suffix ‘_vector_’
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
overwrite (bool) – Whether to overwrite document if it exists.
update_schema (bool) – Whether the api should check the documents for vector datatype to update the schema.

list(): List all datasets in a project that you are authorized to read/write.

list_all(include_schema=True, include_stats=True, include_metadata=True, include_schema_stats=False, include_vector_health=False, include_active_jobs=False, dataset_ids=[], sort_by_created_at_date=False, asc=False, page_size=20, page=1)

Returns a page of the datasets that you are authorized to read/write, along with each dataset’s associated information in detail. The information includes:

Schema - Data schema of a dataset (same as dataset.schema).
Metadata - Metadata of a dataset (same as dataset.metadata).
Stats - Statistics of number of documents and size of a dataset (same as dataset.stats).
Vector_health - Number of zero vectors stored (same as dataset.health).
Schema_stats - Fields and number of documents missing/not missing for that field (same as dataset.stats).
Active_jobs - All active jobs/tasks on the dataset.

Parameters

include_schema (bool) – Whether to return schema
include_stats (bool) – Whether to return stats
include_metadata (bool) – Whether to return metadata
include_vector_health (bool) – Whether to return vector_health
include_schema_stats (bool) – Whether to return schema_stats
include_active_jobs (bool) – Whether to return active_jobs
dataset_ids (list) – List of dataset IDs
sort_by_created_at_date (bool) – Sort by created-at date. By default shows the newest datasets first. Set asc=True to get the oldest datasets first.
asc (bool) – Whether to sort results by ascending or descending order
page_size (int) – Size of each page of results
page (int) – Page of the results

metadata(dataset_id)

Retrieves metadata about a dataset, notably its description, data source, etc.

Parameters: dataset_id (string) – Unique name of dataset

project: str

schema(dataset_id)

Returns the schema of a dataset. Refer to datasets.create for different field types available in a VecDB schema.

Parameters: dataset_id (string) – Unique name of dataset

search(query, sort_by_created_at_date=False, asc=False)

Search datasets by their names with a traditional keyword search.

Parameters

query (string) – Any string that is part of a dataset name.
sort_by_created_at_date (bool) – Sort by created-at date. By default shows the newest datasets first. Set asc=True to get the oldest datasets first.
asc (bool) – Whether to sort results by ascending or descending order

task_status(dataset_id, task_id)

Check the status of an existing encoding task on the given dataset.

The required task_id was returned in the original encoding request such as datasets.vectorize.

Parameters

dataset_id (string) – Unique name of dataset
task_id (string) – The task ID of the earlier queued vectorize task
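
Because vectorize only queues the work, callers typically poll task_status until the task finishes. The sketch below is a generic polling loop; wait_for_task is a hypothetical helper, and since the response format of task_status is not documented here, the caller supplies the is_done predicate. The "status" key used in the stand-in responses is invented purely so the example runs offline.

```python
import time


def wait_for_task(get_status, is_done, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() repeatedly until is_done(status) is true or timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)


# Stand-in responses; the real response shape is an assumption left to the caller.
_fake_statuses = iter([{"status": "running"}, {"status": "running"}, {"status": "complete"}])
result = wait_for_task(
    get_status=lambda: next(_fake_statuses),
    is_done=lambda s: s["status"] == "complete",
    interval=0,
    sleep=lambda seconds: None,
)

# With a DatasetsClient instance (here called datasets_client):
# wait_for_task(
#     get_status=lambda: datasets_client.task_status(dataset_id="...", task_id="..."),
#     is_done=my_predicate,  # depends on the actual response format
# )
```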

vectorize(dataset_id, model_id, fields=[], filters=[], refresh=False, alias='default', chunksize=20, chunk_field=None)

Queue the encoding of a dataset using the method given by model_id.

Parameters

dataset_id (string) – Unique name of dataset
model_id (string) – Model ID to use for vectorizing (encoding).
fields (list) – Fields to vectorize (encode)
filters (list) – Filters to run against
refresh (bool) – If True, re-runs encoding on whole dataset.
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
chunksize (int) – Batch for each encoding. Change at your own risk.
chunk_field (string) – The chunk field. If the chunk field is specified, the field to be encoded should not include the chunk field.

class relevanceai.api.endpoints.documents.DocumentsClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

bulk_delete(dataset_id, ids=[])

Delete a list of documents by their IDs.

Parameters

dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents to delete

bulk_get(dataset_id, ids, include_vector=True, select_fields=[])

Retrieve documents by their IDs (“_id” field). This will retrieve the documents faster than a filter applied on the “_id” field.

For single id lookup version of this request use datasets.documents.get.

Parameters

dataset_id (string) – Unique name of dataset
ids (list) – IDs of documents in the dataset.
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.

bulk_update(dataset_id, updates, insert_date=True, return_documents=False)

Edits documents by providing key-value pairs of the fields you are adding or changing. Make sure to include the “_id” in each document.

Parameters

dataset_id (string) – Unique name of dataset
updates (list) – Updates to make to the documents. It should be specified in a format of {“field_name”: “value”}. e.g. {“item.status” : “Sold Out”}
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.
include_updated_ids (bool) – Include the inserted IDs in the response
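
Since every update must carry its “_id”, it can help to check the list before sending it. A minimal sketch; prepare_updates and documents_client are hypothetical names, not part of relevanceai.

```python
def prepare_updates(updates):
    """Return a copy of updates after checking that each one carries an "_id"."""
    prepared = []
    for position, update in enumerate(updates):
        if "_id" not in update:
            raise ValueError(f"update at position {position} has no '_id' field")
        prepared.append(dict(update))
    return prepared


updates = prepare_updates([
    {"_id": "1", "item.status": "Sold Out"},
    {"_id": "2", "item.status": "In Stock"},
])

# With a DocumentsClient instance (here called documents_client):
# documents_client.bulk_update(dataset_id="...", updates=updates)
```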

config: relevanceai.config.Config

delete(dataset_id, id)

Delete a document by ID.

For deleting multiple documents refer to datasets.documents.bulk_delete

Parameters

dataset_id (string) – Unique name of dataset
id (string) – ID of document to delete

delete_fields(dataset_id, id, fields)

Delete fields in a document in a dataset by its id

Parameters

dataset_id (string) – Unique name of dataset
id (string) – ID of a document in a dataset
fields (list) – List of fields to delete in a document

delete_where(dataset_id, filters)

Delete the documents that match the given filters.

For more information about filters refer to datasets.documents.get_where.

Parameters

dataset_id (string) – Unique name of dataset
filters (list) – Query for filtering the search results

get(dataset_id, id, include_vector=True)

Retrieve a document by its ID (“_id” field). This will retrieve the document faster than a filter applied on the “_id” field.

Parameters

dataset_id (string) – Unique name of dataset
id (string) – ID of a document in a dataset.
include_vector (bool) – Include vectors in the search results

get_where(dataset_id, filters=[], cursor=None, page_size=20, sort=[], select_fields=[], include_vector=True, random_state=0, is_random=False)

Retrieve documents with filters. A cursor is provided to retrieve more documents; loop through it to retrieve all documents in the database. A filter retrieves the documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

The filters query is a json body that follows the schema of:

>>> [
>>>    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
>>>    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
>>> ]

These are the available filter_type types: [“contains”, “category”, “categories”, “exists”, “date”, “numeric”, “ids”]

“contains”: for filtering documents that contain a string

>>> {'field' : 'item_brand', 'filter_type' : 'contains', "condition":"==", "condition_value": "samsu"}

“exact_match”/“category”: for filtering documents that match a string or list of strings exactly.

>>> {'field' : 'item_brand', 'filter_type' : 'category', "condition":"==", "condition_value": "samsung"}

“categories”: for filtering documents that contain any category from a list of categories.

>>> {'field' : 'item_category_tags', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}

“exists”: for filtering documents that contain a field.

>>> {'field' : 'purchased', 'filter_type' : 'exists', "condition":"==", "condition_value":" "}

If you are looking to filter for documents where a field doesn’t exist, run this:

>>> {'field' : 'purchased', 'filter_type' : 'exists', "condition":"!=", "condition_value":" "}

“date”: for filtering date by date range.

>>> {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}

“numeric”: for filtering by numeric range.

>>> {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}

“ids”: for filtering by document ids.

>>> {'field' : 'ids', 'filter_type' : 'ids', "condition":"==", "condition_value":["1", "10"]}

These are the available conditions:

>>> "==", "!=", ">=", ">", "<", "<="

If you are looking to combine your filters with multiple ORs, simply add the following inside the query: {"strict": "must_or"}.

Parameters

dataset_id (string) – Unique name of dataset
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
cursor (string) – Cursor to paginate the document retrieval
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
sort (list) – Fields to sort by. For each field, sort by descending or ascending. If you are using descending by datetime, it will get the most recent ones.
filters (list) – Query for filtering the search results
is_random (bool) – If True, retrieves documents randomly. Cannot be used with cursor.
random_state (int) – Random Seed for retrieving random documents.

list(dataset_id, select_fields=[], cursor=None, page_size=20, include_vector=True, random_state=0)

Retrieve documents from a specified dataset. Cursor is provided to retrieve even more documents. Loop through it to retrieve all documents in the dataset.

Parameters

dataset_id (string) – Unique name of dataset
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
page_size (int) – Size of each page of results
cursor (string) – Cursor to paginate the document retrieval
include_vector (bool) – Include vectors in the search results
random_state (int) – Random Seed for retrieving random documents.
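Looping over the cursor to retrieve a whole dataset might look like the sketch below. The response shape (a dict with "documents" and "cursor" keys) is an assumption made for illustration, and `list_page` and `fake_list_page` are hypothetical stand-ins so the loop can run without network access.

```python
def fetch_all_documents(list_page, page_size=20):
    """Collect every document by following the cursor until a page is empty.

    `list_page(cursor, page_size)` is a hypothetical callable wrapping the
    real `list` call. It is assumed to return a dict with "documents" and
    "cursor" keys; check the actual response before relying on this.
    """
    documents = []
    cursor = None
    while True:
        page = list_page(cursor, page_size)
        if not page["documents"]:
            break
        documents.extend(page["documents"])
        cursor = page["cursor"]
    return documents


# In-memory stand-in for the API so the loop can be exercised locally.
_DATA = [{"_id": str(i)} for i in range(45)]


def fake_list_page(cursor, page_size):
    start = int(cursor) if cursor is not None else 0
    end = start + page_size
    return {"documents": _DATA[start:end], "cursor": str(end)}


all_docs = fetch_all_documents(fake_list_page, page_size=20)
```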

paginate(dataset_id, page=1, page_size=20, include_vector=True, select_fields=[])

Retrieve documents with filters and support for pagination.

For more information about filters check out datasets.documents.get_where.

Parameters

dataset_id (string) – Unique name of dataset
page (int) – Page of the results
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.

project: str

update(dataset_id, update, insert_date=True)

Edits documents by providing key-value pairs of the fields you are adding or changing. Make sure to include the “_id” in the documents.

To update multiple documents, refer to datasets.documents.bulk_update

Parameters

dataset_id (string) – Unique name of dataset
update (dict) – A dictionary to edit and add fields to a document. It should be specified in the format {“field_name”: “value”}, e.g. {“item.status” : “Sold Out”}
insert_date (bool) – Whether to include insert date as a field ‘insert_date_’.

update_where(dataset_id, update, filters=[])

Updates documents by filters. The update is applied to every document returned by the filters.

For more information about filters refer to datasets.documents.get_where.

Parameters

dataset_id (string) – Unique name of dataset
update (dict) – A dictionary to edit and add fields to a document. It should be specified in the format {“field_name”: “value”}, e.g. {“item.status” : “Sold Out”}
filters (list) – Query for filtering the search results
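To make the `update` dictionary format concrete, the sketch below applies one locally to a plain Python document. It assumes that a dot in a field name (as in “item.status”) addresses a nested field; that reading is inferred from the example above, not confirmed behaviour of the service, and `apply_update` is a hypothetical helper.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with each {"dotted.field": value} applied."""
    updated = copy.deepcopy(document)
    for dotted_field, value in update.items():
        *parents, leaf = dotted_field.split(".")
        node = updated
        for key in parents:
            # Create intermediate objects when the path does not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return updated


doc = {"_id": "123", "item": {"status": "In Stock", "price": 90}}
new_doc = apply_update(doc, {"item.status": "Sold Out"})
```

Only the named field changes; sibling fields such as `item.price` are left as they were, and the original document is not mutated.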

class relevanceai.api.endpoints.encoders.EncodersClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

image(image)

Encode an image

Parameters: image (string) – URL of image to encode

imagetext(image)

Encode an image to make searchable with text

Parameters: image (string) – URL of image to encode

multi_text(text)

Encode multilingual text

Parameters: text (string) – Text to encode

project: str

text(text)

Encode text

Parameters: text (string) – Text to encode

textimage(text)

Encode text to make searchable with images

Parameters: text (string) – Text to encode

All Dataset related functions

class relevanceai.api.endpoints.monitor.MonitorClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

health(dataset_id)

Gives you a summary of the health of your vectors, e.g. how many documents are missing vectors and how many documents have zero vectors.

Parameters: dataset_id (string) – Unique name of dataset

project: str

stats(dataset_id)

All operations related to monitoring

Parameters: dataset_id (string) – Unique name of dataset

usage(dataset_id, filters=[], page_size=20, page=1, asc=False, flatten=True, log_ids=[])

Aggregate the logs for a dataset.

The response returned has the following fields:

>>> [{'frequency': 958, 'insert_date': 1630159200000},...]

Parameters

dataset_id (string) – Unique name of dataset
filters (list) – Query for filtering the search results
page_size (int) – Size of each page of results
page (int) – Page of the results
asc (bool) – Whether to sort results by ascending or descending order
flatten (bool) – Whether to flatten
log_ids (list) – The log dataset IDs to aggregate with - one or more of logs, logs-write, logs-search, logs-task or js-logs
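The `insert_date` value in the sample response looks like a Unix timestamp in milliseconds. Under that assumption (the unit is not stated above), the rows can be made readable as follows; `to_readable` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Sample row taken from the response format shown above.
usage_rows = [{"frequency": 958, "insert_date": 1630159200000}]


def to_readable(rows):
    """Convert each row's insert_date (assumed epoch milliseconds) to a UTC datetime."""
    return [
        {
            "frequency": row["frequency"],
            "insert_date": datetime.fromtimestamp(row["insert_date"] / 1000, tz=timezone.utc),
        }
        for row in rows
    ]


readable = to_readable(usage_rows)
```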

Prediction services

class relevanceai.api.endpoints.prediction.PredictionClient(project, api_key)

Bases: relevanceai.base._Base

KNN(dataset_id, vector, vector_field, target_field, k=5, weighting=True, impute_value=0, predict_operation='most_frequent', include_search_results=True)

Predict using KNN regression.

Parameters

dataset_id (string) – Unique name of dataset
vector (list) – Vector, a list/array of floats that represents a piece of data.
vector_field (string) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. [’check_vector_’, ‘yellow_vector_’]) or it is a dictionary mapping field to float where the weighting is explicitly specified (e.g. {’check_vector_’: 0.2, ‘yellow_vector_’: 0.5})
target_field (string) – The field to perform regression on.
k (int) – The number of results for KNN.
weighting (bool/list) – The weighting for each prediction
impute_value (int) – The value used to fill in the document when the data is missing.
predict_operation (string) – How to predict using the vectors. One of most_frequent or sum_scores
include_search_results (bool) – Whether to include search results.

KNN_from_results(field, results, impute_value=0, predict_operation='most_frequent')

Predict using KNN regression from search results

Parameters

field (string) – Field in results to use for the prediction. Can be multiplied with weighting.
results (dict) – List of results in a dictionary
weighting (bool/list) – The weighting for each prediction
impute_value (int) – The value used to fill in the document when the data is missing.
predict_operation (string) – How to predict using the vectors. One of most_frequent or sum_scores
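The two `predict_operation` values can be pictured with the sketch below. It is an illustration under assumptions, not the service's implementation: `most_frequent` is read as a majority vote over the neighbours, `sum_scores` as picking the value with the largest total similarity score, and `_search_score` is a hypothetical name for the score carried by each result.

```python
from collections import Counter, defaultdict


def knn_predict(results, field, impute_value=0, predict_operation="most_frequent",
                score_field="_search_score"):
    """Predict a value for `field` from the k nearest search results."""
    # Fill in missing target values, mirroring the role of impute_value.
    values = [doc.get(field, impute_value) for doc in results]
    if predict_operation == "most_frequent":
        return Counter(values).most_common(1)[0][0]
    if predict_operation == "sum_scores":
        totals = defaultdict(float)
        for doc, value in zip(results, values):
            totals[value] += doc.get(score_field, 0.0)
        return max(totals, key=totals.get)
    raise ValueError(f"unknown predict_operation: {predict_operation!r}")


neighbours = [
    {"label": "tv", "_search_score": 0.9},
    {"label": "phone", "_search_score": 0.5},
    {"label": "phone", "_search_score": 0.3},
    {"_search_score": 0.2},  # target field missing, so impute_value is used
]

by_vote = knn_predict(neighbours, "label", predict_operation="most_frequent")
by_score = knn_predict(neighbours, "label", predict_operation="sum_scores")
```

In this toy data the two operations disagree: "phone" appears most often, while "tv" has the highest summed score.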

api_key: str

config: relevanceai.config.Config

project: str

Recommend services.

class relevanceai.api.endpoints.recommend.RecommendClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

diversity(dataset_id, cluster_vector_field, n_clusters, positive_document_ids={}, negative_document_ids={}, vector_fields=[], approximation_depth=0, vector_operation='sum', sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, n_init=5, n_iter=10, return_as_clusters=False)

Vector search based recommendations are done by extracting the vectors of the specified document IDs, performing some vector operations, and then searching the dataset with the resultant vector. This allows not only recommendations but also personalized and weighted recommendations.

Diversity recommendation increases the variety within the recommendations via clustering. Search results are clustered and the top k items in each cluster are selected. The main clustering parameters are cluster_vector_field and n_clusters, the vector field on which to perform clustering and number of clusters respectively.

Here are a couple of different scenarios and what the queries would look like for those:

Recommendations Personalized by single liked product:

>>> positive_document_ids=['A']

-> Document ID A Vector = Search Query

Recommendations Personalized by multiple liked products:

>>> positive_document_ids=['A', 'B']

-> Document ID A Vector + Document ID B Vector = Search Query

Recommendations Personalized by multiple liked products and disliked products:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']

-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query

Recommendations Personalized by multiple liked products and disliked products with weights:

>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}

-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query

You can change the operator between vectors with vector_operation, e.g.:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'

-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query
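The arithmetic in the scenarios above can be reproduced locally. The following is a minimal sketch, assuming element-wise operations on plain Python lists; `combine_vectors` is a hypothetical helper, only the 'sum' and 'multiply' operations are covered, and the vectors are made-up toy values.

```python
from functools import reduce
from operator import mul


def combine_vectors(vectors, positive_document_ids, negative_document_ids=None,
                    vector_operation="sum"):
    """Build the search query vector from positive and negative document IDs.

    IDs may be a list (every weight is 1) or a dict mapping ID to weight.
    """
    def weighted(ids):
        items = ids.items() if isinstance(ids, dict) else [(i, 1.0) for i in ids]
        return [[weight * x for x in vectors[doc_id]] for doc_id, weight in items]

    def aggregate(vecs):
        if not vecs:
            return None
        if vector_operation == "sum":
            return [sum(column) for column in zip(*vecs)]
        if vector_operation == "multiply":
            return [reduce(mul, column) for column in zip(*vecs)]
        raise ValueError(f"unsupported vector_operation: {vector_operation!r}")

    positive = aggregate(weighted(positive_document_ids))
    negative = aggregate(weighted(negative_document_ids or []))
    if negative is None:
        return positive
    return [p - n for p, n in zip(positive, negative)]


# Toy 2-dimensional vectors keyed by document ID.
vectors = {"A": [1, 2], "B": [3, 4], "C": [1, 1], "D": [0, 1]}

summed = combine_vectors(vectors, ["A", "B"], ["C", "D"])
weighted_query = combine_vectors(vectors, {"A": 0.5, "B": 1}, {"C": 0.6, "D": 0.4})
multiplied = combine_vectors(vectors, ["A", "B"], ["C", "D"], vector_operation="multiply")
```

The resulting list is what would be used as the search query vector in each scenario.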

Parameters

dataset_id (string) – Unique name of dataset
cluster_vector_field (str) – The field to cluster on.
n_clusters (int) – Number of clusters to be specified.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
vector_fields (list) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. [‘check_vector_’, ‘yellow_vector_’]) or a dictionary mapping field to float where the weighting is explicitly specified (e.g. {‘check_vector_’: 0.2, ‘yellow_vector_’: 0.5})
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘multiply’]
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (str) – Search history ID, only used for storing search histories.
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
return_as_clusters (bool) – If True, return as clusters as opposed to results list

project: str

vector(dataset_id, positive_document_ids={}, negative_document_ids={}, vector_fields=[], approximation_depth=0, vector_operation='sum', sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False)

Vector search based recommendations are done by extracting the vectors of the specified document IDs, performing some vector operations, and then searching the dataset with the resultant vector. This allows not only recommendations but also personalized and weighted recommendations.

Here are a couple of different scenarios and what the queries would look like for those:

Recommendations Personalized by single liked product:

>>> positive_document_ids=['A']

-> Document ID A Vector = Search Query

Recommendations Personalized by multiple liked products:

>>> positive_document_ids=['A', 'B']

-> Document ID A Vector + Document ID B Vector = Search Query

Recommendations Personalized by multiple liked products and disliked products:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D']

-> (Document ID A Vector + Document ID B Vector) - (Document ID C Vector + Document ID D Vector) = Search Query

Recommendations Personalized by multiple liked products and disliked products with weights:

>>> positive_document_ids={'A':0.5, 'B':1}, negative_document_ids={'C':0.6, 'D':0.4}

-> (Document ID A Vector * 0.5 + Document ID B Vector * 1) - (Document ID C Vector * 0.6 + Document ID D Vector * 0.4) = Search Query

You can change the operator between vectors with vector_operation, e.g.:

>>> positive_document_ids=['A', 'B'], negative_document_ids=['C', 'D'], vector_operation='multiply'

-> (Document ID A Vector * Document ID B Vector) - (Document ID C Vector * Document ID D Vector) = Search Query

Parameters

dataset_id (string) – Unique name of dataset
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
vector_fields (list) – The vector field to search in. It can either be an array of strings (automatically equally weighted) (e.g. [‘check_vector_’, ‘yellow_vector_’]) or a dictionary mapping field to float where the weighting is explicitly specified (e.g. {‘check_vector_’: 0.2, ‘yellow_vector_’: 0.5})
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘multiply’]
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
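For reference, the four `similarity_metric` names can be read as the standard measures sketched below. The mapping is an assumption: 'dp' is taken to mean dot product, and 'l1' and 'l2' to mean Manhattan and Euclidean distance. How the service turns the distances into scores is not described here, so this only shows the raw quantities.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product of the normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l1(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Assumed mapping from the metric names used above to functions.
METRICS = {"cosine": cosine, "l1": l1, "l2": l2, "dp": dot}

score = METRICS["dp"]([1, 2], [3, 4])
```

Note that cosine and dot product grow as vectors become more alike, while the two distances shrink, so they are not interchangeable when setting `min_score` or `asc`.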

class relevanceai.api.endpoints.search.SearchClient(project, api_key)

Bases: relevanceai.base._Base

advanced_chunk(dataset_ids, chunk_search_query, min_score=None, page_size=20, include_vector=False, select_fields=[], query=None)

A more advanced chunk search to be able to combine vector search and chunk search in many different ways.

Example 1 (Hybrid chunk search):

>>> chunk_query = {
>>>     "chunk" : "some.test",
>>>     "queries" : [
>>>         {"vector" : vec1, "fields": {"some.test.some_chunkvector_":1},
>>>         "traditional_query" : {"text":"python", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>         "metric" : "cosine"},
>>>         {"vector" : vec, "fields": ["some.test.tt.some_other_chunkvector_"],
>>>         "traditional_query" : {"text":"jumble", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>         "metric" : "cosine"},
>>>     ]
>>> }

Example 2 (combines normal vector search with chunk search):

>>> chunk_query = {
>>>     "queries" : [
>>>         {
>>>             "queries": [
>>>                 {
>>>                     "vector": vec1,
>>>                     "fields": {
>>>                         "some.test.some_chunkvector_": 0.9
>>>                     },
>>>                     "traditional_query": {
>>>                         "text": "python",
>>>                         "fields": [
>>>                             "some.test.test_words"
>>>                         ],
>>>                         "traditional_weight": 0.3
>>>                     },
>>>                     "metric": "cosine"
>>>                 }
>>>             ],
>>>             "chunk": "some.test",
>>>         },
>>>         {
>>>             "vector" : vec,
>>>             "fields": {
>>>                 ".some_vector_" : 0.1},
>>>             "metric" : "cosine"
>>>         },
>>>     ]
>>> }

Parameters

dataset_id (string) – Unique name of dataset
chunk_search_query (list) – Advanced chunk query
min_score (int) – Minimum score for similarity metric
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
query (string) – What to store as the query name in the dashboard

advanced_multistep_chunk(dataset_ids, first_step_query, first_step_text, first_step_fields, chunk_search_query, first_step_edit_distance=- 1, first_step_ignore_space=True, first_step_traditional_weight=0.075, first_step_approximation_depth=0, first_step_sum_fields=True, first_step_filters=[], first_step_page_size=50, include_count=True, min_score=0, page_size=20, include_vector=False, select_fields=[], query=None)

Performs a vector hybrid search and then an advanced chunk search. Chunk search allows one to search through chunks inside a document. The major difference between chunk search and normal search in Vector AI is that it relies on the chunkvector field. Chunk vector search searches with multiple chunkvectors for the most similar documents. Chunk search also supports filtering, to search only through filtered results, and facets, to get an overview of the products available when a minimum score is set.

Example 1 (Hybrid chunk search):

>>> chunk_query = {
>>>     "chunk" : "some.test",
>>>     "queries" : [
>>>         {"vector" : vec1, "fields": {"some.test.some_chunkvector_":1},
>>>         "traditional_query" : {"text":"python", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>         "metric" : "cosine"},
>>>         {"vector" : vec, "fields": ["some.test.tt.some_other_chunkvector_"],
>>>         "traditional_query" : {"text":"jumble", "fields" : ["some.test.test_words"], "traditional_weight": 0.3},
>>>         "metric" : "cosine"},
>>>     ]
>>> }

Example 2 (combines normal vector search with chunk search):

>>> chunk_query = {
>>>     "queries" : [
>>>         {
>>>             "queries": [
>>>                 {
>>>                     "vector": vec1,
>>>                     "fields": {
>>>                         "some.test.some_chunkvector_": 0.9
>>>                     },
>>>                     "traditional_query": {
>>>                         "text": "python",
>>>                         "fields": [
>>>                             "some.test.test_words"
>>>                         ],
>>>                         "traditional_weight": 0.3
>>>                     },
>>>                     "metric": "cosine"
>>>                 }
>>>             ],
>>>             "chunk": "some.test",
>>>         },
>>>         {
>>>             "vector" : vec,
>>>             "fields": {
>>>                 ".some_vector_" : 0.1},
>>>             "metric" : "cosine"
>>>         },
>>>     ]
>>> }

Parameters

dataset_id (string) – Unique name of dataset
first_step_query (list) – First step query
first_step_text (string) – Text search query (not encoded as vector)
first_step_fields (list) – Text fields to search against
chunk_search_query (list) – Advanced chunk query
first_step_edit_distance (int) – The number of single-letter edits needed to turn one string into another, e.g. band vs bant has an edit distance of 1. Use -1 if you would like this to be automated.
first_step_ignore_spaces (bool) – Whether to consider cases when there is a space in the word. E.g. Go Pro vs GoPro.
first_step_traditional_weight (float) – Multiplier of traditional search score. A value of 0.025~0.075 is the ideal range
first_step_approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
first_step_sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
first_step_filters (list) – Query for filtering the search results
first_step_page_size (int) – In the first search, you are more interested in the contents
include_count (bool) – Include the total count of results in the search results
min_score (int) – Minimum score for similarity metric
page_size (int) – Size of each page of results
include_vector (bool) – Include vectors in the search results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
query (string) – What to store as the query name in the dashboard

api_key: str

chunk(dataset_id, multivector_query, chunk_field, chunk_scoring='max', chunk_page_size=3, chunk_page=1, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=None, include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, query=None)

Chunks are data that has been divided into different units, e.g. a paragraph is made of many sentence chunks, a sentence is made of many word chunks, and a video is made of many image frame chunks. By searching through chunks you can pinpoint more specifically where a match is occurring. When creating a chunk in your document, use the suffixes “chunk” and “chunkvector”. An example of a document with chunks:

>>> {
>>>     "_id" : "123",
>>>     "title" : "Lorem Ipsum Article",
>>>     "description" : "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.",
>>>     "description_vector_" : [1.1, 1.2, 1.3],
>>>     "description_sentence_chunk_" : [
>>>         {"sentence_id" : 0, "sentence_chunkvector_" : [0.1, 0.2, 0.3], "sentence" : "Lorem Ipsum is simply dummy text of the printing and typesetting industry."},
>>>         {"sentence_id" : 1, "sentence_chunkvector_" : [0.4, 0.5, 0.6], "sentence" : "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."},
>>>         {"sentence_id" : 2, "sentence_chunkvector_" : [0.7, 0.8, 0.9], "sentence" : "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged."},
>>>     ]
>>> }

For combining chunk search with other search check out services.search.advanced_chunk.
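To illustrate how a chunked document might be scored, the sketch below compares a query vector against every chunk vector and keeps the best match. Treating `chunk_scoring='max'` as "take the highest chunk score", and using cosine similarity for each chunk, are assumptions made for this example; `score_document` is a hypothetical helper and the document is a shortened version of the one above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_document(document, query_vector, chunk_field, chunkvector_field):
    """Return (best_score, best_chunk) using an assumed 'max' chunk scoring."""
    scored = [
        (cosine(query_vector, chunk[chunkvector_field]), chunk)
        for chunk in document[chunk_field]
    ]
    return max(scored, key=lambda pair: pair[0])


document = {
    "_id": "123",
    "description_sentence_chunk_": [
        {"sentence_id": 0, "sentence_chunkvector_": [0.1, 0.2, 0.3]},
        {"sentence_id": 1, "sentence_chunkvector_": [0.4, 0.5, 0.6]},
        {"sentence_id": 2, "sentence_chunkvector_": [0.7, 0.8, 0.9]},
    ],
}

best_score, best_chunk = score_document(
    document,
    query_vector=[0.7, 0.8, 0.9],
    chunk_field="description_sentence_chunk_",
    chunkvector_field="sentence_chunkvector_",
)
```

Returning the winning chunk along with its score is what lets a chunk search point at the specific sentence that matched rather than only at the document.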

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
chunk_field (string) – Field where the array of chunked documents is.
chunk_scoring (string) – Scoring method for ranking document chunks.
chunk_page_size (int) – Size of each page of chunk results
chunk_page (int) – Page of the chunk results
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
query (string) – What to store as the query name in the dashboard

config: relevanceai.config.Config

diversity(dataset_id, cluster_vector_field, n_clusters, multivector_query, positive_document_ids={}, negative_document_ids={}, vector_operation='sum', approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, n_init=5, n_iter=10, return_as_clusters=False, query=None)

This will first perform an advanced search and then cluster the top X (page_size) search results. Suppose the clustering step produces the following clusters:

>>> Cluster 0: [A, B, C]
>>> Cluster 1: [D, E]
>>> Cluster 2: [F, G]
>>> Cluster 3: [H, I]

(Note, each cluster is ordered by highest to lowest search score.)

This intermediately returns:

>>> results_batch_1: [A, H, F, D] (ordered by highest search score)
>>> results_batch_2: [G, E, B, I] (ordered by highest search score)
>>> results_batch_3: [C]

This then returns the final results:

>>> results: [A, H, F, D, G, E, B, I, C]
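The batching above amounts to a round-robin over clusters: take the top remaining item from every cluster, order that batch by search score, and repeat. A minimal sketch follows; `interleave_clusters` is a hypothetical helper and the scores are invented so that the ordering matches the example.

```python
def interleave_clusters(clusters, scores):
    """Round-robin over clusters, ordering each batch by descending score.

    `clusters` is a list of lists, each already sorted by descending score.
    `scores` maps every item to its search score.
    """
    results = []
    depth = max((len(cluster) for cluster in clusters), default=0)
    for rank in range(depth):
        # One batch holds the rank-th best item of every cluster that has one.
        batch = [cluster[rank] for cluster in clusters if len(cluster) > rank]
        batch.sort(key=lambda item: scores[item], reverse=True)
        results.extend(batch)
    return results


clusters = [["A", "B", "C"], ["D", "E"], ["F", "G"], ["H", "I"]]
# Made-up scores chosen to be consistent with the orderings shown above.
scores = {"A": 0.99, "H": 0.95, "F": 0.90, "D": 0.85,
          "G": 0.80, "E": 0.75, "B": 0.70, "I": 0.65, "C": 0.60}

results = interleave_clusters(clusters, scores)
```

Because every batch draws from all clusters, the first few results cover every cluster before any cluster contributes a second item.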

Parameters

dataset_id (string) – Unique name of dataset
cluster_vector_field (str) – The field to cluster on.
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘multiply’]
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (str) – Search history ID, only used for storing search histories.
n_clusters (int) – Number of clusters to be specified.
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
return_as_clusters (bool) – If True, return as clusters as opposed to results list
query (string) – What to store as the query name in the dashboard

hybrid(dataset_id, multivector_query, text, fields, edit_distance=- 1, ignore_spaces=True, traditional_weight=0.075, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None)

Combine the best of both traditional keyword faceted search with semantic vector search to create the best search possible.

For information on how to use vector search check out services.search.vector.

For information on how to use traditional keyword faceted search check out services.search.traditional.

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘multiply’]
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (float) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (string) – Search history ID, only used for storing search histories.
edit_distance (int) – The number of single-letter edits needed to turn one string into another, e.g. band vs bant has an edit distance of 1. Use -1 if you would like this to be automated.
ignore_spaces (bool) – Whether to consider cases when there is a space in the word. E.g. Go Pro vs GoPro.
traditional_weight (float) – Multiplier of traditional search score. A value of 0.025~0.075 is the ideal range
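The `edit_distance` and `ignore_spaces` parameters can be pictured with a standard Levenshtein distance. This is a sketch under the assumption that "edit distance" means Levenshtein distance (insertions, deletions, and substitutions each cost 1); the service may compute it differently, and `strip_spaces` is a hypothetical helper.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


def strip_spaces(text):
    """Remove spaces so that 'Go Pro' and 'GoPro' compare as equal."""
    return text.replace(" ", "")


typo_distance = edit_distance("band", "bant")
spaced_distance = edit_distance("Go Pro", "GoPro")
ignored_distance = edit_distance(strip_spaces("Go Pro"), strip_spaces("GoPro"))
```

Here "band" and "bant" differ by one substitution, and removing spaces before comparing makes "Go Pro" and "GoPro" identical.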

make_suggestion()

multistep_chunk(dataset_id, multivector_query, first_step_multivector_query, chunk_field, chunk_scoring='max', chunk_page_size=3, chunk_page=1, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=None, include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, first_step_page=1, first_step_page_size=20, query=None)

Multistep chunk search involves a vector search followed by chunk search, used to accelerate chunk searches or to identify context before delving into relevant chunks. e.g. Search against the paragraph vector first then sentence chunkvector after.

For more information about chunk search check out services.search.chunk.

For more information about vector search check out services.search.vector

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
chunk_field (string) – Field where the array of chunked documents is.
chunk_scoring (string) – Scoring method for ranking document chunks.
chunk_page_size (int) – Size of each page of chunk results
chunk_page (int) – Page of the chunk results
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
first_step_multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
first_step_page (int) – Page of the results
first_step_page_size (int) – Size of each page of results
query (string) – What to store as the query name in the dashboard
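
The two-step idea described above can be illustrated locally. This is a minimal sketch in plain Python, not the service's implementation: the document layout (`doc_vector`, `chunks`), the helper names, and the use of the best chunk score as the document score (mirroring chunk_scoring='max') are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def multistep_chunk_search(docs, doc_query, chunk_query,
                           first_step_page_size=2, chunk_page_size=1):
    # Step 1: shortlist documents using the document-level vector only.
    shortlist = sorted(
        docs, key=lambda d: cosine(doc_query, d["doc_vector"]), reverse=True
    )[:first_step_page_size]

    # Step 2: score chunks, but only inside the shortlisted documents.
    results = []
    for doc in shortlist:
        scored = sorted(
            ((cosine(chunk_query, c["vector"]), c["text"]) for c in doc["chunks"]),
            reverse=True,
        )
        if not scored:
            continue  # a document without chunks cannot be chunk-scored
        top = scored[:chunk_page_size]
        # Assumed 'max' scoring: the best chunk score represents the document.
        results.append({"_id": doc["_id"], "score": top[0][0],
                        "chunks": [text for _, text in top]})
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Hypothetical toy data: three documents with 2-dimensional vectors.
docs = [
    {"_id": "a", "doc_vector": [1, 0],
     "chunks": [{"text": "a1", "vector": [1, 0]}, {"text": "a2", "vector": [0, 1]}]},
    {"_id": "b", "doc_vector": [0.9, 0.1],
     "chunks": [{"text": "b1", "vector": [0.5, 0.5]}]},
    {"_id": "c", "doc_vector": [0, 1],
     "chunks": [{"text": "c1", "vector": [0, 1]}]},
]
results = multistep_chunk_search(docs, doc_query=[1, 0], chunk_query=[0, 1])
```

Document "c" never reaches the chunk step even though its chunk matches the chunk query exactly, because the first step already filtered it out. That is the trade-off of the multistep approach: fewer chunk comparisons in exchange for relying on the first-step shortlist.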

project: str

semantic(dataset_id, multivector_query, fields, text, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False)

A more automated hybrid search that adjusts some of the key parameters automatically to give good out-of-the-box results.

For information on how to configure semantic search check out services.search.hybrid.

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the similarity scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100

traditional(dataset_id, text, fields=[], edit_distance=-1, ignore_spaces=True, page_size=29, page=1, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, search_history_id=None)

Traditional Faceted Keyword Search with edit distance/fuzzy matching.

For information on how to apply facets/filters check out datasets.documents.get_where.

For information on how to construct the facets section for your search bar check out datasets.facets.

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
text (string) – Text Search Query (not encoded as vector)
fields (list) – Text fields to search against
edit_distance (int) – The number of single-letter edits needed to turn one string into another, e.g. band vs bant is an edit distance of 1. Use -1 if you would like this to be automated.
ignore_spaces (bool) – Whether to consider cases when there is a space in the word. E.g. Go Pro vs GoPro.
page_size (int) – Size of each page of results
page (int) – Page of the results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
search_history_id (string) – Search history ID, only used for storing search histories.
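
To make edit_distance and ignore_spaces concrete, here is a minimal sketch of fuzzy matching using the classic Levenshtein distance. Whether the service uses exactly this metric is an assumption; the function names `edit_distance` and `matches` are hypothetical helpers, not part of the client.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def matches(query: str, candidate: str, max_distance: int,
            ignore_spaces: bool = True) -> bool:
    """Hypothetical matcher: optionally strip spaces, then compare
    the lowercase strings by edit distance."""
    if ignore_spaces:
        query = query.replace(" ", "")
        candidate = candidate.replace(" ", "")
    return edit_distance(query.lower(), candidate.lower()) <= max_distance


band_vs_bant = edit_distance("band", "bant")
gopro_ignoring_spaces = matches("Go Pro", "GoPro", max_distance=0)
gopro_with_spaces = matches("Go Pro", "GoPro", max_distance=0, ignore_spaces=False)
```

With spaces ignored, "Go Pro" and "GoPro" are identical after normalisation; with spaces kept, they are one edit apart and no longer match at a maximum distance of 0.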

vector(dataset_id, multivector_query, positive_document_ids={}, negative_document_ids={}, vector_operation='sum', approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', facets=[], filters=[], min_score=0, select_fields=[], include_vector=False, include_count=True, asc=False, keep_search_history=False, hundred_scale=False, search_history_id=None, query=None)

Allows you to leverage vector similarity search to create a semantic search engine. Powerful features of VecDB vector search:

Multivector search that allows you to search with multiple vectors and give each vector a different weight, e.g. search with a product image vector and a text description vector to find the most similar products by what they look like and what they are described to do. You can also weight each vector field in the search, e.g. image_vector_ weighted 100% and description_vector_ weighted 50%.

An example of a simple multivector query:

>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": ["name_vector_"], "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": ["image_vector_"], "alias":"image"},
>>> ]

An example of a weighted multivector query:

>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": {"name_vector_":0.6}, "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": {"image_vector_":0.4}, "alias":"image"},
>>> ]

An example of a weighted multivector query with multiple fields for each vector:

>>> [
>>>     {"vector": [0.12, 0.23, 0.34], "fields": {"name_vector_":0.6, "description_vector_":0.3}, "alias":"text"},
>>>     {"vector": [0.45, 0.56, 0.67], "fields": {"image_vector_":0.4}, "alias":"image"},
>>> ]

Utilise faceted search with vector search. For information on how to apply facets/filters check out datasets.documents.get_where
Sum Fields option to adjust whether you want multiple vectors to be combined in the scoring or compared in the scoring. e.g. image_vector_ + text_vector_ or image_vector_ vs text_vector_.
When sum_fields=True:
- Multi-vector search allows you to obtain search scores by taking the sum of these scores.
- TextSearchScore + ImageSearchScore = SearchScore
- We then rank by the new SearchScore, so for searching 1000 documents there will be 1000 search scores and results
When sum_fields=False:
- Multi-vector search without summing the scores; each score is instead included in the comparison separately.
- TextSearchScore = SearchScore1
- ImageSearchScore = SearchScore2
- We then rank by the 2 new SearchScores, so for searching 1000 documents there should be 2000 search scores and results.
Personalization with positive and negative document ids.
- For more information about the positive and negative document ids to personalize check out services.recommend.vector
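
The difference between the two sum_fields modes can be sketched with a few made-up scores. This is an illustration of the ranking arithmetic described above, not the service's code; the `rank` helper and the score values are hypothetical.

```python
def rank(scores_by_field, sum_fields=True):
    """Rank results from {doc_id: {field: score}}.

    sum_fields=True  -> one entry per document, scored by the sum.
    sum_fields=False -> one entry per (document, field) pair.
    """
    if sum_fields:
        entries = [
            (doc_id, "+".join(sorted(fields)), sum(fields.values()))
            for doc_id, fields in scores_by_field.items()
        ]
    else:
        entries = [
            (doc_id, field, score)
            for doc_id, fields in scores_by_field.items()
            for field, score in fields.items()
        ]
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


# Hypothetical per-field similarity scores for two documents.
scores = {
    "shoe": {"text": 0.75, "image": 0.25},
    "boot": {"text": 0.5, "image": 0.375},
}
summed = rank(scores, sum_fields=True)     # 2 documents -> 2 scores
separate = rank(scores, sum_fields=False)  # 2 documents -> 4 scores
```

With two vector fields, the separate mode yields twice as many scored entries as documents, which matches the "1000 documents, 2000 search scores" note above.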

For even more advanced configuration and customisation of vector search, reach out to us at dev@relevance.ai and learn about our new advanced_vector_search.

Parameters

dataset_id (string) – Unique name of dataset
multivector_query (list) – Query for advanced search that allows for multiple vector and field querying.
positive_document_ids (dict) – Positive document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
negative_document_ids (dict) – Negative document IDs to personalize the results with. The vectors are retrieved from these document IDs and considered in the operation.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
vector_operation (string) – Aggregation for the vectors when using positive and negative document IDs, choose from [‘mean’, ‘sum’, ‘min’, ‘max’, ‘divide’, ‘mulitple’]
sum_fields (bool) – Whether to sum the similarity scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
facets (list) – Fields to include in the facets, if [] then all
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
include_count (bool) – Include the total count of results in the search results
asc (bool) – Whether to sort results by ascending or descending order
keep_search_history (bool) – Whether to store the history into VecDB. This will increase the storage costs over time.
hundred_scale (bool) – Whether to scale up the metric by 100
search_history_id (string) – Search history ID, only used for storing search histories.
query (string) – What to store as the query name in the dashboard
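
As a rough sketch of how the arguments above fit together, the snippet below assembles a weighted multivector query and the keyword arguments for a vector search. `query_entry` is a hypothetical helper, "ecommerce" is a placeholder dataset name, and nothing here contacts the API; the dictionary would be passed to vector() on a search client as keyword arguments.

```python
import json
from typing import Dict, List, Sequence, Union

Fields = Union[List[str], Dict[str, float]]


def query_entry(vector: Sequence[float], fields: Fields, alias: str) -> dict:
    """Build one entry of a multivector query.

    `fields` is either a list of vector field names (unweighted) or a
    dict mapping vector field names to weights.
    """
    if isinstance(fields, dict):
        for name, weight in fields.items():
            if not isinstance(weight, (int, float)):
                raise ValueError(f"weight for {name!r} must be a number")
    return {"vector": list(vector), "fields": fields, "alias": alias}


multivector_query = [
    query_entry([0.12, 0.23, 0.34],
                {"name_vector_": 0.6, "description_vector_": 0.3}, "text"),
    query_entry([0.45, 0.56, 0.67], {"image_vector_": 0.4}, "image"),
]

# Keyword arguments mirroring the documented signature of vector().
search_kwargs = {
    "dataset_id": "ecommerce",  # placeholder dataset name
    "multivector_query": multivector_query,
    "page_size": 5,
    "similarity_metric": "cosine",
    "sum_fields": True,
}

# Everything is plain JSON-serialisable data.
payload = json.dumps(search_kwargs)
```

Building the query through a small helper catches a malformed weight before the request is sent, which is cheaper to debug than a rejected search.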

Services class

class relevanceai.api.endpoints.services.ServicesClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

document_diff(doc, docs_to_compare, difference_fields=[])

Find differences between documents

Parameters

doc (dict) – Main document to compare other documents against.
docs_to_compare (list) – Other documents to compare against the main document.
difference_fields (list) – Fields to compare. Defaults to [], which compares all fields.
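
The response format of document_diff is not described here, so the following is only a local illustration of what a field-by-field comparison could look like. The function `diff_documents` and its return shape are assumptions, not the endpoint's behaviour.

```python
def diff_documents(doc, docs_to_compare, difference_fields=None):
    """Return, for each compared document, the fields whose values differ
    from the main document. An empty `difference_fields` compares all fields.
    Note: a missing field and a field set to None are treated the same."""
    fields_filter = set(difference_fields or [])
    diffs = []
    for other in docs_to_compare:
        fields = fields_filter or (set(doc) | set(other))
        diffs.append({
            field: {"main": doc.get(field), "other": other.get(field)}
            for field in sorted(fields)
            if doc.get(field) != other.get(field)
        })
    return diffs


# Hypothetical documents.
main = {"_id": "1", "title": "Red shoe", "price": 50}
others = [
    {"_id": "2", "title": "Red shoe", "price": 45},
    {"_id": "3", "title": "Blue shoe", "price": 50, "brand": "Acme"},
]
selected = diff_documents(main, others, difference_fields=["title", "price"])
everything = diff_documents(main, others)
```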

project: str

Tagger services

class relevanceai.api.endpoints.tagger.TaggerClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

diversity(data, tag_dataset_id, encoder, cluster_vector_field, n_clusters, tag_field=None, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', filters=[], min_score=0, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, include_score=False, n_init=5, n_iter=10)

Tagging and then clustering the tags and returning one from each cluster (starting from the closest tag)

Parameters

data (string) – Image Url or text or any data suited for the encoder
tag_dataset_id (string) – Name of the dataset you want to tag
encoder (string) – Which encoder to use.
cluster_vector_field (str) – The field to cluster on.
n_clusters (int) – Number of clusters to be specified.
tag_field (string) – The field used to tag in a dataset. If None, automatically uses the one stated in the encoder.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the similarity scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_search_relevance (bool) – Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results
search_relevance_cutoff_aggressiveness (int) – How aggressive the search_relevance cutoff score is (the higher the value, the fewer results will be flagged as relevant)
asc (bool) – Whether to sort results by ascending or descending order
include_score (bool) – Whether to include score
n_init (int) – Number of runs to run with different centroid seeds
n_iter (int) – Number of iterations in each run
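
The "one tag from each cluster, starting from the closest tag" step can be pictured with a small sketch. It assumes the tags have already been scored and assigned cluster labels (the clustering itself is not shown); `one_per_cluster` and the sample data are hypothetical.

```python
def one_per_cluster(tags):
    """Keep the best-scoring tag of each cluster, best first.

    `tags` is a list of dicts with 'tag', 'score' and 'cluster' keys.
    """
    best = {}
    for tag in sorted(tags, key=lambda t: t["score"], reverse=True):
        # The first tag seen for a cluster is its highest-scoring one.
        best.setdefault(tag["cluster"], tag)
    return list(best.values())


# Hypothetical tagging results with precomputed cluster labels.
tags = [
    {"tag": "sneaker", "score": 0.9, "cluster": 0},
    {"tag": "trainer", "score": 0.85, "cluster": 0},
    {"tag": "leather", "score": 0.7, "cluster": 1},
    {"tag": "suede", "score": 0.6, "cluster": 1},
    {"tag": "red", "score": 0.8, "cluster": 2},
]
diverse = [t["tag"] for t in one_per_cluster(tags)]
```

Near-duplicates such as "sneaker" and "trainer" fall in the same cluster, so only the closer of the two is returned.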

project: str

tag(data, tag_dataset_id, encoder, tag_field=None, approximation_depth=0, sum_fields=True, page_size=20, page=1, similarity_metric='cosine', filters=[], min_score=0, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, include_score=False)

Tag documents or vectors

Parameters

data (string) – Image Url or text or any data suited for the encoder
tag_dataset_id (string) – Name of the dataset you want to tag
encoder (string) – Which encoder to use.
tag_field (string) – The field used to tag in a dataset. If None, automatically uses the one stated in the encoder.
approximation_depth (int) – Used for approximate search to speed up search. The higher the number, the faster the search but potentially less accurate.
sum_fields (bool) – Whether to sum the similarity scores of the multiple vectors into one score or keep them separate
page_size (int) – Size of each page of results
page (int) – Page of the results
similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters (list) – Query for filtering the search results
min_score (int) – Minimum score for similarity metric
include_search_relevance (bool) – Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results
search_relevance_cutoff_aggressiveness (int) – How aggressive the search_relevance cutoff score is (the higher the value, the fewer results will be flagged as relevant)
asc (bool) – Whether to sort results by ascending or descending order
include_score (bool) – Whether to include score

Tasks Module

class relevanceai.api.endpoints.tasks.TasksClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

create(dataset_id, task_name, task_parameters)

Tasks unlock the power of VecDb AI by adding a lot more new functionality with a flexible way of searching.

Parameters

dataset_id (string) – Unique name of dataset
task_name (string) – Name of task to complete
task_parameters (dict) – Parameters of task to complete

create_cluster_task(dataset_id, vector_field, n_clusters, alias='default', refresh=False, n_iter=10, n_init=5, status_checker=True, verbose=True, time_between_ping=10)

Start a task which creates clusters for a dataset based on a vector field.

Parameters

dataset_id (string) – Unique name of dataset
vector_field (string) – The field to cluster on.
alias (string) – Alias is used to name a cluster
n_clusters (int) – Number of clusters to be specified.
n_iter (int) – Number of iterations in each run
n_init (int) – Number of runs to run with different centroid seeds
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output

create_encode_categories_task(dataset_id, fields, status_checker=True, verbose=True, time_between_ping=10)

Within a collection encode the specified array field in every document into vectors.

For example, an array that represents a movie’s categories:

>>> document 1 array field: {"category" : ["sci-fi", "thriller", "comedy"]}
>>> document 2 array field: {"category" : ["sci-fi", "romance", "drama"]}
>>> -> <Encode the arrays to vectors> ->
>>> | sci-fi | thriller | comedy | romance | drama |
>>> |--------|----------|--------|---------|-------|
>>> | 1      | 1        | 1      | 0       | 0     |
>>> | 1      | 0        | 0      | 1       | 1     |
>>> document 1 array vector: {"movie_categories_vector_": [1, 1, 1, 0, 0]}
>>> document 2 array vector: {"movie_categories_vector_": [1, 0, 0, 1, 1]}

Parameters

dataset_id (string) – Unique name of dataset
fields (list) – The array fields to encode into vectors.
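
The category encoding shown in the table above is a multi-hot encoding. The sketch below reproduces it locally in plain Python; it is an illustration of the idea rather than the task's implementation, and `encode_categories` is a hypothetical helper. Column order here follows first appearance, which is an assumption.

```python
def encode_categories(documents, field):
    """Multi-hot encode an array field.

    Returns the vocabulary (in order of first appearance) and one
    0/1 vector per document.
    """
    vocabulary = []
    for doc in documents:
        for category in doc.get(field, []):
            if category not in vocabulary:
                vocabulary.append(category)
    vectors = [
        [1 if category in doc.get(field, []) else 0 for category in vocabulary]
        for doc in documents
    ]
    return vocabulary, vectors


documents = [
    {"category": ["sci-fi", "thriller", "comedy"]},
    {"category": ["sci-fi", "romance", "drama"]},
]
vocabulary, vectors = encode_categories(documents, "category")
```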

create_encode_imagetext_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)

Start a task which encodes an image field for text representation.

Parameters

dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output

create_encode_text_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)

Start a task which encodes a text field.

Parameters

dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output

create_encode_textimage_task(dataset_id, field, alias='default', refresh=False, status_checker=True, verbose=True, time_between_ping=10)

Start a task which encodes a text field for image representation.

Parameters

dataset_id (string) – Unique name of dataset
field (string) – The field to encode
alias (string) – Alias used to name a vector field. Belongs in field_{alias}vector
refresh (bool) – Whether to rerun task on the whole dataset or just the ones missing the output

create_numeric_encoder_task(dataset_id, fields, vector_name='_vector_', status_checker=True, verbose=True, time_between_ping=10)

Within a collection encode the specified dictionary field in every document into vectors.

For example, a dictionary that represents the characteristics of a person visiting a store:

>>> document 1 field: {"person_characteristics" : {"height":180, "age":40, "weight":70}}
>>> document 2 field: {"person_characteristics" : {"age":32, "purchases":10, "visits": 24}}
>>> -> <Encode the dictionaries to vectors> ->
>>> | height | age | weight | purchases | visits |
>>> |--------|-----|--------|-----------|--------|
>>> | 180    | 40  | 70     | 0         | 0      |
>>> | 0      | 32  | 0      | 10        | 24     |
>>> document 1 dictionary vector: {"person_characteristics_vector_": [180, 40, 70, 0, 0]}
>>> document 2 dictionary vector: {"person_characteristics_vector_": [0, 32, 0, 10, 24]}

Parameters

dataset_id (string) – Unique name of dataset
fields (list) – The numeric fields to encode into vectors.
vector_name (string) – The name of the vector field created
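
The dictionary encoding in the example above can be reproduced locally as follows. This is a sketch of the idea only: `encode_numeric` is a hypothetical helper, and filling missing keys with 0 and ordering columns by first appearance are taken from the example rather than from any documented guarantee.

```python
def encode_numeric(documents, field):
    """Turn a dictionary field into a fixed-length numeric vector.

    Returns the column keys (in order of first appearance) and one
    vector per document, with 0 for keys a document does not have.
    """
    keys = []
    for doc in documents:
        for key in doc.get(field, {}):
            if key not in keys:
                keys.append(key)
    vectors = [
        [doc.get(field, {}).get(key, 0) for key in keys]
        for doc in documents
    ]
    return keys, vectors


documents = [
    {"person_characteristics": {"height": 180, "age": 40, "weight": 70}},
    {"person_characteristics": {"age": 32, "purchases": 10, "visits": 24}},
]
keys, vectors = encode_numeric(documents, "person_characteristics")
```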

list(dataset_id, show_active_only=True)

List and get a history of all the jobs and their job_id, parameters, start time, etc.

Parameters

dataset_id (string) – Unique name of dataset
show_active_only (bool) – Whether to show active only

project: str

status(dataset_id, task_id)

Get the status of a collection-level job: whether it is starting, running, failed or finished.

Parameters

dataset_id (string) – Unique name of dataset
task_id (string) – Unique name of task
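
Several task methods above accept time_between_ping, which suggests a polling loop around status. A generic sketch of such a loop is shown below. The status strings are taken from the description above, but the shape of the real response is not documented here, so `get_status` is a caller-supplied function that is assumed to return one of those strings; `wait_for_task` and `max_pings` are hypothetical.

```python
import time


def wait_for_task(get_status, time_between_ping=10, max_pings=60,
                  sleep=time.sleep):
    """Poll `get_status()` until the task finishes or fails.

    `get_status` is any zero-argument callable returning a status string,
    for example a small wrapper around status(dataset_id, task_id).
    """
    for _ in range(max_pings):
        status = get_status()
        if status in ("finished", "failed"):
            return status
        sleep(time_between_ping)
    raise TimeoutError("task did not complete within the allowed pings")


# Simulated task that needs three pings to finish (no network involved).
simulated = iter(["starting", "running", "finished"])
final_status = wait_for_task(lambda: next(simulated),
                             time_between_ping=0, sleep=lambda seconds: None)
```

Passing the sleep function as a parameter keeps the loop testable without real waiting.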

Wordclouds services

class relevanceai.api.endpoints.wordclouds.WordcloudsClient(project, api_key)

Bases: relevanceai.base._Base

api_key: str

config: relevanceai.config.Config

project: str

wordclouds(dataset_id, fields, n=2, most_common=5, page_size=20, select_fields=[], include_vector=False, filters=[], additional_stopwords=[])

Get an n-gram frequency counter for the wordcloud.

Parameters

dataset_id (string) – Unique name of dataset
fields (list) – The field on which to build NGrams
n (int) – The number of words to combine
most_common (int) – The most common number of n-gram terms
page_size (int) – Size of each page of results
select_fields (list) – Fields to include in the search results, empty array/list means all fields.
include_vector (bool) – Include vectors in the search results
filters (list) – Query for filtering the search results
additional_stopwords (list) – Additional stopwords to add
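
To show what n, most_common and additional_stopwords control, here is a minimal local n-gram counter built on collections.Counter. It is an illustration only: the tokenisation rule and the `ngram_counts` helper are assumptions, and the service may tokenise and filter differently.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2, most_common=5, additional_stopwords=()):
    """Count word n-grams across texts and return the most frequent ones."""
    stopwords = {word.lower() for word in additional_stopwords}
    counter = Counter()
    for text in texts:
        # Assumed tokenisation: lowercase runs of letters, digits, apostrophes.
        words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
                 if w not in stopwords]
        counter.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counter.most_common(most_common)


# Hypothetical field values.
texts = ["red running shoes", "blue running shoes", "red running socks"]
bigrams = ngram_counts(texts, n=2, most_common=2)
unigrams = ngram_counts(texts, n=1, most_common=1,
                        additional_stopwords=["running"])
```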
