relevanceai.api.batch.batch_retrieve

Batch Retrieve

Module Contents

Classes

BatchRetrieveClient

API Client

Attributes

BYTE_TO_MB

LIST_SIZE_MULTIPLIER

relevanceai.api.batch.batch_retrieve.BYTE_TO_MB
relevanceai.api.batch.batch_retrieve.LIST_SIZE_MULTIPLIER = 3
class relevanceai.api.batch.batch_retrieve.BatchRetrieveClient(project: str, api_key: str)

Bases: relevanceai.api.endpoints.client.APIClient, relevanceai.api.batch.chunk.Chunker

API Client

get_documents(self, dataset_id: str, number_of_documents: int = 20, filters: list = [], cursor: str = None, batch_size: int = 1000, sort: list = [], select_fields: list = [], include_vector: bool = True)

Retrieve documents with filters. Filters are used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

To combine your filters with multiple ORs, add the following inside the query: {"strict": "must_or"}.

Parameters
  • dataset_id (string) – Unique name of dataset

  • number_of_documents (int) – Number of documents to retrieve

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields.

  • cursor (string) – Cursor to paginate the document retrieval

  • batch_size (int) – Number of documents to retrieve per iteration

  • include_vector (bool) – Include vectors in the search results

  • sort (list) – Fields to sort by. For each field, sort by descending or ascending. If you are using descending by datetime, it will get the most recent ones.

  • filters (list) – Query for filtering the search results
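
The way number_of_documents, batch_size, and cursor interact can be sketched as a paging loop. This is only an illustration under stated assumptions, not the library's implementation: FAKE_DATASET, fake_get_page, and retrieve are hypothetical names, and the response shape (a dict with "documents" and "cursor" keys) is assumed rather than taken from this page.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory stand-in for a dataset; not part of relevanceai.
FAKE_DATASET: List[Dict[str, Any]] = [{"_id": str(i), "value": i} for i in range(25)]


def fake_get_page(cursor: Optional[str], page_size: int) -> Dict[str, Any]:
    """Mimic one paged retrieval call. The response shape is an assumption."""
    start = int(cursor) if cursor else 0
    docs = FAKE_DATASET[start:start + page_size]
    next_start = start + len(docs)
    # A cursor of None signals that no documents remain.
    next_cursor = str(next_start) if next_start < len(FAKE_DATASET) else None
    return {"documents": docs, "cursor": next_cursor}


def retrieve(number_of_documents: int, batch_size: int) -> List[Dict[str, Any]]:
    """Collect up to number_of_documents, fetching at most batch_size per call."""
    collected: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    while len(collected) < number_of_documents:
        remaining = number_of_documents - len(collected)
        page = fake_get_page(cursor, min(batch_size, remaining))
        collected.extend(page["documents"])
        cursor = page["cursor"]
        if cursor is None or not page["documents"]:
            break  # dataset exhausted before reaching the requested count
    return collected


docs = retrieve(number_of_documents=20, batch_size=8)
```

Here the 20 requested documents arrive in three pages of 8, 8, and 4. Requesting more documents than the dataset holds simply returns everything available.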

get_all_documents(self, dataset_id: str, chunk_size: int = 1000, filters: List = [], sort: List = [], select_fields: List = [], include_vector: bool = True, show_progress_bar: bool = True)

Retrieve all documents with filters. Filters are used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched. For more details see documents.get_where.

Example

>>> client = Client()
>>> client.get_all_documents(dataset_id="sample_dataset")
Parameters
  • dataset_id (string) – Unique name of dataset

  • chunk_size (int) – Number of documents to retrieve per retrieval call

  • include_vector (bool) – Include vectors in the search results

  • sort (list) – Fields to sort by. For each field, sort by descending or ascending. If you are using descending by datetime, it will get the most recent ones.

  • filters (list) – Query for filtering the search results

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields.
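
The parameters above can be combined to keep responses small, for example by selecting a single field and excluding vectors. The following is a minimal sketch: fetch_titles and StubClient are hypothetical helpers, "title" is a made-up field name, and the assumption that get_all_documents returns a list of document dicts is not confirmed by this page. The stub exists only so the example runs without credentials or network access.

```python
from typing import Any, Dict, List


def fetch_titles(client: Any, dataset_id: str) -> List[Dict[str, Any]]:
    """Retrieve every document, keeping only a hypothetical "title" field."""
    return client.get_all_documents(
        dataset_id=dataset_id,
        chunk_size=500,             # documents fetched per retrieval call
        select_fields=["title"],    # an empty list would mean all fields
        include_vector=False,       # leave vectors out of the results
        show_progress_bar=False,
    )


class StubClient:
    """Hypothetical stand-in that records the arguments it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def get_all_documents(self, **kwargs: Any) -> List[Dict[str, Any]]:
        self.calls.append(kwargs)
        # Assumed return shape: a list of document dicts.
        return [{"_id": "1", "title": "example"}]


stub = StubClient()
result = fetch_titles(stub, "sample_dataset")
```

With a real client, the StubClient instance would be replaced by the object returned from Client().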

get_number_of_documents(self, dataset_id, filters=[])

Get number of documents in a dataset. Filter can be used to select documents that match the conditions set in a filter query. For more details see documents.get_where.

Parameters
  • dataset_id (string) – Unique name of dataset

  • filters (list) – Filters to select documents
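
One possible use of the document count is to estimate how many retrieval calls a full export would take at a given chunk size. The sketch below assumes get_number_of_documents returns a plain integer, which this page does not state; planned_requests and CountingStub are hypothetical names introduced for illustration.

```python
import math
from typing import Any


def planned_requests(client: Any, dataset_id: str, chunk_size: int = 1000) -> int:
    """Estimate the number of chunked retrieval calls needed for a dataset."""
    # Assumption: the call returns an int count of matching documents.
    total = client.get_number_of_documents(dataset_id=dataset_id, filters=[])
    return math.ceil(total / chunk_size)


class CountingStub:
    """Hypothetical stand-in that reports a fixed document count."""

    def __init__(self, total: int) -> None:
        self.total = total

    def get_number_of_documents(self, dataset_id: str, filters: list) -> int:
        return self.total


requests_needed = planned_requests(CountingStub(2500), "sample_dataset")
```

The last chunk is usually partial, which is why the estimate rounds up rather than down.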

get_vector_fields(self, dataset_id)

Returns a list of valid vector fields in the dataset.

Parameters
  • dataset_id (string) – Unique name of dataset