Python Utilities

Here is a list of the top features of the SDK.

Inserting with automatic multi-processing and multi-threading

The RelevanceAI Python package gives you multi-threading and multi-processing out of the box when you insert documents.

client.insert_documents(
    dataset_id="match-example",
    docs=participant_frames,
    update_schema=True,
    overwrite=True,
)

To transform the documents before they are sent, define a bulk_fn and pass it to the same call:

def bulk_fn(docs):
    # bulk_fn receives a list of documents (python dictionaries)
    for d in docs:
        d["value_update"] = d["value"] + 2
    return docs

client.insert_documents(
    dataset_id="match-example",
    docs=participant_frames,
    update_schema=True,
    overwrite=True,
    bulk_fn=bulk_fn,
)

Under the hood, the SDK uses multiprocessing to run the bulk_fn and multi-threading to send the data over the network. If no bulk_fn is supplied, only the network requests are multi-threaded.
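
The split between processes and threads described above can be illustrated with the standard library alone. The following is a minimal sketch, not the SDK's actual implementation: `chunk`, `add_two`, `fake_upload`, `process_and_upload`, and the `transform_pool` parameter are hypothetical names, and the upload step is a stand-in that only counts documents instead of making a network request.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def chunk(docs, size):
    """Split a list of documents into consecutive chunks of at most `size`."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def add_two(docs):
    # Same contract as bulk_fn: a list of dictionaries in, a list of dictionaries out.
    for d in docs:
        d["value_update"] = d["value"] + 2
    return docs


def fake_upload(docs):
    # Hypothetical stand-in for a network request; returns how many docs were "sent".
    return len(docs)


def process_and_upload(docs, bulk_fn=None, upload=fake_upload,
                       chunk_size=2, transform_pool=ProcessPoolExecutor):
    chunks = chunk(docs, chunk_size)
    if bulk_fn is not None:
        # CPU-bound transformation: spread the chunks across pool workers
        # (separate processes by default).
        with transform_pool() as pool:
            chunks = list(pool.map(bulk_fn, chunks))
    # I/O-bound upload: threads are sufficient because workers mostly wait.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sent = sum(pool.map(upload, chunks))
    return chunks, sent
```

When calling `process_and_upload` with the default process pool from a script, place the call under an `if __name__ == "__main__":` guard, which Python's multiprocessing requires on platforms that spawn worker processes.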

Pull Update Push

Update the documents in your collection according to a rule that you define. The Pull-Update-Push function loops through every document in your collection, pulls it to your local machine, applies a function that you specify, and then uploads the result either to a new collection or back to the same collection. A log keeps track of which documents have already been updated, which saves on network requests.

For example, suppose you have uploaded a dataset called "test_dataset" that contains 200 documents with integer IDs from 0 to 199.

The sample data looks like this:

[{"_id": "0"}, {"_id": "1"}, ... {"_id": "199"}]

The update function below marks each document as even or odd:

def even_function(data):
    # data is a list of documents (python dictionaries)
    for i in data:
        # _id is stored as a string, so convert it before testing parity
        i['even'] = int(i['_id']) % 2 == 0
    return data

Pass this function to pull_update_push to update every document in the uploaded collection.

original_collection = "test_dataset"
client.pull_update_push(original_collection, even_function)

Alternatively, you can specify a new collection to receive the updated documents.

After the update, the documents look like this:

[{"_id": "0", "even": true}, {"_id": "1", "even": false}, ... {"_id": "199", "even": true}]

To delete the logs created by pull_update_push, call delete_all_logs:

client.delete_all_logs(original_collection)
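
To make the role of the log concrete, here is a minimal in-memory sketch of a pull-update-push loop. It is an illustration under stated assumptions, not the SDK's implementation: the collections are plain dictionaries keyed by _id, and `mark_even`, `pull_update_push_sketch`, `page_size`, and the `log` set are hypothetical names.

```python
def mark_even(docs):
    # Same contract as even_function: list of dictionaries in, list of dictionaries out.
    for doc in docs:
        doc["even"] = int(doc["_id"]) % 2 == 0
    return docs


def pull_update_push_sketch(source, update_fn, target=None, log=None, page_size=50):
    """Apply update_fn page by page, skipping documents already recorded in the log."""
    target = source if target is None else target
    log = set() if log is None else log
    # Only documents that are not yet logged as updated are pulled.
    pending = [doc for doc_id, doc in source.items() if doc_id not in log]
    for start in range(0, len(pending), page_size):
        # "Pull": work on copies so the source is only changed by the push step.
        page = [dict(doc) for doc in pending[start:start + page_size]]
        # "Update" the page, then "push" each result and record it in the log.
        for doc in update_fn(page):
            target[doc["_id"]] = doc
            log.add(doc["_id"])
    return log


# Hypothetical in-memory collection standing in for the uploaded dataset.
collection = {str(i): {"_id": str(i)} for i in range(200)}
log = pull_update_push_sketch(collection, mark_even)
```

Passing the returned `log` into a second call leaves nothing pending, which is the saving the log is meant to provide. Starting from an empty log would process every document again.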

Integration With VectorHub

VectorHub is RelevanceAI’s main vectorizer repository. For the models used here, much of the complexity, from installation to encoding, is abstracted away, and RelevanceAI support is built in.

Using a VectorHub model is as simple as the following (an actual example):

# Insert a DataFrame
import pandas as pd
df = pd.read_csv("Grid view.csv")
df['_id'] = df['sample']
client.insert_df("sample-cn", df)

# !pip install vectorhub[encoders-text-sentence-transformers]
from vectorhub.encoders.text.sentence_transformers import SentenceTransformer2Vec
model = SentenceTransformer2Vec()

# Define an update function
def encode_documents(docs):
    # Pass the fields to encode first, then the documents
    return model.encode_documents(
        ["current", "Longer"], docs)

client.pull_update_push("sample-cn", encode_documents)
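
The update function passed to pull_update_push here follows the same contract as before: it receives a list of documents and returns them with new fields added. The sketch below mimics that contract with a hypothetical `ToyEncoder` so that it runs without VectorHub installed. The `_toy_vector_` field suffix and the two-number "vector" are assumptions made for illustration and say nothing about what SentenceTransformer2Vec actually produces.

```python
class ToyEncoder:
    """Hypothetical stand-in for a VectorHub model; not a real encoder."""

    def encode(self, text):
        # Toy "vector": character count and whitespace-separated word count.
        return [float(len(text)), float(len(text.split()))]

    def encode_documents(self, fields, docs):
        # Fields first, then the documents, mirroring the call shown above.
        for doc in docs:
            for field in fields:
                if field in doc:
                    doc[f"{field}_toy_vector_"] = self.encode(str(doc[field]))
        return docs


toy_model = ToyEncoder()


def encode_documents(docs):
    # Update function: list of dictionaries in, list of dictionaries out.
    return toy_model.encode_documents(["current", "Longer"], docs)


sample = [
    {"_id": "a", "current": "hello world"},
    {"_id": "b", "Longer": "one two three"},
]
encoded = encode_documents(sample)
```

Each document gains a vector field only for the listed fields it actually contains, so documents with missing fields pass through unchanged in this sketch.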