Vector Tools

class relevanceai.vector_tools.client.VectorTools(project, api_key)

Bases: relevanceai.base._Base

Vector Tools Client

api_key: str

config: relevanceai.config.Config

critical: Callable

debug: Callable

error: Callable

info: Callable

project: str

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.CentroidCluster

Bases: relevanceai.vector_tools.cluster.ClusterBase

critical: Callable

debug: Callable

error: Callable

abstract fit_transform(vectors)

abstract get_centers()

Get centers for the centroid-based clusters

Return type: Union[ndarray, List[list]]

get_centroid_docs(centroid_vector_field_name='centroid_vector_')

Get the centroid documents to store. If there is a single vector field, each document looks like this:

{
    "_id": "document-id-1",
    "centroid_vector_": [0.23, 0.24, 0.23]
}

If there are multiple vector fields, each document looks like this:

{
    "_id": "document-id-1",
    "blue_vector_": [0.12, 0.312, 0.42],
    "red_vector_": [0.23, 0.41, 0.3]
}

Return type: List
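As an illustration of the single-vector-field shape described above, the following minimal sketch turns a list of cluster centers into centroid documents. It is not the library's implementation: the helper name build_centroid_docs and the "cluster-<n>" ID scheme are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def build_centroid_docs(
    centers: Sequence[Sequence[float]],
    centroid_vector_field_name: str = "centroid_vector_",
) -> List[Dict]:
    """Turn cluster centers into one storable document per centroid."""
    docs = []
    for index, center in enumerate(centers):
        docs.append(
            {
                # Hypothetical ID scheme; the reference only shows "document-id-1".
                "_id": f"cluster-{index}",
                centroid_vector_field_name: [float(x) for x in center],
            }
        )
    return docs


centers = [[0.23, 0.24, 0.23], [0.9, 0.8, 0.7]]
centroid_docs = build_centroid_docs(centers)
```

Passing a different centroid_vector_field_name, such as "blue_vector_", changes only the key under which each center is stored.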

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.Cluster(project, api_key)

Bases: relevanceai.vector_tools.cluster_evaluate.ClusterEvaluate, relevanceai.api.client.BatchAPIClient, relevanceai.vector_tools.cluster.ClusterBase

api_key: str

static cluster(vectors, cluster, cluster_args={}, k=None)

Cluster vectors

Return type: ndarray

config: relevanceai.config.Config

critical: Callable

debug: Callable

error: Callable

hdbscan_cluster(dataset_id, vector_fields, filters=[], algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10, alias='hdbscan', cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False)

This function performs all the steps required for hdbscan clustering:

1. Loads the data
2. Clusters the data
3. Updates the data with clustering info
4. Adds the centroid to the hidden centroid collection

Parameters

dataset_id (string) – name of the dataset
vector_fields (list) – a list containing the vector field to be used for clustering
filters (list) – a list to filter documents of the dataset
algorithm (str) – hdbscan configuration parameter, defaults to “best”
alpha (float) – hdbscan configuration parameter, defaults to 1.0
approx_min_span_tree (bool) – hdbscan configuration parameter, defaults to True
gen_min_span_tree (bool) – hdbscan configuration parameter, defaults to False
leaf_size (int) – hdbscan configuration parameter, defaults to 40
memory (Memory) – hdbscan configuration parameter for memory management, defaults to Memory(location=None)
metric (str) – hdbscan configuration parameter, defaults to “euclidean”
p – hdbscan configuration parameter, defaults to None
min_samples – hdbscan configuration parameter, defaults to None
min_cluster_size (Optional[int]) – minimum cluster size, 10 by default
alias (string) – “hdbscan”, string to be used in naming of the field showing the clustering results
cluster_field (string) – “_cluster_”, string to name the main cluster field
overwrite (bool) – False by default; set to True to overwrite an existing clustering result

Example

>>> client.vector_tools.cluster.hdbscan_cluster(
    dataset_id="sample_dataset",
    vector_fields=["sample_1_vector_"] # Only 1 vector field is supported for now
)
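
The hdbscan library labels points it treats as noise with -1, and density-based clusters have no built-in centers. This reference does not say how hdbscan_cluster derives the centroids it stores in the final step, so the sketch below simply assumes the mean of each cluster's members and skips noise points. The helper name centroids_from_labels and the sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

NOISE_LABEL = -1  # label the hdbscan library gives to points it treats as noise


def centroids_from_labels(
    vectors: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Average the member vectors of every non-noise cluster."""
    members: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for vector, label in zip(vectors, labels):
        if label != NOISE_LABEL:
            members[label].append(vector)
    return {
        label: [sum(column) / len(group) for column in zip(*group)]
        for label, group in members.items()
    }


vectors = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [50.0, -50.0]]
labels = [0, 0, 1, -1]  # hypothetical output of a clustering step
centroids = centroids_from_labels(vectors, labels)
```

The last vector carries the noise label, so it contributes to no centroid.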

info: Callable

kmeans_cluster(dataset_id, vector_fields, filters=[], k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=True, random_state=None, copy_x=True, algorithm='auto', alias=None, cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False, page_size=1)

This function performs all the steps required for Kmeans clustering:

1. Loads the data
2. Clusters the data
3. Updates the data with clustering info
4. Adds the centroid to the hidden centroid collection

Parameters

dataset_id (string) – name of the dataset
vector_fields (list) – a list containing the vector field to be used for clustering
filters (list) – a list to filter documents of the dataset
k (int) – K in Kmeans
init (string) – “k-means++” by default, Kmeans algorithm parameter
n_init (int) – number of reinitializations for the kmeans algorithm
max_iter (int) – maximum number of iterations in the kmeans algorithm
tol (float) – tolerance in the kmeans algorithm
verbose (bool) – True by default
random_state – None by default, Kmeans algorithm parameter
copy_x (bool) – True by default
algorithm (string) – “auto” by default
alias (string) – “kmeans”, string to be used in naming of the field showing the clustering results
cluster_field (string) – “_cluster_”, string to name the main cluster field
overwrite (bool) – False by default; set to True to overwrite an existing clustering result

Example

>>> client.vector_tools.cluster.kmeans_cluster(
    dataset_id="sample_dataset",
    vector_fields=["sample_1_vector_"]
)
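
The four steps listed for kmeans_cluster can be sketched on in-memory documents as follows. This is an illustration only, not the library's code: it uses a plain Lloyd's algorithm with random initial centers instead of k-means++, and the function names, the nested label layout (cluster_field, then vector field, then alias) and the "cluster-<n>" naming are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    vectors: List[List[float]], k: int, max_iter: int = 300, random_state: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm with random initial centers (not k-means++)."""
    rng = random.Random(random_state)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            group = [v for v, label in zip(vectors, labels) if label == c]
            if group:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return labels, centers


def cluster_documents(
    docs: List[Dict],
    vector_field: str,
    k: int,
    alias: str = "kmeans",
    cluster_field: str = "_cluster_",
) -> Tuple[List[Dict], List[Dict]]:
    # 1. Load the data: here the documents are already in memory.
    vectors = [doc[vector_field] for doc in docs]
    # 2. Cluster the data.
    labels, centers = kmeans(vectors, k)
    # 3. Update the documents with the clustering info (assumed nested layout).
    for doc, label in zip(docs, labels):
        nested = doc.setdefault(cluster_field, {}).setdefault(vector_field, {})
        nested[alias] = f"cluster-{label}"
    # 4. Build the centroid documents that would go to the centroid collection.
    centroid_docs = [
        {"_id": f"cluster-{i}", vector_field: center}
        for i, center in enumerate(centers)
    ]
    return docs, centroid_docs


docs = [
    {"_id": "a", "sample_1_vector_": [0.0, 0.0]},
    {"_id": "b", "sample_1_vector_": [0.0, 1.0]},
    {"_id": "c", "sample_1_vector_": [10.0, 10.0]},
    {"_id": "d", "sample_1_vector_": [10.0, 11.0]},
]
updated_docs, centroid_docs = cluster_documents(docs, "sample_1_vector_", k=2)
```

In this sketch the documents are modified in place and the centroid documents are returned rather than written anywhere, which keeps the example free of network calls.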

project: str

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.ClusterBase

Bases: relevanceai.logger.LoguruLogger, doc_utils.doc_utils.DocUtils

critical: Callable

debug: Callable

error: Callable

fit_documents(vector_fields, docs, alias='default', cluster_field='_cluster_', return_only_clusters=True, inplace=True)

Train clustering algorithm on documents and then store the labels inside the documents.

Parameters

vector_fields (list) – The vector fields of the documents
docs (list) – List of documents to run clustering on
alias (str) – What the clusters can be called
cluster_field (str) – What the cluster fields should be called
return_only_clusters (bool) – If True, return only the clusters; otherwise, return the original documents
inplace (bool) – If True, the documents are edited in place; otherwise, a copy is made first
kwargs (dict) – Any other keyword argument will go directly into the clustering algorithm
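
The interplay of return_only_clusters and inplace can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the method's actual body: the name fit_documents_sketch, the explicit fit_transform argument, the restriction to one vector field, the nested label layout and the shape of the "clusters only" result are all invented for the example.

```python
import copy
from typing import Callable, Dict, List, Sequence


def fit_documents_sketch(
    vector_fields: List[str],
    docs: List[Dict],
    fit_transform: Callable[[List[Sequence[float]]], Sequence[int]],
    alias: str = "default",
    cluster_field: str = "_cluster_",
    return_only_clusters: bool = True,
    inplace: bool = True,
) -> List[Dict]:
    """Label documents with a clusterer and store the labels inside them."""
    if len(vector_fields) != 1:
        raise ValueError("this sketch handles exactly one vector field")
    vector_field = vector_fields[0]
    if not inplace:
        docs = copy.deepcopy(docs)  # leave the caller's documents untouched
    labels = fit_transform([doc[vector_field] for doc in docs])
    for doc, label in zip(docs, labels):
        # Assumed layout: {cluster_field: {vector_field: {alias: label}}}
        nested = doc.setdefault(cluster_field, {}).setdefault(vector_field, {})
        nested[alias] = int(label)
    if return_only_clusters:
        # Assumed shape: the document ID plus the cluster field only.
        return [
            {"_id": doc.get("_id"), cluster_field: doc[cluster_field]}
            for doc in docs
        ]
    return docs


def sign_clusterer(vectors: List[Sequence[float]]) -> List[int]:
    """Hypothetical stand-in for a real fit_transform: split on the first value."""
    return [0 if v[0] < 0 else 1 for v in vectors]


docs = [
    {"_id": "a", "v_vector_": [-1.0, 2.0]},
    {"_id": "b", "v_vector_": [3.0, 4.0]},
]
result = fit_documents_sketch(["v_vector_"], docs, sign_clusterer, inplace=False)
```

With inplace=False the original docs list is left without a cluster field; calling the function again with the default inplace=True writes the labels into the same dictionaries that were passed in.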

abstract fit_transform(vectors)

info: Callable

property metadata

success: Callable

to_metadata(): You can also store the metadata of this clustering algorithm

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.DensityCluster

Bases: relevanceai.vector_tools.cluster.ClusterBase

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors)

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.HDBSCANClusterer(algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10)

Bases: relevanceai.vector_tools.cluster.DensityCluster

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.KMeans(k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto')

Bases: relevanceai.vector_tools.cluster.MiniBatchKMeans

critical: Callable

debug: Callable

error: Callable

info: Callable

success: Callable

to_metadata(): Editing the metadata of the function

warn: Callable

warning: Callable

class relevanceai.vector_tools.cluster.MiniBatchKMeans(k=10, init='k-means++', verbose=False, compute_labels=True, max_no_improvement=2)

Bases: relevanceai.vector_tools.cluster.CentroidCluster

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors): Fit and transform transform the vectors

get_centers(): Returns centroids of clusters

info: Callable

success: Callable

to_metadata(): Editing the metadata of the function

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.DimReduction(project, api_key)

Bases: relevanceai.base._Base, relevanceai.vector_tools.dim_reduction.DimReductionBase

api_key: str

config: relevanceai.config.Config

critical: Callable

debug: Callable

static dim_reduce(vectors, dr, dr_args, dims)

Dimensionality reduction

Return type: ndarray

error: Callable

info: Callable

project: str

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.DimReductionBase

Bases: relevanceai.logger.LoguruLogger

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors, dr_args, dims)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.Ivis

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.PCA

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors, dr_args={'random_state': 42, 'svd_solver': 'auto'}, dims=3)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.TSNE

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable

class relevanceai.vector_tools.dim_reduction.UMAP

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable

debug: Callable

error: Callable

fit_transform(vectors, dr_args={'min_dist': 0.1, 'n_neighbors': 10, 'random_state': 42, 'transform_seed': 42}, dims=3)

Return type: ndarray

info: Callable

success: Callable

warn: Callable

warning: Callable