Vector Tools

class relevanceai.vector_tools.client.VectorTools(project, api_key)

Bases: relevanceai.base._Base

Vector Tools Client

api_key: str
config: relevanceai.config.Config
critical: Callable
debug: Callable
error: Callable
info: Callable
project: str
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.CentroidCluster

Bases: relevanceai.vector_tools.cluster.ClusterBase

critical: Callable
debug: Callable
error: Callable
abstract fit_transform(vectors)
abstract get_centers()

Get centers for the centroid-based clusters

Return type

Union[ndarray, List[list]]

get_centroid_docs(centroid_vector_field_name='centroid_vector_')

Get the centroid documents to store. If there is a single vector field, each document looks like this:

{
    "_id": "document-id-1",
    "centroid_vector_": [0.23, 0.24, 0.23]
}

If there are multiple vector fields, each document looks like this:

{
    "_id": "document-id-1",
    "blue_vector_": [0.12, 0.312, 0.42],
    "red_vector_": [0.23, 0.41, 0.3]
}

Return type

List
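
As a rough illustration of the single-vector-field layout described above, the following sketch builds centroid documents from a list of cluster centers. It does not use relevanceai: `build_centroid_docs` and the `cluster-<index>` ID scheme are hypothetical names chosen for this example.

```python
from typing import Dict, List, Sequence


def build_centroid_docs(
    centers: Sequence[Sequence[float]],
    centroid_vector_field_name: str = "centroid_vector_",
) -> List[Dict[str, object]]:
    """Turn cluster centers into one document per centroid.

    Assumption: the "cluster-<index>" ID scheme is invented for this sketch.
    """
    return [
        {"_id": f"cluster-{index}", centroid_vector_field_name: list(center)}
        for index, center in enumerate(centers)
    ]


# Two hypothetical centers, e.g. as returned by get_centers().
centers = [[0.23, 0.24, 0.23], [0.12, 0.312, 0.42]]
centroid_docs = build_centroid_docs(centers)
print(centroid_docs[0])
```

Passing a different `centroid_vector_field_name` changes only the key under which each center is stored.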

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.Cluster(project, api_key)

Bases: relevanceai.vector_tools.cluster_evaluate.ClusterEvaluate, relevanceai.api.client.BatchAPIClient, relevanceai.vector_tools.cluster.ClusterBase

api_key: str
static cluster(vectors, cluster, cluster_args={}, k=None)

Cluster vectors

Return type

ndarray

config: relevanceai.config.Config
critical: Callable
debug: Callable
error: Callable
hdbscan_cluster(dataset_id, vector_fields, filters=[], algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10, alias='hdbscan', cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False)

This function performs all the steps required for HDBSCAN clustering:

  1. Loads the data
  2. Clusters the data
  3. Updates the data with clustering info
  4. Adds the centroid to the hidden centroid collection

Parameters
  • dataset_id (string) – name of the dataset

  • vector_fields (list) – a list containing the vector field to be used for clustering

  • filters (list) – a list to filter documents of the dataset

  • algorithm (str) – hdbscan configuration parameter, defaults to “best”

  • alpha (float) – hdbscan configuration parameter, defaults to 1.0

  • approx_min_span_tree (bool) – hdbscan configuration parameter, defaults to True

  • gen_min_span_tree (bool) – hdbscan configuration parameter, defaults to False

  • leaf_size (int) – hdbscan configuration parameter, defaults to 40

  • memory (Memory) – hdbscan configuration parameter for memory management, defaults to Memory(location=None)

  • metric (str) – hdbscan configuration parameter, defaults to “euclidean”

  • min_samples – hdbscan configuration parameter, defaults to None

  • p – hdbscan configuration parameter, defaults to None

  • min_cluster_size (Optional[int]) – minimum cluster size, 10 by default

  • alias (string) – “hdbscan”, string to be used in naming of the field showing the clustering results

  • cluster_field (string) – “_cluster_”, string to name the main cluster field

  • overwrite (bool) – False by default; set to True to overwrite an existing clustering result

Example

>>> client.vector_tools.cluster.hdbscan_cluster(
    dataset_id="sample_dataset",
    vector_fields=["sample_1_vector_"] # Only 1 vector field is supported for now
)
info: Callable
kmeans_cluster(dataset_id, vector_fields, filters=[], k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=True, random_state=None, copy_x=True, algorithm='auto', alias=None, cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False, page_size=1)

This function performs all the steps required for k-means clustering:

  1. Loads the data
  2. Clusters the data
  3. Updates the data with clustering info
  4. Adds the centroid to the hidden centroid collection

Parameters
  • dataset_id (string) – name of the dataset

  • vector_fields (list) – a list containing the vector field to be used for clustering

  • filters (list) – a list to filter documents of the dataset

  • k (int) – number of clusters (the k in k-means), 10 by default

  • init (string) – k-means initialization method, “k-means++” by default

  • n_init (int) – number of reinitializations of the k-means algorithm

  • max_iter (int) – maximum number of iterations in the k-means algorithm

  • tol (float) – convergence tolerance in the k-means algorithm

  • verbose (bool) – True by default

  • random_state – k-means algorithm parameter, None by default

  • copy_x (bool) – True by default

  • algorithm (string) – “auto” by default

  • alias (string) – “kmeans”, string to be used in naming of the field showing the clustering results

  • cluster_field (string) – “_cluster_”, string to name the main cluster field

  • overwrite (bool) – False by default; set to True to overwrite an existing clustering result

Example

>>> client.vector_tools.cluster.kmeans_cluster(
    dataset_id="sample_dataset",
    vector_fields=vector_fields
)
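
The four steps performed by kmeans_cluster can be sketched in plain Python. This is not the library's implementation: the in-memory `documents` list stands in for a dataset, `kmeans` is a minimal Lloyd's-algorithm helper that uses farthest-first initialization instead of k-means++, and the nested `_cluster_` layout and `cluster-<index>` IDs are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    vectors: List[List[float]], k: int, max_iter: int = 300
) -> Tuple[List[int], List[List[float]]]:
    """Minimal Lloyd's algorithm (hypothetical helper, not relevanceai code)."""
    # Farthest-first initialization: deterministic, simpler than k-means++.
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centers)
        )
        centers.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# 1. Load the data (an in-memory list stands in for the dataset).
vector_field = "sample_vector_"
documents = [
    {"_id": "doc-0", vector_field: [0.0, 0.1]},
    {"_id": "doc-1", vector_field: [0.1, 0.0]},
    {"_id": "doc-2", vector_field: [0.05, 0.05]},
    {"_id": "doc-3", vector_field: [5.0, 5.1]},
    {"_id": "doc-4", vector_field: [5.1, 5.0]},
    {"_id": "doc-5", vector_field: [5.05, 5.05]},
]
vectors = [doc[vector_field] for doc in documents]

# 2. Cluster the data.
labels, centers = kmeans(vectors, k=2)

# 3. Update the documents with clustering info (assumed nested layout).
for doc, label in zip(documents, labels):
    doc["_cluster_"] = {vector_field: {"kmeans": label}}

# 4. Build the centroid documents that would be stored separately.
centroid_docs = [
    {"_id": f"cluster-{index}", vector_field: center}
    for index, center in enumerate(centers)
]
```
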
project: str
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.ClusterBase

Bases: relevanceai.logger.LoguruLogger, doc_utils.doc_utils.DocUtils

critical: Callable
debug: Callable
error: Callable
fit_documents(vector_fields, docs, alias='default', cluster_field='_cluster_', return_only_clusters=True, inplace=True)

Train clustering algorithm on documents and then store the labels inside the documents.

Parameters
  • vector_fields (list) – The vector fields of the documents

  • docs (list) – List of documents to run clustering on

  • alias (str) – What the clusters can be called

  • cluster_field (str) – What the cluster fields should be called

  • return_only_clusters (bool) – If True, return only the clusters; otherwise return the original documents

  • inplace (bool) – If True, the documents are edited in place; otherwise a copy is made first

  • kwargs (dict) – Any other keyword argument will go directly into the clustering algorithm
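
The interplay of alias, cluster_field, return_only_clusters and inplace can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's code: `fit_documents_sketch` and `threshold_clusterer` are hypothetical, the sketch takes a single vector field name instead of a list, and the `cluster_field -> vector field -> alias` nesting is assumed.

```python
import copy
from typing import Any, Callable, Dict, List, Sequence

Document = Dict[str, Any]


def fit_documents_sketch(
    vector_field: str,
    docs: List[Document],
    fit_transform: Callable[[List[Sequence[float]]], Sequence[int]],
    alias: str = "default",
    cluster_field: str = "_cluster_",
    return_only_clusters: bool = True,
    inplace: bool = True,
) -> List[Document]:
    """Cluster the documents' vectors and store one label per document."""
    if not inplace:
        # Work on a copy so the caller's documents stay untouched.
        docs = copy.deepcopy(docs)
    vectors = [doc[vector_field] for doc in docs]
    labels = fit_transform(vectors)
    for doc, label in zip(docs, labels):
        # Assumed layout: {cluster_field: {vector_field: {alias: label}}}
        doc.setdefault(cluster_field, {}).setdefault(vector_field, {})[alias] = label
    if return_only_clusters:
        return [{"_id": doc["_id"], cluster_field: doc[cluster_field]} for doc in docs]
    return docs


def threshold_clusterer(vectors: List[Sequence[float]]) -> List[int]:
    """Toy stand-in for a clustering algorithm: split on the first coordinate."""
    return [0 if vector[0] < 0.5 else 1 for vector in vectors]


docs = [
    {"_id": "a", "sample_vector_": [0.1, 0.2]},
    {"_id": "b", "sample_vector_": [0.9, 0.8]},
]
clustered = fit_documents_sketch(
    "sample_vector_", docs, threshold_clusterer, alias="demo", inplace=False
)
print(clustered)
```

With `inplace=False` the input list is left unchanged; with `return_only_clusters=False` the full documents, including their vectors, are returned.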

abstract fit_transform(vectors)
info: Callable
property metadata
success: Callable
to_metadata()

You can also store the metadata of this clustering algorithm

warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.DensityCluster

Bases: relevanceai.vector_tools.cluster.ClusterBase

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors)
info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.HDBSCANClusterer(algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10)

Bases: relevanceai.vector_tools.cluster.DensityCluster

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors)
Return type

ndarray

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.KMeans(k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto')

Bases: relevanceai.vector_tools.cluster.MiniBatchKMeans

critical: Callable
debug: Callable
error: Callable
info: Callable
success: Callable
to_metadata()

Editing the metadata of the function

warn: Callable
warning: Callable
class relevanceai.vector_tools.cluster.MiniBatchKMeans(k=10, init='k-means++', verbose=False, compute_labels=True, max_no_improvement=2)

Bases: relevanceai.vector_tools.cluster.CentroidCluster

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors)

Fit the model and transform the vectors

get_centers()

Returns centroids of clusters

info: Callable
success: Callable
to_metadata()

Editing the metadata of the function

warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.DimReduction(project, api_key)

Bases: relevanceai.base._Base, relevanceai.vector_tools.dim_reduction.DimReductionBase

api_key: str
config: relevanceai.config.Config
critical: Callable
debug: Callable
static dim_reduce(vectors, dr, dr_args, dims)

Dimensionality reduction

Return type

ndarray

error: Callable
info: Callable
project: str
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.DimReductionBase

Bases: relevanceai.logger.LoguruLogger

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors, dr_args, dims)
Return type

ndarray

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.Ivis

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)
Return type

ndarray

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.PCA

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors, dr_args={'random_state': 42, 'svd_solver': 'auto'}, dims=3)
Return type

ndarray
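
For intuition about what reducing vectors to dims=3 with PCA produces, here is a minimal sketch that computes a PCA projection with NumPy's SVD. It assumes NumPy is installed; `pca_reduce` is a hypothetical helper that ignores `dr_args`, and it may differ from what the PCA class does internally.

```python
import numpy as np


def pca_reduce(vectors, dims: int = 3) -> np.ndarray:
    """Project vectors onto their first `dims` principal components."""
    x = np.asarray(vectors, dtype=float)
    # Center each feature so the components describe variance, not the mean.
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:dims].T


# Random 8-dimensional vectors stand in for real embeddings.
rng = np.random.default_rng(42)
vectors = rng.normal(size=(20, 8))
reduced = pca_reduce(vectors, dims=3)
print(reduced.shape)
```

The result has one row per input vector and `dims` columns, matching the ndarray return type documented above.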

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.TSNE

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)
Return type

ndarray

info: Callable
success: Callable
warn: Callable
warning: Callable
class relevanceai.vector_tools.dim_reduction.UMAP

Bases: relevanceai.vector_tools.dim_reduction.DimReductionBase

critical: Callable
debug: Callable
error: Callable
fit_transform(vectors, dr_args={'min_dist': 0.1, 'n_neighbors': 10, 'random_state': 42, 'transform_seed': 42}, dims=3)
Return type

ndarray

info: Callable
success: Callable
warn: Callable
warning: Callable