Vector Tools
- class relevanceai.vector_tools.client.VectorTools(project, api_key)
Bases:
relevanceai.base._Base
Vector Tools Client
- api_key: str
- config: relevanceai.config.Config
- critical: Callable
- debug: Callable
- error: Callable
- info: Callable
- project: str
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.cluster.CentroidCluster
Bases:
relevanceai.vector_tools.cluster.ClusterBase
- critical: Callable
- debug: Callable
- error: Callable
- abstract fit_transform(vectors)
- abstract get_centers()
Get centers for the centroid-based clusters
- Return type
Union[ndarray,List[list]]
- get_centroid_docs(centroid_vector_field_name='centroid_vector_')
Get the centroid documents to store. If there is a single vector field, this returns:
- {
"_id": "document-id-1", "centroid_vector_": [0.23, 0.24, 0.23]
}
If there are multiple vector fields, this returns:
- {
"_id": "document-id-1", "blue_vector_": [0.12, 0.312, 0.42], "red_vector_": [0.23, 0.41, 0.3]
}
- Return type
List
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
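The CentroidCluster contract above (an abstract fit_transform and get_centers, plus get_centroid_docs built on top of them) can be illustrated with a small standalone sketch. This is not the library's code: CentroidClusterSketch and SignCluster are hypothetical names, the "cluster-<i>" _id scheme is an assumption, and the sketch handles only the single-vector-field case.

```python
from abc import ABC, abstractmethod
from typing import List


class CentroidClusterSketch(ABC):
    """Hypothetical stand-in for the CentroidCluster interface (not the library class)."""

    @abstractmethod
    def fit_transform(self, vectors: List[List[float]]) -> List[int]:
        """Return one cluster label per input vector."""

    @abstractmethod
    def get_centers(self) -> List[List[float]]:
        """Return one center per cluster."""

    def get_centroid_docs(self, centroid_vector_field_name: str = "centroid_vector_") -> List[dict]:
        # Assumption: one document per center; the "_id" naming is illustrative only.
        return [
            {"_id": f"cluster-{i}", centroid_vector_field_name: list(center)}
            for i, center in enumerate(self.get_centers())
        ]


class SignCluster(CentroidClusterSketch):
    """Toy clusterer: label 0 if the first component is negative, otherwise 1."""

    def __init__(self) -> None:
        self._centers: List[List[float]] = []

    def fit_transform(self, vectors: List[List[float]]) -> List[int]:
        labels = [0 if v[0] < 0 else 1 for v in vectors]
        dims = len(vectors[0])
        self._centers = []
        for label in (0, 1):
            members = [v for v, assigned in zip(vectors, labels) if assigned == label]
            if members:
                # Center is the component-wise mean of the cluster members.
                center = [sum(m[d] for m in members) / len(members) for d in range(dims)]
            else:
                # An empty cluster gets a zero vector so indices still match labels.
                center = [0.0] * dims
            self._centers.append(center)
        return labels

    def get_centers(self) -> List[List[float]]:
        return self._centers


clusterer = SignCluster()
labels = clusterer.fit_transform([[-1.0, 0.0], [-3.0, 2.0], [2.0, 2.0]])
centroid_docs = clusterer.get_centroid_docs()
```

A concrete subclass only has to supply the two abstract methods; the centroid documents follow from get_centers.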
- class relevanceai.vector_tools.cluster.Cluster(project, api_key)
Bases:
relevanceai.vector_tools.cluster_evaluate.ClusterEvaluate, relevanceai.api.client.BatchAPIClient, relevanceai.vector_tools.cluster.ClusterBase
- api_key: str
- static cluster(vectors, cluster, cluster_args={}, k=None)
Cluster vectors
- Return type
ndarray
- config: relevanceai.config.Config
- critical: Callable
- debug: Callable
- error: Callable
- hdbscan_cluster(dataset_id, vector_fields, filters=[], algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10, alias='hdbscan', cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False)
This function performs all the steps required for HDBSCAN clustering:
1. Loads the data
2. Clusters the data
3. Updates the data with clustering info
4. Adds the centroid to the hidden centroid collection
- Parameters
dataset_id (string) – name of the dataset
vector_fields (list) – a list containing the vector field to be used for clustering
filters (list) – a list to filter documents of the dataset
algorithm (str) – hdbscan configuration parameter default to “best”
alpha (float) – hdbscan configuration parameter default to 1.0
approx_min_span_tree (bool) – hdbscan configuration parameter default to True
gen_min_span_tree (bool) – hdbscan configuration parameter default to False
leaf_size (int) – hdbscan configuration parameter default to 40
memory – hdbscan configuration parameter on memory management, defaults to Memory(location=None)
metric (str) – hdbscan configuration parameter, defaults to “euclidean”
p – hdbscan configuration parameter, defaults to None
min_samples – hdbscan configuration parameter, defaults to None
min_cluster_size (Optional[int]) – minimum cluster size, 10 by default
alias (string) – “hdbscan”, string to be used in naming of the field showing the clustering results
cluster_field (string) – “_cluster_”, string to name the main cluster field
overwrite (bool) – False by default; set to True to overwrite an existing clustering result
Example
>>> client.vector_tools.cluster.hdbscan_cluster(
...     dataset_id="sample_dataset",
...     vector_fields=["sample_1_vector_"],  # Only 1 vector field is supported for now
... )
- info: Callable
- kmeans_cluster(dataset_id, vector_fields, filters=[], k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=True, random_state=None, copy_x=True, algorithm='auto', alias=None, cluster_field='_cluster_', update_documents_chunksize=50, overwrite=False, page_size=1)
This function performs all the steps required for KMeans clustering:
1. Loads the data
2. Clusters the data
3. Updates the data with clustering info
4. Adds the centroid to the hidden centroid collection
- Parameters
dataset_id (string) – name of the dataset
vector_fields (list) – a list containing the vector field to be used for clustering
filters (list) – a list to filter documents of the dataset
k (int) – K in Kmeans
init (string) – “k-means++” -> Kmeans algorithm parameter
n_init (int) – number of reinitialization for the kmeans algorithm
max_iter (int) – max iteration in the kmeans algorithm
tol (float) – tolerance used by the kmeans algorithm to declare convergence
verbose (bool) – True by default
random_state – None by default -> Kmeans algorithm parameter
copy_x (bool) – True by default
algorithm (string) – “auto” by default
alias (string) – “kmeans”, string to be used in naming of the field showing the clustering results
cluster_field (string) – “_cluster_”, string to name the main cluster field
overwrite (bool) – False by default; set to True to overwrite an existing clustering result
Example
>>> client.vector_tools.cluster.kmeans_cluster(
...     dataset_id="sample_dataset",
...     vector_fields=vector_fields,
... )
- project: str
- success: Callable
- warn: Callable
- warning: Callable
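The four steps that kmeans_cluster and hdbscan_cluster describe (load, cluster, update documents, store centroids) can be sketched end to end on in-memory documents. This is a minimal illustration under stated assumptions, not the library's implementation: kmeans and kmeans_pipeline are hypothetical helpers, the clustering is plain Lloyd's algorithm with random initial centers (not k-means++), the nested label layout is assumed, and nothing is sent to a dataset or centroid collection.

```python
import random
from typing import Dict, List, Optional, Tuple


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, max_iter: int = 300,
           tol: float = 1e-4, random_state: Optional[int] = None
           ) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm; initial centers are k randomly sampled vectors."""
    rng = random.Random(random_state)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(v, centers[c]))
                  for v in vectors]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        shift = max(squared_distance(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers


def kmeans_pipeline(docs: List[Dict], vector_field: str, k: int, alias: str = "kmeans",
                    cluster_field: str = "_cluster_",
                    random_state: Optional[int] = None) -> Tuple[List[Dict], List[Dict]]:
    # 1. Load the data (here: read vectors from in-memory documents).
    vectors = [doc[vector_field] for doc in docs]
    # 2. Cluster the data.
    labels, centers = kmeans(vectors, k, random_state=random_state)
    # 3. Update the documents with clustering info (nested layout is an assumption).
    for doc, label in zip(docs, labels):
        doc.setdefault(cluster_field, {}).setdefault(vector_field, {})[alias] = f"cluster-{label}"
    # 4. Build the centroid documents that would be stored alongside the dataset.
    centroid_docs = [{"_id": f"cluster-{i}", vector_field: center}
                     for i, center in enumerate(centers)]
    return docs, centroid_docs


sample_docs = [
    {"_id": "a1", "sample_1_vector_": [0.0, 0.0]},
    {"_id": "a2", "sample_1_vector_": [0.0, 1.0]},
    {"_id": "b1", "sample_1_vector_": [10.0, 10.0]},
    {"_id": "b2", "sample_1_vector_": [10.0, 11.0]},
]
updated_docs, centroid_docs = kmeans_pipeline(sample_docs, "sample_1_vector_", k=2, random_state=0)
```

Keeping the clustering step separate from the document bookkeeping mirrors the split between the static cluster method and the dataset-level helpers listed above.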
- class relevanceai.vector_tools.cluster.ClusterBase
Bases:
relevanceai.logger.LoguruLogger, doc_utils.doc_utils.DocUtils
- critical: Callable
- debug: Callable
- error: Callable
- fit_documents(vector_fields, docs, alias='default', cluster_field='_cluster_', return_only_clusters=True, inplace=True)
Train clustering algorithm on documents and then store the labels inside the documents.
- Parameters
vector_fields (list) – The vector fields of the documents
docs (list) – List of documents to run clustering on
alias (str) – What the clusters can be called
cluster_field (str) – What the cluster fields should be called
return_only_clusters (bool) – If True, return only clusters, otherwise returns the original document
inplace (bool) – If True, the documents are edited in place; otherwise, a copy is made first
kwargs (dict) – Any other keyword argument will go directly into the clustering algorithm
- abstract fit_transform(vectors)
- info: Callable
- property metadata
- success: Callable
- to_metadata()
You can also store the metadata of this clustering algorithm
- warn: Callable
- warning: Callable
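How fit_documents combines its inplace and return_only_clusters flags can be sketched with a standalone function. This is a hedged illustration rather than the library's code: fit_documents_sketch is a hypothetical name, it takes a single vector field and a plain callable in place of a clusterer, and the nested layout doc[cluster_field][vector_field][alias] is an assumption.

```python
import copy
from typing import Callable, Dict, List


def fit_documents_sketch(
    fit_transform: Callable[[List[List[float]]], List[int]],
    vector_field: str,
    docs: List[Dict],
    alias: str = "default",
    cluster_field: str = "_cluster_",
    return_only_clusters: bool = True,
    inplace: bool = True,
) -> List[Dict]:
    # Work on a deep copy unless the caller asked for in-place edits.
    target = docs if inplace else copy.deepcopy(docs)
    labels = fit_transform([doc[vector_field] for doc in target])
    for doc, label in zip(target, labels):
        # Assumed layout: doc[cluster_field][vector_field][alias] = label
        doc.setdefault(cluster_field, {}).setdefault(vector_field, {})[alias] = label
    if return_only_clusters:
        # Return only the id and the cluster field of each document.
        return [{"_id": doc.get("_id"), cluster_field: doc[cluster_field]} for doc in target]
    return target


def threshold_clusterer(vectors: List[List[float]]) -> List[int]:
    """Toy clusterer: label 1 if the first component exceeds 0.5, otherwise 0."""
    return [int(v[0] > 0.5) for v in vectors]


docs = [{"_id": "a", "v_vector_": [0.1]}, {"_id": "b", "v_vector_": [0.9]}]
only_clusters = fit_documents_sketch(threshold_clusterer, "v_vector_", docs, inplace=False)
untouched = "_cluster_" not in docs[0]
full_docs = fit_documents_sketch(threshold_clusterer, "v_vector_", docs,
                                 return_only_clusters=False)
```

With inplace=False the caller's documents stay untouched; with the defaults, the labels are written into the original list.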
- class relevanceai.vector_tools.cluster.DensityCluster
Bases:
relevanceai.vector_tools.cluster.ClusterBase
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors)
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.cluster.HDBSCANClusterer(algorithm='best', alpha=1.0, approx_min_span_tree=True, gen_min_span_tree=False, leaf_size=40, memory=Memory(location=None), metric='euclidean', min_samples=None, p=None, min_cluster_size=10)
Bases:
relevanceai.vector_tools.cluster.DensityCluster
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.cluster.KMeans(k=10, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto')
Bases:
relevanceai.vector_tools.cluster.MiniBatchKMeans
- critical: Callable
- debug: Callable
- error: Callable
- info: Callable
- success: Callable
- to_metadata()
Editing the metadata of the function
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.cluster.MiniBatchKMeans(k=10, init='k-means++', verbose=False, compute_labels=True, max_no_improvement=2)
Bases:
relevanceai.vector_tools.cluster.CentroidCluster
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors)
Fit the clustering model and transform the vectors
- get_centers()
Returns centroids of clusters
- info: Callable
- success: Callable
- to_metadata()
Editing the metadata of the function
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.dim_reduction.DimReduction(project, api_key)
Bases:
relevanceai.base._Base, relevanceai.vector_tools.dim_reduction.DimReductionBase
- api_key: str
- config: relevanceai.config.Config
- critical: Callable
- debug: Callable
- static dim_reduce(vectors, dr, dr_args, dims)
Dimensionality reduction
- Return type
ndarray
- error: Callable
- info: Callable
- project: str
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.dim_reduction.DimReductionBase
Bases:
relevanceai.logger.LoguruLogger
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors, dr_args, dims)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
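The relationship between dim_reduce(vectors, dr, dr_args, dims) and the fit_transform(vectors, dr_args, dims) method of DimReductionBase can be sketched as a dispatch over reducer objects. Everything below is hypothetical and standalone: DimReductionSketch, TopVarianceReducer, REDUCERS and dim_reduce_sketch are illustrative names, the idea that dr may be either a name or a reducer instance is an assumption, and the toy reducer just keeps the highest-variance coordinates (it is not PCA, t-SNE, UMAP or Ivis).

```python
from statistics import pvariance
from typing import Dict, List, Optional, Type, Union


class DimReductionSketch:
    """Hypothetical stand-in for the DimReductionBase interface."""

    def fit_transform(self, vectors: List[List[float]], dr_args: Optional[dict],
                      dims: int) -> List[List[float]]:
        raise NotImplementedError


class TopVarianceReducer(DimReductionSketch):
    """Toy reducer: keeps the `dims` coordinates with the largest variance."""

    def fit_transform(self, vectors, dr_args=None, dims=3):
        columns = list(zip(*vectors))
        # Rank coordinate indices by population variance, largest first.
        ranked = sorted(range(len(columns)), key=lambda i: pvariance(columns[i]),
                        reverse=True)
        keep = sorted(ranked[:dims])  # preserve the original coordinate order
        return [[v[i] for i in keep] for v in vectors]


# Assumed registry mapping reducer names to classes.
REDUCERS: Dict[str, Type[DimReductionSketch]] = {"top_variance": TopVarianceReducer}


def dim_reduce_sketch(vectors: List[List[float]], dr: Union[str, DimReductionSketch],
                      dr_args: Optional[dict], dims: int) -> List[List[float]]:
    if isinstance(dr, str):
        if dr not in REDUCERS:
            raise ValueError(f"Unknown reducer: {dr!r}")
        reducer = REDUCERS[dr]()
    else:
        reducer = dr  # assume the caller passed a ready-made reducer
    return reducer.fit_transform(vectors, dr_args, dims)


vectors = [[1, 0, 5], [1, 10, 6], [1, 20, 7]]
reduced = dim_reduce_sketch(vectors, "top_variance", None, dims=2)
```

A new reducer plugs in by subclassing the base and implementing fit_transform with the same three arguments.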
- class relevanceai.vector_tools.dim_reduction.Ivis
Bases:
relevanceai.vector_tools.dim_reduction.DimReductionBase
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.dim_reduction.PCA
Bases:
relevanceai.vector_tools.dim_reduction.DimReductionBase
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors, dr_args={'random_state': 42, 'svd_solver': 'auto'}, dims=3)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.dim_reduction.TSNE
Bases:
relevanceai.vector_tools.dim_reduction.DimReductionBase
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors, dr_args={'init': 'pca', 'learning_rate': 100, 'n_iter': 500, 'perplexity': 30, 'random_state': 42}, dims=3)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable
- class relevanceai.vector_tools.dim_reduction.UMAP
Bases:
relevanceai.vector_tools.dim_reduction.DimReductionBase
- critical: Callable
- debug: Callable
- error: Callable
- fit_transform(vectors, dr_args={'min_dist': 0.1, 'n_neighbors': 10, 'random_state': 42, 'transform_seed': 42}, dims=3)
- Return type
ndarray
- info: Callable
- success: Callable
- warn: Callable
- warning: Callable