relevanceai.vector_tools.cluster_evaluate
Module Contents
Classes
ClusterEvaluate(project, api_key) – Batch API client
Functions
sort_dict(dict, reverse: bool = True, cut_off=0)
Attributes
- relevanceai.vector_tools.cluster_evaluate.SILHOUETTE_INFO = Multiline-String
Good clusters are well separated from one another, and the elements within each are highly cohesive. <br/>
<b>Silhouette Score</b> is a metric from <b>-1 to 1</b> that calculates the average cohesion and separation of each element, with <b>1</b> being clustered perfectly, <b>0</b> being indifferent and <b>-1</b> being clustered the wrong way
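The cohesion and separation described above can be made concrete with a small pure-Python sketch. It assumes Euclidean distance and scores singleton clusters as 0; the function names `euclidean` and `silhouette` are illustrative and are not part of `relevanceai`, whose own implementation may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(vectors, cluster_labels):
    """Mean silhouette coefficient over all elements (illustrative sketch)."""
    clusters = {}
    for vec, label in zip(vectors, cluster_labels):
        clusters.setdefault(label, []).append(vec)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for vec, label in zip(vectors, cluster_labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster.
        # The element's distance to itself is 0, so divide by len(own) - 1.
        a = sum(euclidean(vec, other) for other in own) / (len(own) - 1)
        # Separation: smallest mean distance to the members of another cluster.
        b = min(
            sum(euclidean(vec, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = silhouette(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])   # each cluster straddles both groups
```

Here `good` comes out close to 1 and `bad` is negative, matching the "clustered perfectly" and "clustered the wrong way" ends of the scale.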
- relevanceai.vector_tools.cluster_evaluate.RAND_INFO = Multiline-String
Good clusters have elements which, when paired, belong to the same cluster label and the same ground truth label. <br/>
<b>Rand Index</b> is a metric from <b>0 to 1</b> that represents the percentage of element pairs that have matching cluster and ground truth labels, with <b>1</b> matching perfectly and <b>0</b> matching randomly. <br/> <i>Note: This measure is adjusted for randomness so does not equal the exact numerical percentage.</i>
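The pair counting and the adjustment for randomness can be sketched with the standard adjusted Rand index formula. This is a standalone illustration built on the standard library; `adjusted_rand_index` is a hypothetical helper name, not the library's own code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(ground_truth, cluster_labels):
    """Adjusted Rand index computed from pair counts (illustrative sketch)."""
    total_pairs = comb(len(ground_truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two elements: nothing to disagree on

    # Pairs that share both a ground truth label and a cluster label.
    same_both = sum(
        comb(c, 2) for c in Counter(zip(ground_truth, cluster_labels)).values()
    )
    # Pairs that share a ground truth label / a cluster label, respectively.
    same_truth = sum(comb(c, 2) for c in Counter(ground_truth).values())
    same_cluster = sum(comb(c, 2) for c in Counter(cluster_labels).values())

    # Agreement expected from a random labelling with the same group sizes.
    expected = same_truth * same_cluster / total_pairs
    max_index = (same_truth + same_cluster) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labellings are one single group
    return (same_both - expected) / (max_index - expected)


perfect = adjusted_rand_index(["a", "a", "b", "b"], [1, 1, 0, 0])
crossed = adjusted_rand_index(["a", "a", "b", "b"], [0, 1, 0, 1])
```

Cluster label names do not matter, only which elements are grouped together, so `perfect` is 1. Because of the chance adjustment, a labelling that agrees less than a random one would, such as `crossed`, falls below 0.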
- relevanceai.vector_tools.cluster_evaluate.HOMOGENEITY_INFO = Multiline-String
Good clusters only have elements from the same ground truth within the same cluster <br/>
<b>Homogeneity</b> is a metric from <b>0 to 1</b> that represents whether clusters contain only elements in the same ground truth with <b>1</b> being perfect and <b>0</b> being absolutely incorrect.
- relevanceai.vector_tools.cluster_evaluate.COMPLETENESS_INFO = Multiline-String
Good clusters have all elements from the same ground truth within the same cluster <br/>
<b>Completeness</b> is a metric from <b>0 to 1</b> that represents whether clusters contain all elements in the same ground truth with <b>1</b> being perfect and <b>0</b> being absolutely incorrect.
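Homogeneity and completeness are mirror images of each other, which a short sketch using the usual conditional-entropy definitions makes visible. The helper names below are illustrative and not taken from `relevanceai`; the library's own scores may be computed differently.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given): uncertainty left in `labels` once `given` is known."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity(ground_truth, cluster_labels):
    """1 when every cluster contains a single ground truth label."""
    h = entropy(ground_truth)
    if h == 0:
        return 1.0
    return 1.0 - conditional_entropy(ground_truth, cluster_labels) / h


def completeness(ground_truth, cluster_labels):
    """1 when every ground truth label sits inside a single cluster."""
    # Same formula with the roles of the two labellings swapped.
    return homogeneity(cluster_labels, ground_truth)


truth = ["a", "a", "b", "b"]
over_split = [0, 1, 2, 3]   # one cluster per element
one_blob = [0, 0, 0, 0]     # everything in a single cluster

h_split, c_split = homogeneity(truth, over_split), completeness(truth, over_split)
h_blob, c_blob = homogeneity(truth, one_blob), completeness(truth, one_blob)
```

Over-splitting keeps every cluster pure (homogeneity 1) but scatters each ground truth group (completeness 0.5 here). A single blob does the opposite: completeness 1, homogeneity 0. This is why the two scores are best read together.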
- relevanceai.vector_tools.cluster_evaluate.METRIC_DESCRIPTION
- relevanceai.vector_tools.cluster_evaluate.sort_dict(dict, reverse: bool = True, cut_off=0)
- class relevanceai.vector_tools.cluster_evaluate.ClusterEvaluate(project, api_key)
Bases: relevanceai.api.client.BatchAPIClient, relevanceai.base._Base, doc_utils.DocUtils
Batch API client
- plot(self, dataset_id: str, vector_field: str, cluster_alias: str, ground_truth_field: str = None, description_fields: list = [], marker_size: int = 5)
Plot the vectors in a collection to compare cluster labels, optionally against ground truth labels
- Parameters
dataset_id (string) – Unique name of dataset
vector_field (string) – The vector field that was clustered upon
cluster_alias (string) – The alias of the clustered labels
ground_truth_field (string) – The field to use as ground truth
description_fields (list) – List of fields to use as additional labels on plot
marker_size (int) – Size of scatterplot marker
- metrics(self, dataset_id: str, vector_field: str, cluster_alias: str, ground_truth_field: str = None)
Determine the performance of clusters through the Silhouette Score, and optionally against ground truth labels through Rand Index, Homogeneity and Completeness
- Parameters
dataset_id (string) – Unique name of dataset
vector_field (string) – The vector field that was clustered upon
cluster_alias (string) – The alias of the clustered labels
ground_truth_field (string) – The field to use as ground truth
- distribution(self, dataset_id: str, vector_field: str, cluster_alias: str, ground_truth_field: str = None, transpose=False)
Determine the distribution of clusters, optionally against the ground truth
- Parameters
dataset_id (string) – Unique name of dataset
vector_field (string) – The vector field that was clustered upon
cluster_alias (string) – The alias of the clustered labels
ground_truth_field (string) – The field to use as ground truth
transpose (bool) – Whether to transpose cluster and ground truth perspectives
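A minimal sketch of how `metrics` and `distribution` might be called together, using only the keyword arguments documented above. The return values are not documented here, so the sketch just passes them through. `StubEvaluator`, `evaluate_clustering` and the dataset, field and alias names are hypothetical stand-ins so the example runs offline; in practice the evaluator would be a `ClusterEvaluate(project, api_key)` instance.

```python
def evaluate_clustering(evaluator, dataset_id, vector_field, cluster_alias,
                        ground_truth_field=None):
    """Call metrics() and distribution() with the documented keyword arguments."""
    common = dict(
        dataset_id=dataset_id,
        vector_field=vector_field,
        cluster_alias=cluster_alias,
        ground_truth_field=ground_truth_field,
    )
    return {
        "metrics": evaluator.metrics(**common),
        "distribution": evaluator.distribution(**common, transpose=False),
    }


class StubEvaluator:
    """Hypothetical stand-in mirroring the documented method signatures."""

    def metrics(self, dataset_id, vector_field, cluster_alias,
                ground_truth_field=None):
        # Placeholder result; the real return shape is not documented here.
        return {"called": "metrics", "dataset_id": dataset_id}

    def distribution(self, dataset_id, vector_field, cluster_alias,
                     ground_truth_field=None, transpose=False):
        # Placeholder result; the real return shape is not documented here.
        return {"called": "distribution", "transpose": transpose}


report = evaluate_clustering(
    StubEvaluator(),
    dataset_id="my_dataset",          # hypothetical dataset name
    vector_field="my_vector_",        # hypothetical vector field
    cluster_alias="kmeans_10",        # hypothetical cluster alias
    ground_truth_field="category",    # hypothetical ground truth field
)
```

Leaving `ground_truth_field` as `None` corresponds to the case described above where only the Silhouette Score and the plain cluster distribution apply.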
- _get_cluster_documents(self, dataset_id: str, vector_field: str, cluster_alias: str, ground_truth_field: str = None, description_fields: list = [], get_vectors=True)
Return vectors, cluster labels, ground truth labels and other fields
- static plot_from_docs(vectors: list, cluster_labels: list, ground_truth: list = None, vector_description: dict = None, marker_size: int = 5)
Plot the vectors in a collection to compare cluster labels, optionally against ground truth labels
- Parameters
vectors (list) – List of vectors which were clustered upon
cluster_labels (list) – List of cluster labels corresponding to the vectors
ground_truth (list) – List of ground truth labels for the vectors
vector_description (dict) – Dictionary of fields and their values to describe the vectors
marker_size (int) – Size of scatterplot marker
- static metrics_from_docs(vectors, cluster_labels, ground_truth=None)
Determine the performance of clusters through the Silhouette Score, and optionally against ground truth labels through Rand Index, Homogeneity and Completeness
- Parameters
vectors (list) – List of vectors which were clustered upon
cluster_labels (list) – List of cluster labels corresponding to the vectors
ground_truth (list) – List of ground truth labels for the vectors
- static label_distribution_from_docs(label)
Determine the distribution of a label
- Parameters
label (list) – List of labels
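As an illustration of what a label distribution is, the sketch below computes the share of elements carrying each label. Whether the library returns counts or proportions is not stated in this reference, so the proportion output and the name `label_distribution` are assumptions.

```python
from collections import Counter


def label_distribution(labels):
    """Fraction of elements carrying each label (illustrative sketch)."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


shares = label_distribution(["a", "a", "b", "c"])
```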
- static label_joint_distribution_from_docs(label_1, label_2)
Determine the distribution of a label against another label
- Parameters
label_1 (list) – List of labels
label_2 (list) – List of labels
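A joint distribution answers the question "within each value of the first label, how are the values of the second label spread?". The sketch below is one plausible reading of that idea; the nested-dictionary output and the name `label_joint_distribution` are assumptions, not the library's documented behaviour.

```python
from collections import Counter, defaultdict


def label_joint_distribution(label_1, label_2):
    """For each value of label_1, the share of each co-occurring label_2 value."""
    if len(label_1) != len(label_2):
        raise ValueError("label lists must have the same length")
    joint = defaultdict(Counter)
    for first, second in zip(label_1, label_2):
        joint[first][second] += 1
    return {
        first: {
            second: count / sum(counts.values())
            for second, count in counts.items()
        }
        for first, counts in joint.items()
    }


clusters = [0, 0, 0, 1]
truth = ["a", "a", "b", "b"]
per_cluster = label_joint_distribution(clusters, truth)
per_truth = label_joint_distribution(truth, clusters)
```

Swapping the two arguments changes the perspective from "what each cluster contains" to "where each ground truth label ended up", which is the kind of switch the `transpose` parameter of `distribution` describes.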
- static silhouette_score(vectors, cluster_labels)
- static adjusted_rand_score(ground_truth, cluster_labels)
- static completeness_score(ground_truth, cluster_labels)
- static homogeneity_score(ground_truth, cluster_labels)
- static _generate_layout()
- static _generate_plot(df, hover_label, marker_size)