A clustering is perfectly homogeneous when each of its clusters contains only data points that belong to a single class. Homogeneity (computed by homogeneity_score) measures how close a clustering comes to this ideal.
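For reference, scikit-learn's user guide defines homogeneity through entropies. If C is the set of true classes and K the set of predicted clusters, then h = 1 - H(C|K) / H(C), where H(C|K) is the class entropy that remains once the cluster is known, and h is taken to be 1.0 when H(C) = 0. The plain-Python sketch below re-implements that formula only to make it concrete. The helpers entropy, conditional_entropy and homogeneity are hypothetical names, not part of scikit-learn, and the library's own implementation differs internally.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) of a label sequence (natural log)."""
    n = len(labels)
    return -sum((count / n) * math.log(count / n) for count in Counter(labels).values())


def conditional_entropy(labels_true, labels_pred):
    """H(C|K) = -sum over (c, k) of n_ck/n * log(n_ck / n_k)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))  # n_ck for each (class, cluster) pair
    cluster_sizes = Counter(labels_pred)            # n_k for each cluster
    return -sum(
        (n_ck / n) * math.log(n_ck / cluster_sizes[k])
        for (_, k), n_ck in joint.items()
    )


def homogeneity(labels_true, labels_pred):
    """h = 1 - H(C|K) / H(C), defined as 1.0 when there is only one class."""
    h_c = entropy(labels_true)
    if h_c == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(labels_true, labels_pred) / h_c


print(homogeneity([0, 1, 0, 1], [1, 0, 1, 0]))  # 1.0
print(homogeneity([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0, up to floating-point rounding
```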
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score in any way.
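A minimal sketch of this label-invariance, assuming scikit-learn is installed: the two calls below describe exactly the same partitions, only with different label values, so they should give the same score. The example label lists are made up for illustration.

```python
from sklearn.metrics import homogeneity_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 1, 1, 2]

# Same groupings as above, but every class and cluster value renamed
renamed_true = [5, 5, 9, 9, 1, 1]
renamed_pred = [7, 7, 3, 3, 3, 5]

original = homogeneity_score(labels_true, labels_pred)
renamed = homogeneity_score(renamed_true, renamed_pred)
print(original, renamed)  # equal, up to floating-point rounding
```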
Syntax : sklearn.metrics.homogeneity_score(labels_true, labels_pred)
The metric is not symmetric: switching labels_true with labels_pred returns the completeness_score, which is in general a different value.
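The asymmetry can be checked directly. This sketch, assuming scikit-learn is installed, uses a labeling that splits each class into two clusters: it is homogeneous, but swapping the arguments yields the same value as completeness_score on the original order.

```python
from sklearn.metrics import completeness_score, homogeneity_score

labels_true = [0, 0, 1, 1]
labels_pred = [0, 1, 2, 3]  # each class split across two clusters

h = homogeneity_score(labels_true, labels_pred)          # close to 1.0
h_swapped = homogeneity_score(labels_pred, labels_true)  # 0.5, up to rounding
c = completeness_score(labels_true, labels_pred)         # 0.5, up to rounding
print(h, h_swapped, c)
```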
Parameters :
- labels_true: <int array, shape = [n_samples]>: It accepts the ground truth class labels to be used as a reference.
- labels_pred: <array-like of shape (n_samples,)>: It accepts the cluster labels to evaluate.
Returns:
homogeneity: <float>: It returns a score between 0.0 and 1.0, where 1.0 stands for perfectly homogeneous labeling.
Example 1:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score
# Change to the folder that contains the dataset
# cd C:\Users\Dev\Desktop\Credit Card Fraud
# Loading the data
df = pd.read_csv('creditcard.csv')
# Separating the ground-truth labels from the features
y = df['Class']
X = df.drop('Class', axis=1)
# Building the clustering model
kmeans = KMeans(n_clusters=2)
# Training the clustering model
kmeans.fit(X)
# Storing the predicted cluster labels
labels = kmeans.predict(X)
# Evaluating the performance
print(homogeneity_score(y, labels))
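Output (the exact value can vary from run to run, because KMeans starts from randomly chosen initial centroids):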
0.00496764949717645
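If creditcard.csv is not at hand, the same workflow can be tried on synthetic data. This is a sketch under assumptions: make_blobs stands in for the real dataset, and the sample count, number of centers and random_state values are arbitrary choices made so that repeated runs give the same clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score

# Synthetic stand-in for the credit card data: 3 Gaussian blobs with known labels
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Fixed random_state and explicit n_init make the clustering reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

score = homogeneity_score(y, labels)
print(score)
```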
Example 2: Perfectly homogeneous labeling:
from sklearn.metrics.cluster import homogeneity_score
# Evaluate the score
hscore = homogeneity_score([0, 1, 0, 1], [1, 0, 1, 0])
print(hscore)
Output:
1.0
Example 3: Non-perfect labelings that further split classes into more clusters can be perfectly homogeneous:
from sklearn.metrics.cluster import homogeneity_score
# Evaluate the score
hscore = homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3])
print(hscore)
Output:
0.9999999999999999
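Example 3 shows why homogeneity should not be read on its own: putting every sample in its own cluster is trivially homogeneous. scikit-learn's homogeneity_completeness_v_measure returns homogeneity together with completeness and their harmonic mean (the V-measure), which exposes the over-splitting. A short sketch, assuming scikit-learn is installed:

```python
from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 1, 1]
labels_pred = [0, 1, 2, 3]  # each sample in its own cluster

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
# h is close to 1.0, c is 0.5, and v = 2*h*c/(h + c) is about 0.667
print(h, c, v)
```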
Example 4: Clusters that include samples from different classes do not make for a homogeneous labeling:
from sklearn.metrics.cluster import homogeneity_score
# Evaluate the score
hscore = homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1])
print(hscore)
Output:
0.0
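The zero in Example 4 follows from the entropy form of the metric, h = 1 - H(C|K) / H(C). With labels_true = [0, 0, 1, 1] and labels_pred = [0, 1, 0, 1], each of the two clusters holds one sample of each class, so knowing the cluster tells nothing about the class. Here n = 4, n_c is the size of class c, n_k the size of cluster k, and n_{c,k} the number of samples of class c in cluster k:

```latex
\begin{aligned}
H(C) &= -\sum_{c} \frac{n_c}{n}\log\frac{n_c}{n}
      = -2\cdot\frac{2}{4}\log\frac{2}{4} = \log 2 \\
H(C \mid K) &= -\sum_{c,k} \frac{n_{c,k}}{n}\log\frac{n_{c,k}}{n_k}
      = -4\cdot\frac{1}{4}\log\frac{1}{2} = \log 2 \\
h &= 1 - \frac{H(C \mid K)}{H(C)} = 1 - \frac{\log 2}{\log 2} = 0
\end{aligned}
```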