How to Compare and Evaluate Unsupervised Clustering Methods?

Using Python, Scikit-Learn, and Google Colab

Carla Martins
20 min read · Feb 23


Evaluating the performance of clustering methods is difficult because there are usually no ground-truth labels to compare the clustering results with. Clustering is an unsupervised learning task: it operates on unlabeled data and tries to find structure in the data without relying on pre-existing labels.

The solution?

Without labels, evaluation metrics for clustering must rely on intrinsic properties of the data and the clustering results, such as the compactness and separation of the clusters and the stability of the results across different runs of the same algorithm. When external knowledge such as known labels is available, consistency with that knowledge can be measured as well.
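To make compactness, separation, and stability concrete, here is a minimal sketch rather than the code used later in this article. It assumes synthetic blobs from `make_blobs` with hand-picked centers as a stand-in for real unlabeled data, and it uses K-Means with two different seeds to illustrate stability across runs. The variable names are my own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Assumption: three well-separated synthetic blobs stand in for unlabeled data.
# The generated labels are discarded on purpose.
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=1.0, random_state=0)

# Two runs of the same algorithm that differ only in the random seed
labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Compactness and separation, computed from the data and the labels only
sil = silhouette_score(X, labels_a)        # in [-1, 1], higher is better
ch = calinski_harabasz_score(X, labels_a)  # unbounded, higher is better
db = davies_bouldin_score(X, labels_a)     # >= 0, lower is better

# Stability: agreement between the two runs (1.0 means identical partitions)
stability = adjusted_rand_score(labels_a, labels_b)

print(f"silhouette={sil:.3f}  calinski_harabasz={ch:.1f}  davies_bouldin={db:.3f}")
print(f"stability (ARI between runs)={stability:.3f}")
```

Using the Adjusted Rand Index to compare two runs is one possible way to quantify stability; any label-agreement score that ignores how clusters are named would serve the same purpose.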

In this article, I will provide a comprehensive overview of various evaluation methods available in the Scikit-Learn library for comparing different clustering techniques. The code for this article will be executed in Google Colab, and each step will be thoroughly documented. This is also a long read and will take you some time and effort, so be prepared!

Table of Contents:
1. Create data
2. Build Clustering Methods to Compare:
- K-Means
- Affinity Propagation
- Agglomerative Clustering
- Mean Shift Clustering
- Bisecting K-Means
3. Evaluation Methods if we know Ground Truth labels:
- Rand Index (RI)
- Adjusted RI
- Mutual Information Score (MIS)
- Normalized MIS
- Adjusted MIS
- Homogeneity and Completeness
- V-Measure
- Fowlkes-Mallows Score
4. Evaluation Methods if we don't know Ground Truth labels:
- Silhouette Score
- Calinski-Harabasz Index
- Davies-Bouldin Index
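As a quick preview of the label-based metrics in section 3, the sketch below calls the corresponding functions from `sklearn.metrics` on a tiny hand-made example. The two label lists are hypothetical toy data, chosen so that the predicted partition matches the true one but with the cluster names swapped.

```python
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    fowlkes_mallows_score,
    homogeneity_completeness_v_measure,
    mutual_info_score,
    normalized_mutual_info_score,
    rand_score,
)

# Hypothetical toy labels: same grouping, different cluster names
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 1, 0, 0, 0]

homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    true_labels, pred_labels
)

scores = {
    "rand": rand_score(true_labels, pred_labels),
    "adjusted_rand": adjusted_rand_score(true_labels, pred_labels),
    "mutual_info": mutual_info_score(true_labels, pred_labels),
    "normalized_mutual_info": normalized_mutual_info_score(true_labels, pred_labels),
    "adjusted_mutual_info": adjusted_mutual_info_score(true_labels, pred_labels),
    "homogeneity": homogeneity,
    "completeness": completeness,
    "v_measure": v_measure,
    "fowlkes_mallows": fowlkes_mallows_score(true_labels, pred_labels),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All of these scores compare partitions, not label names, so swapping the names does not lower them. The raw mutual information is the one score in this list that is not normalized to a maximum of 1.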

The initial step involves generating artificial data that we will use as input for our clustering techniques.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Set the number of samples and features
n_samples = 1000
n_features = 4

# Create an empty array to store the data
data = np.empty((n_samples, n_features))

# Generate random data for each feature (standard normal, one column at a time)
for i in range(n_features):
    data[:, i] = np.random.normal(size=n_samples)

#Create 5 clusters…


