Ordering Points To Identify the Clustering Structure (OPTICS) using Sklearn

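OPTICS (Ordering Points To Identify the Clustering Structure) is a density-based clustering algorithm used to find clusters of different shapes and densities in a dataset. It is closely related to DBSCAN, but instead of committing to a single neighborhood radius it orders the points by reachability distance, which gives better results when the data contains clusters of varying density.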

Why use OPTICS instead of DBSCAN?

DBSCAN needs a single fixed eps, which may not work well when some clusters are tight and others are loose.
OPTICS does not force you to commit to one global distance. It produces a reachability plot from which clusters can be extracted at different density levels.
OPTICS handles datasets with varying densities better and can identify both dense and sparse clusters in a single run.
The reachability plot exposes more of the cluster structure, which makes it easier to explore the data visually and choose cut-off points for clusters.
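To make the fixed-eps trade-off concrete, here is a minimal sketch that runs plain DBSCAN twice on a hypothetical toy dataset (one tight blob, one loose blob). The data, the variable names and the two eps values are assumptions chosen for illustration; they are not part of the walkthrough below.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: one tight blob and one loose blob
rng = np.random.default_rng(0)
tight = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(150, 2))
loose = rng.normal(loc=(6.0, 6.0), scale=1.5, size=(150, 2))
X_toy = np.vstack([tight, loose])

# DBSCAN has to be refit for every eps we want to try
noise_counts = {}
for eps in (0.3, 1.5):
    labels = DBSCAN(eps=eps, min_samples=10).fit_predict(X_toy)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    noise_counts[eps] = int(np.sum(labels == -1))
    print(f"eps={eps}: {n_clusters} clusters, {noise_counts[eps]} noise points")
```

Comparing the printed counts shows how strongly the result depends on the chosen eps: a radius that suits the tight blob tends to label much of the loose blob as noise.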

Step 1: Importing Libraries

We import the necessary libraries: Matplotlib, NumPy and scikit-learn.

Python

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

Step 2: Creating Sample Data

We generate six groups of points (clusters), each centered at a different location and with a different spread, so their densities differ. All groups are stacked into one dataset, X_modified.

Python

np.random.seed(42)
n_points_per_cluster = 200

C1 = np.array([-3, -1]) + 1.0 * np.random.randn(n_points_per_cluster, 2)
C2 = np.array([2, -2]) + 0.5 * np.random.randn(n_points_per_cluster, 2)
C3 = np.array([0, 2]) + 0.8 * np.random.randn(n_points_per_cluster, 2)
C4 = np.array([-1, 4]) + 0.2 * np.random.randn(n_points_per_cluster, 2)
C5 = np.array([1, -3]) + 1.2 * np.random.randn(n_points_per_cluster, 2)
C6 = np.array([4, 5]) + 1.5 * np.random.randn(n_points_per_cluster, 2)

X_modified = np.vstack((C1, C2, C3, C4, C5, C6))

Step 3: Apply OPTICS Clustering

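Now we fit OPTICS to the data with the following parameters:

min_samples=40: number of samples in a neighborhood required for a point to count as a core point of a dense region.
xi=0.1: minimum steepness on the reachability plot that marks a cluster boundary.
min_cluster_size=0.1: minimum cluster size as a fraction of the dataset (here 0.1 × 1200 = 120 points).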

Python

clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1)
clust.fit(X_modified)
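After fitting, the cluster assignments are available in labels_, where -1 marks noise. The sketch below shows one way to summarize them. It is self-contained, so it fits OPTICS on a small hypothetical dataset (X_small) with smaller parameter values than the ones used above; both are assumptions made to keep the example quick.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Hypothetical small dataset: three blobs with different spreads
rng = np.random.default_rng(42)
X_small = np.vstack([
    rng.normal(loc=(-3.0, -1.0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(2.0, -2.0), scale=0.6, size=(100, 2)),
    rng.normal(loc=(0.0, 4.0), scale=1.0, size=(100, 2)),
])

model = OPTICS(min_samples=20, xi=0.1, min_cluster_size=0.1).fit(X_small)

# labels_ holds one label per sample; -1 means "noise"
cluster_ids = np.unique(model.labels_[model.labels_ != -1])
n_noise = int(np.sum(model.labels_ == -1))
print("clusters found:", len(cluster_ids))
print("noise points:", n_noise)
for cid in cluster_ids:
    print(f"cluster {cid}: {int(np.sum(model.labels_ == cid))} points")
```

The same summary can be run on clust.labels_ from the step above.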

Step 4: Extract Clusters Using DBSCAN Logic

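cluster_optics_dbscan reuses the reachability and core distances already computed by OPTICS to produce DBSCAN-style labels for a given eps (distance threshold), without refitting the model.

eps=0.7 finds smaller, tighter groups.
eps=1.5 finds larger, broader groups.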

Python

# DBSCAN-style labels at eps=0.7 (stored in labels_050)
labels_050 = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=0.7,
)

# DBSCAN-style labels at eps=1.5 (stored in labels_200)
labels_200 = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=1.5,
)
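To see what such an eps cut does, here is a minimal loop-based sketch of the idea on a tiny hand-made example. The function name dbscan_cut and the toy arrays are hypothetical. The rule shown is my understanding of the extraction, not scikit-learn's actual code: walking the points in OPTICS order, a point whose reachability exceeds eps either starts a new cluster (if its own core distance is within eps) or is noise, and every other point joins the current cluster.

```python
import numpy as np

def dbscan_cut(reachability, core_distances, ordering, eps):
    """Illustrative DBSCAN-style cut of an OPTICS ordering at threshold eps."""
    labels = np.full(len(ordering), -1, dtype=int)
    current = -1  # id of the cluster currently being filled
    for idx in ordering:
        if reachability[idx] > eps:
            # Not reachable from the previous points at this eps
            if core_distances[idx] <= eps:
                current += 1          # dense enough: start a new cluster
                labels[idx] = current
            # otherwise the point stays labelled -1 (noise)
        else:
            labels[idx] = current     # reachable: join the current cluster
    return labels

# Hypothetical toy values, indexed by sample
ordering = np.array([0, 1, 2, 3, 4, 5])
reachability = np.array([np.inf, 0.3, 0.4, 2.0, 0.5, 3.0])
core_distances = np.array([0.3, 0.3, 0.4, 0.5, 0.5, 3.5])

toy_labels = dbscan_cut(reachability, core_distances, ordering, eps=1.0)
print(toy_labels)
```

In this toy example the jump in reachability at sample 3 starts a second cluster, while sample 5 is both hard to reach and not dense, so it ends up as noise.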

Step 5: Prepare Values for Plotting

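We reorder the reachability distances and labels by clust.ordering_, the order in which OPTICS processed the points. In that order, points from the same dense region sit next to each other on the x-axis, which is what makes the reachability plot readable.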

Python

space = np.arange(len(X_modified))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]
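The indexing above works because reachability_ and labels_ are stored per sample, while ordering_ lists sample indices in processing order. A minimal sketch on a hypothetical two-blob dataset (X_demo and the parameter values are assumptions) checks two properties that are useful to keep in mind when plotting:

```python
import numpy as np
from sklearn.cluster import OPTICS

# Hypothetical small dataset: a tight blob and a loose blob
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(60, 2)),
    rng.normal(loc=(5.0, 5.0), scale=1.0, size=(60, 2)),
])
demo = OPTICS(min_samples=10).fit(X_demo)

# ordering_ is a permutation of the sample indices
is_permutation = np.array_equal(np.sort(demo.ordering_), np.arange(len(X_demo)))

# Reachability in processing order; the starting point has no predecessor,
# so its reachability is infinite and it will not appear on the plot
reach_in_order = demo.reachability_[demo.ordering_]
first_is_inf = bool(np.isinf(reach_in_order[0]))

print("ordering_ is a permutation:", is_permutation)
print("first reachability is inf:", first_is_inf)
```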

Step 6: Plotting the Results

Finally, the results are visualized in four subplots:

The top plot (Reachability Plot) shows the reachability distance of each point in cluster order; valleys indicate clusters and peaks indicate noise or separations between clusters.
The bottom-left plot (OPTICS) shows the clusters detected automatically from density variations.
The bottom-middle plot (DBSCAN, eps=0.7) extracts smaller, tighter clusters.
The bottom-right plot (DBSCAN, eps=1.5) merges points into broader groups.

Python

space = np.arange(len(X_modified))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]

plt.figure(figsize=(10, 7))
G = gridspec.GridSpec(2, 3)
ax1 = plt.subplot(G[0, :])
ax2 = plt.subplot(G[1, 0])
ax3 = plt.subplot(G[1, 1])
ax4 = plt.subplot(G[1, 2])

# Reachability Plot
colors = ["b.", "g.", "r.", "y.", "c."]
for klass, color in zip(range(0, 5), colors):
    Xk = space[labels == klass]
    Rk = reachability[labels == klass]
    ax1.plot(Xk, Rk, color, alpha=0.3)
ax1.plot(space[labels == -1], reachability[labels == -1], "k.", alpha=0.3)
# Horizontal lines at the two eps values used for the DBSCAN-style cuts
ax1.plot(space, np.full_like(space, 1.5, dtype=float), "k-", alpha=0.5)
ax1.plot(space, np.full_like(space, 0.7, dtype=float), "k-.", alpha=0.5)
ax1.set_ylabel("Reachability (epsilon distance)")
ax1.set_title("Reachability Plot")

# OPTICS Clustering Result
colors = ["b.", "g.", "r.", "y.", "c."]
for klass, color in zip(range(0, 5), colors):
    Xk = X_modified[clust.labels_ == klass]
    ax2.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax2.plot(X_modified[clust.labels_ == -1, 0], X_modified[clust.labels_ == -1, 1], "k+", alpha=0.1)
ax2.set_title("Automatic Clustering\nOPTICS")

# DBSCAN Result at eps = 0.7
colors = ["b.", "g.", "r.", "c."]
for klass, color in zip(range(0, 4), colors):
    Xk = X_modified[labels_050 == klass]
    ax3.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax3.plot(X_modified[labels_050 == -1, 0], X_modified[labels_050 == -1, 1], "k+", alpha=0.1)
ax3.set_title("Clustering at 0.7 epsilon cut\nDBSCAN")

# DBSCAN Result at eps = 1.5
colors = ["b.", "m.", "y.", "c."]
for klass, color in zip(range(0, 4), colors):
    Xk = X_modified[labels_200 == klass]
    ax4.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax4.plot(X_modified[labels_200 == -1, 0], X_modified[labels_200 == -1, 1], "k+", alpha=0.1)
ax4.set_title("Clustering at 1.5 epsilon cut\nDBSCAN")

plt.tight_layout()
plt.show()

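Output:

[Figure: reachability plot (top) and three scatter plots (bottom): automatic OPTICS clustering, DBSCAN cut at eps=0.7, DBSCAN cut at eps=1.5]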

This comparison highlights OPTICS's ability to detect clusters of varying densities, whereas a DBSCAN-style cut depends on choosing an appropriate epsilon value to segment the data effectively. The visualization gives better insight into the data's structure and helps identify clusters and sparse regions.
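The plotting code above hard-codes a fixed number of colors per panel, so clusters beyond that count are silently not drawn. As one possible alternative, the sketch below loops over whatever labels are present. The helper name plot_clusters, the "tab10" colormap and the demo data are assumptions for illustration, and the Agg backend line is only there so the sketch runs without a display (drop it for interactive use).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(ax, X, labels):
    """Scatter each cluster in its own color; noise (-1) as faint black crosses."""
    cmap = plt.get_cmap("tab10")
    for i, klass in enumerate(np.unique(labels[labels != -1])):
        pts = X[labels == klass]
        ax.plot(pts[:, 0], pts[:, 1], ".", color=cmap(i % 10), alpha=0.3)
    noise = X[labels == -1]
    ax.plot(noise[:, 0], noise[:, 1], "k+", alpha=0.1)
    return ax

# Hypothetical demo: 40 random points with made-up labels (-1, 0, 1, 2)
rng = np.random.default_rng(0)
X_points = rng.normal(size=(40, 2))
labels_demo = np.repeat([-1, 0, 1, 2], 10)

fig, ax = plt.subplots()
plot_clusters(ax, X_points, labels_demo)
ax.set_title("Clusters colored by label")
```

With the variables from this article, the same helper could be called as plot_clusters(ax2, X_modified, clust.labels_).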
