
Tutorial For K Means Clustering in Python Sklearn - MLK - Machine Learning Knowledge-3

This document is a tutorial on using K-Means clustering in Python's Sklearn library. It demonstrates finding the optimal number of clusters (K) through the elbow method by plotting the within-cluster sum of squared errors against K values ranging from 2 to 11. The elbow method suggests 5 or 6 clusters. It also calculates silhouette scores for the same K values; apart from K=2, the highest score is at K=5, with K=6 almost level, which supports a choice of 5 or 6 clusters. The tutorial uses principal component analysis to reduce the dimensionality before applying K-Means clustering.

Uploaded by

jefferyleclerc

3/12/24, 2:43 PM Tutorial for K Means Clustering in Python Sklearn - MLK - Machine Learning Knowledge

   principal component 1  principal component 2
0              -0.192221               0.319683
1              -0.458175              -0.018152
2               0.052562               0.551854
3              -0.402357              -0.014239
4              -0.031648               0.155578
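The table above is the head of the two-component DataFrame (`pca_df`) that the following cells cluster. As a minimal sketch of how such a frame might be built, the block below runs PCA on a made-up feature table. The data, the column names `feature_a` to `feature_c`, and the min-max scaling step are all assumptions for illustration; the tutorial's own preprocessing is not shown in this section.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the tutorial's feature table (not the real data).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(200, 3)),
    columns=["feature_a", "feature_b", "feature_c"],
)

# Scale the features so that no single column dominates the components.
# This step is an assumption, not something confirmed by the source.
scaled = MinMaxScaler().fit_transform(features)

# Project the scaled data onto its first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Use the same column names as the table in the tutorial.
pca_df = pd.DataFrame(
    components,
    columns=["principal component 1", "principal component 2"],
)
print(pca_df.head())
```

Because PCA centers the data, each component column has a mean of approximately zero, which is consistent with the small positive and negative values in the table.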

Finding Optimum Value of K


i) Elbow Method with Within-Cluster-Sum of Squared Error (WCSS)
Let us again use the elbow method with the Within-Cluster Sum of Squared Errors (WCSS) to
determine the optimum value of K. From the graph plotted below, there appears to be a bend
between K=5 and K=6, so the optimum K is likely 5 or 6.
In [16]:

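import sklearn.cluster as cluster  # also imported in In [18]; repeated so this cell runs on its own

K = range(2, 12)  # candidate cluster counts: 2 through 11
wss = []          # within-cluster sum of squared errors for each K
for k in K:
    kmeans = cluster.KMeans(n_clusters=k)
    kmeans = kmeans.fit(pca_df)
    wss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model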

https://machinelearningknowledge.ai/tutorial-for-k-means-clustering-in-python-sklearn/

In [17]:

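import matplotlib.pyplot as plt  # assumed to have been imported earlier in the tutorial

# Plot WSS against K; the bend in this curve is the "elbow".
plt.xlabel('K')
plt.ylabel('Within-Cluster-Sum of Squared Errors (WSS)')
plt.plot(K, wss)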
Out[17]:

···
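Reading the bend off a plot is subjective. One rough way to make the choice reproducible is to pick the K at which the curve's second difference is largest, that is, where the drop in WSS slows down most sharply. The sketch below is not part of the tutorial: the helper name `elbow_by_second_difference` is hypothetical, and the WSS values are invented for illustration and are not the tutorial's results.

```python
def elbow_by_second_difference(ks, wss):
    """Return the K whose WSS value has the largest second difference.

    The second difference at position i is
    wss[i-1] - 2*wss[i] + wss[i+1]; it is large where a steep drop
    is followed by a shallow one, which is the visual "elbow".
    """
    if len(ks) != len(wss) or len(ks) < 3:
        raise ValueError("need at least three (K, WSS) pairs of equal length")
    best_k, best_curvature = None, float("-inf")
    # Endpoints have no second difference, so skip the first and last K.
    for i in range(1, len(wss) - 1):
        curvature = wss[i - 1] - 2 * wss[i] + wss[i + 1]
        if curvature > best_curvature:
            best_k, best_curvature = ks[i], curvature
    return best_k


# Made-up WSS values with an obvious bend (NOT taken from the tutorial).
example_ks = list(range(2, 9))
example_wss = [100, 80, 60, 40, 36, 33, 31]
elbow = elbow_by_second_difference(example_ks, example_wss)
print("Elbow at K =", elbow)
```

With the tutorial's own variables, the call would be `elbow_by_second_difference(list(K), wss)`. This heuristic is sensitive to noise in the curve, so it is best treated as a cross-check on the plot rather than a replacement for it.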

ii) The Silhouette Method


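Using the Silhouette method, the score is highest overall at K=2 (0.474). Within the range suggested by the elbow plot, K=5 (0.4513) scores marginally higher than K=6 (0.4508).
Hence it can be concluded that the dataset can be segmented reasonably with 5 clusters, with 6 a close alternative.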
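For readers unfamiliar with the score being compared, the silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The sketch below computes the mean coefficient directly from that definition on a toy data set. The function name `silhouette_mean` and the four points are hypothetical and serve only to illustrate the formula; the tutorial itself relies on `metrics.silhouette_score`.

```python
from math import dist
from statistics import mean


def silhouette_mean(points, labels):
    """Mean silhouette coefficient, computed directly from its definition.

    Assumes at least two distinct cluster labels are present.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance from p to the other points in its own cluster.
        same = [
            dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        a = mean(same)
        # b: smallest mean distance from p to the points of any other cluster.
        b = min(
            mean(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in set(labels)
            if other != own
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: two tight, well-separated groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette_mean(points, labels)
print(round(score, 4))
```

The toy clusters are compact and far apart, so the score comes out close to 0.9. Scores around 0.45, as in the tutorial's output, indicate clusters that are separated but noticeably less cleanly.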

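In [18]:

import sklearn.cluster as cluster
import sklearn.metrics as metrics

for i in range(2, 12):
    # Fit K-Means with a fixed seed so the labels are reproducible.
    labels = cluster.KMeans(n_clusters=i, random_state=200).fit(pca_df).labels_
    # The source line is cut off after `metric="euclidean",sample_siz`; the
    # truncated trailing arguments are omitted here, so the score is computed
    # on all samples and may differ slightly from Out[18].
    score = metrics.silhouette_score(pca_df, labels, metric="euclidean")
    print("Silhouette score for k(clusters) = " + str(i) + " is " + str(score))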

Out[18]:

Silhouette score for k(clusters) = 2 is 0.4736269407502857
Silhouette score for k(clusters) = 3 is 0.44839082753844756
Silhouette score for k(clusters) = 4 is 0.43785291876777566
Silhouette score for k(clusters) = 5 is 0.45130680489606634
Silhouette score for k(clusters) = 6 is 0.4507847568968469
Silhouette score for k(clusters) = 7 is 0.4458795480456887
Silhouette score for k(clusters) = 8 is 0.4132957148795121
Silhouette score for k(clusters) = 9 is 0.4170428610065107
Silhouette score for k(clusters) = 10 is 0.4309783655094101
Silhouette score for k(clusters) = 11 is 0.42535265774570674
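To make the comparison of these scores explicit, the short sketch below ranks the values printed in Out[18], rounded to four decimals. The dictionary name `silhouette_by_k` is introduced here only for illustration.

```python
# Scores copied from Out[18], rounded to four decimal places.
silhouette_by_k = {
    2: 0.4736, 3: 0.4484, 4: 0.4379, 5: 0.4513, 6: 0.4508,
    7: 0.4459, 8: 0.4133, 9: 0.4170, 10: 0.4310, 11: 0.4254,
}

# Sort the (K, score) pairs from highest to lowest score.
ranked = sorted(silhouette_by_k.items(), key=lambda kv: kv[1], reverse=True)

# Show the three best-scoring values of K.
for k, s in ranked[:3]:
    print(f"K={k}: {s:.4f}")
```

The top three are K=2, K=5 and K=6. K=2 scores highest but falls outside the bend seen in the elbow plot, while K=5 and K=6 differ by only about 0.0005, which is why the two methods together point to 5 or 6 clusters.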
