@@ -8,10 +8,10 @@ Clustering
88unlabeled data can be performed with the module :mod:`sklearn.cluster`.
99
1010Each clustering algorithm comes in two variants: a class that implements
11- the `fit` method to learn the clusters on train data, and a function
11+ the ``fit`` method to learn the clusters on train data, and a function
1212that, given train data, returns an array of integer labels corresponding
1313to the different clusters. For the class, the labels over the training
14- data can be found in the `labels_` attribute.
14+ data can be found in the ``labels_`` attribute.
1515
1616.. currentmodule:: sklearn.cluster
1717
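As a minimal sketch of the two variants (using :class:`KMeans` purely as an
example; the toy data below is made up for illustration)::

    import numpy as np
    from sklearn.cluster import KMeans, k_means

    X = np.array([[1, 0], [1, 2], [10, 0], [10, 2]])

    # Class variant: fit on the train data, then read the labels_ attribute.
    model = KMeans(n_clusters=2, random_state=0).fit(X)
    print(model.labels_)              # one integer label per training sample

    # Function variant: returns the centroids, the labels and the inertia.
    centroids, labels, inertia = k_means(X, n_clusters=2, random_state=0)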
@@ -53,7 +53,7 @@ Overview of clustering methods
5353
5454 * - :ref:`K-Means <k_means>`
5555 - number of clusters
56- - Very large `n_samples`, medium `n_clusters` with
56+ - Very large ``n_samples``, medium ``n_clusters`` with
5757 :ref:`MiniBatch code <mini_batch_kmeans>`
5858 - General-purpose, even cluster size, flat geometry, not too many clusters
5959 - Distances between points
@@ -66,32 +66,32 @@ Overview of clustering methods
6666
6767 * - :ref:`Mean-shift <mean_shift>`
6868 - bandwidth
69- - Not scalable with n_samples
69+ - Not scalable with ``n_samples``
7070 - Many clusters, uneven cluster size, non-flat geometry
7171 - Distances between points
7272
7373 * - :ref:`Spectral clustering <spectral_clustering>`
7474 - number of clusters
75- - Medium `n_samples`, small `n_clusters`
75+ - Medium ``n_samples``, small ``n_clusters``
7676 - Few clusters, even cluster size, non-flat geometry
7777 - Graph distance (e.g. nearest-neighbor graph)
7878
7979 * - :ref:`Ward hierarchical clustering <hierarchical_clustering>`
8080 - number of clusters
81- - Large `n_samples` and `n_clusters`
81+ - Large ``n_samples`` and ``n_clusters``
8282 - Many clusters, possibly connectivity constraints
8383 - Distances between points
8484
8585 * - :ref:`Agglomerative clustering <hierarchical_clustering>`
8686 - number of clusters, linkage type, distance
87- - Large `n_samples` and `n_clusters`
87+ - Large ``n_samples`` and ``n_clusters``
8888 - Many clusters, possibly connectivity constraints, non-Euclidean
8989 distances
9090 - Any pairwise distance
9191
9292 * - :ref:`DBSCAN <dbscan>`
9393 - neighborhood size
94- - Very large `n_samples`, medium `n_clusters`
94+ - Very large ``n_samples``, medium ``n_clusters``
9595 - Non-flat geometry, uneven cluster sizes
9696 - Distances between nearest points
9797
@@ -118,12 +118,12 @@ K-means
118118
119119The :class:`KMeans` algorithm clusters data by trying to separate samples
120120in n groups of equal variance, minimizing a criterion known as the
121- `inertia<inertia>` or within-cluster sum-of-squares.
121+ `inertia <inertia>` or within-cluster sum-of-squares.
122122This algorithm requires the number of clusters to be specified.
123123It scales well to a large number of samples and has been used
124124across a large range of application areas in many different fields.
125125
126- The k-means algorithm divides a set of :math:`N` samples :math:`X`:
126+ The k-means algorithm divides a set of :math:`N` samples :math:`X`
127127into :math:`K` disjoint clusters :math:`C`,
128128each described by the mean :math:`\mu_j` of the samples in the cluster.
129129The means are commonly called the cluster "centroids";
@@ -146,7 +146,7 @@ It suffers from various drawbacks:
146146 better and zero is optimal. But in very high-dimensional spaces, Euclidean
147147 distances tend to become inflated
148148 (this is an instance of the so-called "curse of dimensionality").
149- Running a dimensionality reduction algorithm such as `PCA<PCA>`
149+ Running a dimensionality reduction algorithm such as `PCA <PCA>`
150150 prior to k-means clustering can alleviate this problem
151151 and speed up the computations.
152152
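A rough sketch of that suggestion (the digits dataset and the choice of 10
components are only illustrative)::

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X, _ = load_digits(return_X_y=True)

    # Project the 64-dimensional digits onto a few principal components before
    # clustering; this mitigates inflated Euclidean distances and is cheaper.
    X_reduced = PCA(n_components=10).fit_transform(X)
    labels = KMeans(n_clusters=10, random_state=0).fit_predict(X_reduced)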
@@ -189,7 +189,7 @@ k-means++ initialization scheme, which has been implemented in scikit-learn
189189random initialization, as shown in the reference.
190190
191191A parameter can be given to allow K-means to be run in parallel, called
192- `n_jobs`. Giving this parameter a positive value uses that many processors
192+ ``n_jobs``. Giving this parameter a positive value uses that many processors
193193(default: 1). A value of -1 uses all available processors, with -2 using one
194194less, and so on. Parallelization generally speeds up computation at the cost of
195195memory (in this case, multiple copies of centroids need to be stored, one for
@@ -232,7 +232,7 @@ k-means, mini-batch k-means produces results that are generally only slightly
232232worse than the standard algorithm.
233233
234234The algorithm iterates between two major steps, similar to vanilla k-means.
235- In the first step, `b` samples are drawn randomly from the dataset, to form
235+ In the first step, :math:`b` samples are drawn randomly from the dataset, to form
236236a mini-batch. These are then assigned to the nearest centroid. In the second
237237step, the centroids are updated. In contrast to k-means, this is done on a
238238per-sample basis. For each sample in the mini-batch, the assigned centroid
@@ -291,12 +291,12 @@ is given.
291291
292292Affinity Propagation can be interesting as it chooses the number of
293293clusters based on the data provided. For this purpose, the two important
294- parameters are the `preference`, which controls how many exemplars are
295- used, and the `damping` factor.
294+ parameters are the *preference*, which controls how many exemplars are
295+ used, and the *damping factor*.
296296
297297The main drawback of Affinity Propagation is its complexity. The
298- algorithm has a time complexity of the order :math:`O(N^2 T)`, where `N`
299- is the number of samples and `T` is the number of iterations until
298+ algorithm has a time complexity of the order :math:`O(N^2 T)`, where :math:`N`
299+ is the number of samples and :math:`T` is the number of iterations until
300300convergence. Further, the memory complexity is of the order
301301:math:`O(N^2)` if a dense similarity matrix is used, but reducible if a
302302sparse similarity matrix is used. This makes Affinity Propagation most
@@ -312,30 +312,34 @@ appropriate for small to medium sized datasets.
312312
313313**Algorithm description:**
314314The messages sent between points belong to one of two categories. The first is
315- the responsibility `r(i, k)`, which is the accumulated evidence that sample `k`
316- should be the exemplar for sample `i`. The second is the availability `a(i, k)`
317- which is the accumulated evidence that sample `i` should choose sample `k` to
318- be its exemplar, and considers the values for all other samples that `k` should
315+ the responsibility :math:`r(i, k)`,
316+ which is the accumulated evidence that sample :math:`k`
317+ should be the exemplar for sample :math:`i`.
318+ The second is the availability :math:`a(i, k)`,
319+ which is the accumulated evidence that sample :math:`i`
320+ should choose sample :math:`k` to be its exemplar,
321+ and considers the values for all other samples that :math:`k` should
319322be an exemplar. In this way, exemplars are chosen by samples if they are (1)
320323similar enough to many samples and (2) chosen by many samples to be
321324representative of themselves.
322325
323- More formally, the responsibility of a sample `k` to be the exemplar of sample
324- `i` is given by:
326+ More formally, the responsibility of a sample :math:`k`
327+ to be the exemplar of sample :math:`i` is given by:
325328
326329.. math::
327330
328331 r(i, k) \leftarrow s(i, k) - \max [ a(i, \acute{k}) + s(i, \acute{k}) \forall \acute{k} \neq k ]
329332
330- Where :math:`s(i, k)` is the similarity between samples `i` and `k`. The
331- availability of sample `k` to be the exemplar of sample `i` is given by:
333+ Where :math:`s(i, k)` is the similarity between samples :math:`i` and :math:`k`.
334+ The availability of sample :math:`k`
335+ to be the exemplar of sample :math:`i` is given by:
332336
333337.. math::
334338
335339 a(i, k) \leftarrow \min [0, r(k, k) + \sum_{\acute{i}~s.t.~\acute{i} \notin \{i, k\}}{r(\acute{i}, k)}]
336340
337- To begin with, all values for `r` and `a` are set to zero, and the calculation
338- of each iterates until convergence.
341+ To begin with, all values for :math:`r` and :math:`a` are set to zero,
342+ and the calculation of each iterates until convergence.
339343
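A small usage sketch; the ``preference`` value and the toy data are arbitrary
illustrations, not recommendations::

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    X = np.array([[1, 2], [1, 4], [1, 0],
                  [4, 2], [4, 4], [4, 0]])

    # More negative preference values lead to fewer exemplars (clusters).
    af = AffinityPropagation(preference=-50, damping=0.5).fit(X)
    print(af.cluster_centers_indices_)   # indices of the samples chosen as exemplars
    print(af.labels_)                    # cluster assignment for every sample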
340344.. _mean_shift:
341345
@@ -367,9 +371,9 @@ the mean of the samples within its neighborhood:
367371 m(x_i) = \frac{\sum_{x_j \in N(x_i)} K(x_j - x_i) x_j}{\sum_{x_j \in N(x_i)} K(x_j - x_i)}
368372
369373 The algorithm automatically sets the number of clusters, relying instead on a
370- parameter `bandwidth`, which dictates the size of the region to search through.
374+ parameter ``bandwidth``, which dictates the size of the region to search through.
371375This parameter can be set manually, but can be estimated using the provided
372- `estimate_bandwidth` function, which is called if the bandwidth is not set.
376+ ``estimate_bandwidth`` function, which is called if the bandwidth is not set.
373377
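For instance, a sketch that calls ``estimate_bandwidth`` explicitly (the
``quantile`` value and the random data are placeholders)::

    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth

    X = np.random.RandomState(0).randn(300, 2)

    # Estimate the bandwidth from the data, then run mean shift with it;
    # leaving bandwidth unset would trigger the same estimation internally.
    bandwidth = estimate_bandwidth(X, quantile=0.2)
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    print(len(np.unique(ms.labels_)))    # number of clusters found automatically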
374378The algorithm is not highly scalable, as it requires multiple nearest neighbor
375379searches during the execution of the algorithm. The algorithm is guaranteed to
@@ -463,16 +467,16 @@ Different label assignment strategies
463467---------------------------------------
464468
465469Different label assignment strategies can be used, corresponding to the
466- `assign_labels` parameter of :class:`SpectralClustering`.
467- The `kmeans` strategy can match finer details of the data, but it can be
468- more unstable. In particular, unless you control the `random_state`, it
470+ ``assign_labels`` parameter of :class:`SpectralClustering`.
471+ The ``"kmeans"`` strategy can match finer details of the data, but it can be
472+ more unstable. In particular, unless you control the ``random_state``, it
469473may not be reproducible from run-to-run, as it depends on a random
470- initialization. On the other hand, the `discretize` strategy is 100%
474+ initialization. On the other hand, the ``"discretize"`` strategy is 100%
471475reproducible, but it tends to create parcels of fairly even and
472476geometrical shape.
473477
474478===================================== =====================================
475- `assign_labels="kmeans"`              `assign_labels="discretize"`
479+ ``assign_labels="kmeans"``            ``assign_labels="discretize"``
476480===================================== =====================================
477481|lena_kmeans|                         |lena_discretize|
478482===================================== =====================================
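A short sketch contrasting the two strategies (data and parameters are
placeholders)::

    import numpy as np
    from sklearn.cluster import SpectralClustering

    X = np.random.RandomState(0).rand(100, 2)

    # "kmeans" assignment depends on the random initialization, so fix
    # random_state to make runs reproducible.
    labels_km = SpectralClustering(n_clusters=4, assign_labels="kmeans",
                                   random_state=0).fit_predict(X)

    # "discretize" is reproducible but tends to produce more regular parcels.
    labels_disc = SpectralClustering(n_clusters=4,
                                     assign_labels="discretize").fit_predict(X)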
@@ -697,13 +701,14 @@ cluster is therefore a set of core samples, each close to each other
697701(measured by some distance measure)
698702and a set of non-core samples that are close to a core sample (but are not
699703themselves core samples). There are two parameters to the algorithm,
700- `min_samples` and `eps`, which define formally what we mean when we say *dense*.
701- A higher `min_samples` or lower `eps` indicate higher density necessary to form
702- a cluster.
704+ ``min_samples`` and ``eps``,
705+ which define formally what we mean when we say *dense*.
706+ Higher ``min_samples`` or lower ``eps``
707+ indicate higher density necessary to form a cluster.
703708
704709More formally, we define a core sample as being a sample in the dataset such
705- that there exist `min_samples` other samples within a distance of
706- `eps`, which are defined as *neighbors* of the core sample. This tells
710+ that there exist ``min_samples`` other samples within a distance of
711+ ``eps``, which are defined as *neighbors* of the core sample. This tells
707712us that the core sample is in a dense area of the vector space. A cluster
708713is a set of core samples that can be built by recursively taking a core
709714sample, finding all of its neighbors that are core samples, finding all of
@@ -713,9 +718,9 @@ in the cluster but are not themselves core samples. Intuitively, these samples
713718are on the fringes of a cluster.
714719
715720Any core sample is part of a cluster, by definition. Further, any cluster has
716- at least `min_samples` points in it, following the definition of a core
721+ at least ``min_samples`` points in it, following the definition of a core
717722sample. For any sample that is not a core sample, and does have a
718- distance higher than `eps` to any core sample, it is considered an outlier by
723+ distance higher than ``eps`` to any core sample, it is considered an outlier by
719724the algorithm.
720725
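A minimal sketch of these two parameters in action (the values are chosen only
for the toy data below)::

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.array([[1, 2], [2, 2], [2, 3],
                  [8, 7], [8, 8], [25, 80]])

    # min_samples and eps together define "dense": each core sample must have
    # enough neighbors within a distance of eps.
    db = DBSCAN(eps=3, min_samples=2).fit(X)
    print(db.labels_)                 # outliers are labelled -1
    print(db.core_sample_indices_)    # indices of the core samples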
721726In the figure below, the color indicates cluster membership, with large circles
@@ -739,9 +744,9 @@ by black points below.
739744 always belong to the same clusters (although the labels may be
740745 different). The non-determinism comes from deciding to which cluster a
741746 non-core sample belongs. A non-core sample can have a distance lower
742- than `eps` to two core samples in different clusters. By the
747+ than ``eps`` to two core samples in different clusters. By the
743748 triangular inequality, those two core samples must be more distant than
744- `eps` from each other, or they would be in the same cluster. The non-core
749+ ``eps`` from each other, or they would be in the same cluster. The non-core
745750 sample is assigned to whichever cluster is generated first, where
746751 the order is determined randomly. Other than the ordering of
747752 the dataset, the algorithm is deterministic, making the results relatively
@@ -798,7 +803,7 @@ chance normalization**::
798803 >>> metrics.adjusted_rand_score(labels_true, labels_pred) # doctest: +ELLIPSIS
799804 0.24...
800805
801- One can permute 0 and 1 in the predicted labels and rename `2` by `3` and get
806+ One can permute 0 and 1 in the predicted labels, rename 2 to 3, and get
802807the same score::
803808
804809 >>> labels_pred = [1, 1, 0, 0, 3, 3]
@@ -921,7 +926,7 @@ proposed more recently and is **normalized against chance**::
921926 >>> metrics.adjusted_mutual_info_score(labels_true, labels_pred) # doctest: +ELLIPSIS
922927 0.22504...
923928
924- One can permute 0 and 1 in the predicted labels and rename `2` by `3` and get
929+ One can permute 0 and 1 in the predicted labels, rename 2 to 3, and get
925930the same score::
926931
927932 >>> labels_pred = [1, 1, 0, 0, 3, 3]