Commit dc883c7

LisaThomas9 authored and qinhanmin2014 committed
ENH Corrected spelling of Harabasz score. (scikit-learn#12211)
1 parent 88cdeb8 commit dc883c7

10 files changed: +66 -29 lines changed

doc/modules/classes.rst

Lines changed: 11 additions & 2 deletions
@@ -904,7 +904,7 @@ details.
 
    metrics.adjusted_mutual_info_score
    metrics.adjusted_rand_score
-   metrics.calinski_harabaz_score
+   metrics.calinski_harabasz_score
    metrics.davies_bouldin_score
    metrics.completeness_score
    metrics.cluster.contingency_matrix
@@ -1496,6 +1496,15 @@ Utilities from joblib:
 Recently deprecated
 ===================
 
+To be removed in 0.23
+---------------------
+
+.. autosummary::
+   :toctree: generated/
+   :template: deprecated_function.rst
+
+   metrics.calinski_harabaz_score
+
 
 To be removed in 0.22
 ---------------------
@@ -1513,4 +1522,4 @@ To be removed in 0.22
    :template: deprecated_function.rst
 
    covariance.graph_lasso
-   datasets.fetch_mldata
+   datasets.fetch_mldata

doc/modules/clustering.rst

Lines changed: 9 additions & 11 deletions
@@ -1652,17 +1652,16 @@ Drawbacks
  * :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py` : In this example
    the silhouette analysis is used to choose an optimal value for n_clusters.
 
-.. _calinski_harabaz_index:
+.. _calinski_harabasz_index:
 
-Calinski-Harabaz Index
+Calinski-Harabasz Index
 ----------------------
-
-If the ground truth labels are not known, the Calinski-Harabaz index
-(:func:`sklearn.metrics.calinski_harabaz_score`) - also known as the Variance
+If the ground truth labels are not known, the Calinski-Harabasz index
+(:func:`sklearn.metrics.calinski_harabasz_score`) - also known as the Variance
 Ratio Criterion - can be used to evaluate the model, where a higher
-Calinski-Harabaz score relates to a model with better defined clusters.
+Calinski-Harabasz score relates to a model with better defined clusters.
 
-For :math:`k` clusters, the Calinski-Harabaz score :math:`s` is given as the
+For :math:`k` clusters, the Calinski-Harabasz score :math:`s` is given as the
 ratio of the between-clusters dispersion mean and the within-cluster
 dispersion:
 
@@ -1689,17 +1688,16 @@ points in cluster :math:`q`.
   >>> X = dataset.data
   >>> y = dataset.target
 
-In normal usage, the Calinski-Harabaz index is applied to the results of a
+In normal usage, the Calinski-Harabasz index is applied to the results of a
 cluster analysis.
 
   >>> import numpy as np
   >>> from sklearn.cluster import KMeans
   >>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
   >>> labels = kmeans_model.labels_
-  >>> metrics.calinski_harabaz_score(X, labels) # doctest: +ELLIPSIS
+  >>> metrics.calinski_harabasz_score(X, labels) # doctest: +ELLIPSIS
   561.62...
 
-
 Advantages
 ~~~~~~~~~~
 
@@ -1712,7 +1710,7 @@ Advantages
 Drawbacks
 ~~~~~~~~~
 
-- The Calinski-Harabaz index is generally higher for convex clusters than other
+- The Calinski-Harabasz index is generally higher for convex clusters than other
   concepts of clusters, such as density based clusters like those obtained
   through DBSCAN.
 
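
For reference, the ratio that the user guide text in this diff describes can be checked by hand. What follows is a minimal pure-Python sketch, not scikit-learn's implementation: the name `variance_ratio_criterion` and its helpers are hypothetical, and the sketch assumes at least two clusters and fewer clusters than samples. The data is the "general case" from `test_calinski_harabasz_score` in this commit.

```python
def variance_ratio_criterion(points, labels):
    """Return [B / (k - 1)] / [W / (n - k)] for a labelled point set.

    B is the between-cluster dispersion and W the within-cluster
    dispersion. Hypothetical helper for illustration only; it assumes
    2 <= k < n and does no input validation.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def mean(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)

    if within == 0.0:
        # The tests in this commit expect 1.0 when all samples are identical.
        return 1.0
    return between * (n - k) / (within * (k - 1))


X = ([[0, 0], [1, 1]] * 5 + [[3, 3], [4, 4]] * 5 +
     [[0, 4], [1, 3]] * 5 + [[3, 1], [4, 0]] * 5)
labels = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
score = variance_ratio_criterion(X, labels)
print(score)  # 108.0
```

On this data the between-cluster dispersion is 180 and the within-cluster dispersion is 20, so the score is 180 * 36 / (20 * 3) = 108, which equals the `45 * (40 - 4) / (5 * (4 - 1))` used as the expected value in the test.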

doc/whats_new/v0.20.rst

Lines changed: 5 additions & 0 deletions
@@ -783,6 +783,11 @@ Support for Python 3.3 has been officially dropped.
   ``working_memory`` config. See :ref:`working_memory`. :issue:`10280` by `Joel
   Nothman`_ and :user:`Aman Dalmia <dalmia>`.
 
+- |API| :func:`metrics.calinski_harabaz_score` has been renamed to
+  :func:`metrics.calinski_harabasz_score`; the old name will be removed in 0.23.
+  :issue:`12211` by :user:`Lisa Thomas <LisaThomas9>`,
+  :user:`Mark Hannel <markhannel>` and :user:`Melissa Ferrari <mferrari3>`.
+
 
 :mod:`sklearn.mixture`
 ......................

doc/whats_new/v0.21.rst

Lines changed: 1 addition & 0 deletions
@@ -107,6 +107,7 @@ Support for Python 3.4 and below has been officially dropped.
   when called before fit :issue:`12279` by :user:`Krishna Sangeeth
   <whiletruelearn>`.
 
+
 Multiple modules
 ................
 

sklearn/metrics/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -43,6 +43,7 @@
 from .cluster import fowlkes_mallows_score
 from .cluster import silhouette_samples
 from .cluster import silhouette_score
+from .cluster import calinski_harabasz_score
 from .cluster import calinski_harabaz_score
 from .cluster import v_measure_score
 from .cluster import davies_bouldin_score
@@ -76,6 +77,7 @@
     'average_precision_score',
     'balanced_accuracy_score',
     'calinski_harabaz_score',
+    'calinski_harabasz_score',
     'check_scoring',
     'classification_report',
     'cluster',

sklearn/metrics/cluster/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -19,6 +19,7 @@
 from .supervised import entropy
 from .unsupervised import silhouette_samples
 from .unsupervised import silhouette_score
+from .unsupervised import calinski_harabasz_score
 from .unsupervised import calinski_harabaz_score
 from .unsupervised import davies_bouldin_score
 from .bicluster import consensus_score
@@ -29,4 +30,5 @@
            "homogeneity_score", "mutual_info_score", "v_measure_score",
            "fowlkes_mallows_score", "entropy", "silhouette_samples",
            "silhouette_score", "calinski_harabaz_score",
-           "davies_bouldin_score", "consensus_score"]
+           "calinski_harabasz_score", "davies_bouldin_score",
+           "consensus_score"]

sklearn/metrics/cluster/setup.py

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ def configuration(parent_package="", top_path=None):
 
     return config
 
+
 if __name__ == "__main__":
     from numpy.distutils.core import setup
     setup(**configuration().todict())

sklearn/metrics/cluster/tests/test_common.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 from sklearn.metrics.cluster import normalized_mutual_info_score
 from sklearn.metrics.cluster import v_measure_score
 from sklearn.metrics.cluster import silhouette_score
-from sklearn.metrics.cluster import calinski_harabaz_score
+from sklearn.metrics.cluster import calinski_harabasz_score
 from sklearn.metrics.cluster import davies_bouldin_score
 
 from sklearn.utils.testing import assert_allclose, ignore_warnings
@@ -44,7 +44,7 @@
 UNSUPERVISED_METRICS = {
     "silhouette_score": silhouette_score,
     "silhouette_manhattan": partial(silhouette_score, metric='manhattan'),
-    "calinski_harabaz_score": calinski_harabaz_score,
+    "calinski_harabasz_score": calinski_harabasz_score,
     "davies_bouldin_score": davies_bouldin_score
 }
 

sklearn/metrics/cluster/tests/test_unsupervised.py

Lines changed: 20 additions & 9 deletions
@@ -10,9 +10,11 @@
 from sklearn.utils.testing import assert_raises_regexp
 from sklearn.utils.testing import assert_raise_message
 from sklearn.utils.testing import assert_greater
+from sklearn.utils.testing import assert_warns_message
 from sklearn.metrics.cluster import silhouette_score
 from sklearn.metrics.cluster import silhouette_samples
 from sklearn.metrics import pairwise_distances
+from sklearn.metrics.cluster import calinski_harabasz_score
 from sklearn.metrics.cluster import calinski_harabaz_score
 from sklearn.metrics.cluster import davies_bouldin_score
 
@@ -185,25 +187,34 @@ def assert_raises_on_all_points_same_cluster(func):
                          rng.rand(10, 2), np.arange(10))
 
 
-def test_calinski_harabaz_score():
-    assert_raises_on_only_one_label(calinski_harabaz_score)
+def test_calinski_harabasz_score():
+    assert_raises_on_only_one_label(calinski_harabasz_score)
 
-    assert_raises_on_all_points_same_cluster(calinski_harabaz_score)
+    assert_raises_on_all_points_same_cluster(calinski_harabasz_score)
 
     # Assert the value is 1. when all samples are equals
-    assert_equal(1., calinski_harabaz_score(np.ones((10, 2)),
-                                            [0] * 5 + [1] * 5))
+    assert_equal(1., calinski_harabasz_score(np.ones((10, 2)),
+                                             [0] * 5 + [1] * 5))
 
     # Assert the value is 0. when all the mean cluster are equal
-    assert_equal(0., calinski_harabaz_score([[-1, -1], [1, 1]] * 10,
-                                            [0] * 10 + [1] * 10))
+    assert_equal(0., calinski_harabasz_score([[-1, -1], [1, 1]] * 10,
+                                             [0] * 10 + [1] * 10))
 
     # General case (with non numpy arrays)
     X = ([[0, 0], [1, 1]] * 5 + [[3, 3], [4, 4]] * 5 +
          [[0, 4], [1, 3]] * 5 + [[3, 1], [4, 0]] * 5)
     labels = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
-    pytest.approx(calinski_harabaz_score(X, labels),
-                  45 * (40 - 4) / (5 * (4 - 1)))
+    assert calinski_harabasz_score(X, labels) == pytest.approx(
+        45 * (40 - 4) / (5 * (4 - 1)))
+
+
+def test_deprecated_calinski_harabaz_score():
+    depr_message = ("Function 'calinski_harabaz_score' has been renamed "
+                    "to 'calinski_harabasz_score' "
+                    "and will be removed in version 0.23.")
+    assert_warns_message(DeprecationWarning, depr_message,
+                         calinski_harabaz_score,
+                         np.ones((10, 2)), [0] * 5 + [1] * 5)
 
 
 def test_davies_bouldin_score():

sklearn/metrics/cluster/unsupervised.py

Lines changed: 12 additions & 4 deletions
@@ -17,6 +17,7 @@
 from ..pairwise import pairwise_distances_chunked
 from ..pairwise import pairwise_distances
 from ...preprocessing import LabelEncoder
+from ...utils import deprecated
 
 
 def check_number_of_labels(n_labels, n_samples):
@@ -236,15 +237,15 @@ def silhouette_samples(X, labels, metric='euclidean', **kwds):
     return np.nan_to_num(sil_samples)
 
 
-def calinski_harabaz_score(X, labels):
-    """Compute the Calinski and Harabaz score.
+def calinski_harabasz_score(X, labels):
+    """Compute the Calinski and Harabasz score.
 
     It is also known as the Variance Ratio Criterion.
 
     The score is defined as ratio between the within-cluster dispersion and
     the between-cluster dispersion.
 
-    Read more in the :ref:`User Guide <calinski_harabaz_index>`.
+    Read more in the :ref:`User Guide <calinski_harabasz_index>`.
 
     Parameters
     ----------
@@ -258,7 +259,7 @@ def calinski_harabaz_score(X, labels):
     Returns
     -------
     score : float
-        The resulting Calinski-Harabaz score.
+        The resulting Calinski-Harabasz score.
 
     References
     ----------
@@ -288,6 +289,13 @@ def calinski_harabaz_score(X, labels):
             (intra_disp * (n_labels - 1.)))
 
 
+@deprecated("Function 'calinski_harabaz_score' has been renamed to "
+            "'calinski_harabasz_score' "
+            "and will be removed in version 0.23.")
+def calinski_harabaz_score(X, labels):
+    return calinski_harabasz_score(X, labels)
+
+
 def davies_bouldin_score(X, labels):
     """Computes the Davies-Bouldin score.
 
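
The alias added at the end of `unsupervised.py` follows a common rename pattern: keep the old name, emit a warning, and forward to the new function. The sketch below shows that pattern with the standard library only. It does not reproduce `sklearn.utils.deprecated`; `deprecated_alias`, `new_score` and `old_score` are hypothetical names used purely for illustration.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name, removed_in):
    """Build a callable that warns, then forwards to new_func.

    Hypothetical helper: the message wording mirrors the one in this
    commit, but the mechanism is only an assumption-level sketch.
    """
    message = ("Function '%s' has been renamed to '%s' and will be removed "
               "in version %s." % (old_name, new_func.__name__, removed_in))

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller of the alias.
        warnings.warn(message, category=DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


def new_score(values):
    """Stand-in for the renamed function."""
    return sum(values) / len(values)


old_score = deprecated_alias(new_score, "old_score", "0.23")

# Record warnings so the deprecation can be inspected explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_score([1, 2, 3])
```

Forwarding through a thin wrapper keeps a single implementation, so the old and new names cannot drift apart before the old one is removed.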
