Commit c176ced (1 parent: 1775095)

DOC double backticks for fixed-width (code) font

Fixes scikit-learn#3337. Also other minor fixes while I was at it.

37 files changed: +361 -350 lines

doc/developers/index.rst

Lines changed: 10 additions & 10 deletions
@@ -100,8 +100,8 @@ email to the mailing list in order to get more visibility.
 .. note::

 In the above setup, your ``origin`` remote repository points to
-YourLogin/scikit-learn.git. If you wish to `fetch/merge` from the main
-repository instead of your `forked` one, you will need to add another remote
+YourLogin/scikit-learn.git. If you wish to fetch/merge from the main
+repository instead of your forked one, you will need to add another remote
 to use instead of ``origin``. If we choose the name ``upstream`` for it, the
 command will be::

@@ -242,7 +242,7 @@ Finally, any math and equations, followed by references,
 can be added to further the documentation. Not starting the
 documentation with the maths makes it more friendly towards
 users that are just interested in what the feature will do, as
-opposed to how it works `under the hood`.
+opposed to how it works "under the hood".


 .. warning:: **Sphinx version**
@@ -372,7 +372,7 @@ In addition, we add the following guidelines:
 that is implemented in ``sklearn.foo.bar.baz``,
 the test should import it from ``sklearn.foo``.

-* **Please don't use `import *` in any case**. It is considered harmful
+* **Please don't use ``import *`` in any case**. It is considered harmful
 by the `official Python recommendations
 <http://docs.python.org/howto/doanddont.html#from-module-import>`_.
 It makes the code harder to read as the origin of symbols is no
@@ -396,7 +396,7 @@ Input validation

 The module :mod:`sklearn.utils` contains various functions for doing input
 validation and conversion. Sometimes, ``np.asarray`` suffices for validation;
-do `not` use ``np.asanyarray`` or ``np.atleast_2d``, since those let NumPy's
+do *not* use ``np.asanyarray`` or ``np.atleast_2d``, since those let NumPy's
 ``np.matrix`` through, which has a different API
 (e.g., ``*`` means dot product on ``np.matrix``,
 but Hadamard product on ``np.ndarray``).
@@ -634,14 +634,14 @@ an estimator without passing any arguments to it. The arguments should all
 correspond to hyperparameters describing the model or the optimisation
 problem the estimator tries to solve. These initial arguments (or parameters)
 are always remembered by the estimator.
-Also note that they should not be documented under the `Attributes` section,
-but rather under the `Parameters` section for that estimator.
+Also note that they should not be documented under the "Attributes" section,
+but rather under the "Parameters" section for that estimator.

 In addition, **every keyword argument accepted by ``__init__`` should
 correspond to an attribute on the instance**. Scikit-learn relies on this to
 find the relevant attributes to set on an estimator when doing model selection.

-To summarize, a `__init__` should look like::
+To summarize, an ``__init__`` should look like::

 def __init__(self, param1=1, param2=2):
 self.param1 = param1
@@ -722,8 +722,8 @@ Estimated Attributes

 Attributes that have been estimated from the data must always have a name
 ending with trailing underscore, for example the coefficients of
-some regression estimator would be stored in a `coef_` attribute after
-`fit()` has been called.
+some regression estimator would be stored in a ``coef_`` attribute after
+``fit`` has been called.

 The last-mentioned attributes are expected to be overridden when
 you call ``fit`` a second time without taking any previous value into
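
To illustrate the conventions touched by these hunks (constructor arguments stored verbatim as attributes, fitted attributes carrying a trailing underscore), a minimal estimator sketch might look like the following; ``TemplateEstimator`` and its parameters are hypothetical names, not part of the changed docs::

    import numpy as np
    from sklearn.base import BaseEstimator

    class TemplateEstimator(BaseEstimator):
        def __init__(self, param1=1, param2=2):
            # __init__ only stores the hyperparameters, unchanged and unvalidated
            self.param1 = param1
            self.param2 = param2

        def fit(self, X, y=None):
            X = np.asarray(X)
            # attributes estimated from the data end with a trailing underscore
            self.coef_ = np.zeros(X.shape[1])
            return self
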

doc/developers/performance.rst

Lines changed: 2 additions & 2 deletions
@@ -360,8 +360,8 @@ directory::
 7 13.61 MB -152.59 MB del b
 8 13.61 MB 0.00 MB return a

-Another useful magic that ``memory_profiler`` defines is `%memit`, which is
-analogous to `%timeit`. It can be used as follows::
+Another useful magic that ``memory_profiler`` defines is ``%memit``, which is
+analogous to ``%timeit``. It can be used as follows::

 In [1]: import numpy as np

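
As a rough illustration of the ``%memit`` magic referred to in this hunk, an IPython session might look like the following (the reported numbers are placeholders, not measured output)::

    In [1]: %load_ext memory_profiler

    In [2]: import numpy as np

    In [3]: %memit np.ones((1000, 1000))
    peak memory: ... MiB, increment: ... MiB
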
doc/developers/utilities.rst

Lines changed: 2 additions & 2 deletions
@@ -129,7 +129,7 @@ Efficient Random Sampling
 =========================

 - :func:`random.sample_without_replacement`: implements efficient algorithms
-for sampling `n_samples` integers from a population of size `n_population`
+for sampling ``n_samples`` integers from a population of size ``n_population``
 without replacement.


@@ -272,7 +272,7 @@ Hash Functions
 ==============

 - :func:`murmurhash3_32` provides a python wrapper for the
-`MurmurHash3_x86_32` C++ non cryptographic hash function. This hash
+``MurmurHash3_x86_32`` C++ non cryptographic hash function. This hash
 function is suitable for implementing lookup tables, Bloom filters,
 Count Min Sketch, feature hashing and implicitly defined sparse
 random projections::
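
A minimal sketch of the two utilities mentioned in these hunks (argument values are arbitrary)::

    from sklearn.utils import murmurhash3_32
    from sklearn.utils.random import sample_without_replacement

    # draw 5 distinct integers from range(100) without replacement
    print(sample_without_replacement(n_population=100, n_samples=5,
                                     random_state=0))

    # hash a string key with the MurmurHash3 32-bit function
    print(murmurhash3_32("scikit-learn", seed=42))
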

doc/install.rst

Lines changed: 6 additions & 6 deletions
@@ -239,9 +239,9 @@ Arch Linux
 ----------

 Arch Linux's package is provided through the `official repositories
-<https://www.archlinux.org/packages/?q=scikit-learn>`_ as `python-scikit-learn`
-for Python 3 and `python2-scikit-learn` for Python 2. It can be installed
-by typing the following command:
+<https://www.archlinux.org/packages/?q=scikit-learn>`_ as
+``python-scikit-learn`` for Python 3 and ``python2-scikit-learn`` for Python 2.
+It can be installed by typing the following command:

 .. code-block:: none

266266
Fedora
267267
------
268268

269-
The Fedora package is called `python-scikit-learn` for the Python 2 version
270-
and `python3-scikit-learn` for the Python 3 version. Both versions can
271-
be installed using `yum`::
269+
The Fedora package is called ``python-scikit-learn`` for the Python 2 version
270+
and ``python3-scikit-learn`` for the Python 3 version. Both versions can
271+
be installed using ``yum``::
272272

273273
$ sudo yum install python-scikit-learn
274274

doc/modules/biclustering.rst

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ cluster rows and columns of a data matrix. These clusters of rows and
 columns are known as biclusters. Each determines a submatrix of the
 original data matrix with some desired properties.

-For instance, given a matrix of shape `(10, 10)`, one possible bicluster
-with three rows and two columns induces a submatrix of shape `(3, 2)`::
+For instance, given a matrix of shape ``(10, 10)``, one possible bicluster
+with three rows and two columns induces a submatrix of shape ``(3, 2)``::

 >>> import numpy as np
 >>> data = np.arange(100).reshape(10, 10)
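
The doctest continues beyond this hunk; the idea behind the ``(3, 2)`` submatrix can be sketched as follows, with illustrative row and column indices::

    import numpy as np

    data = np.arange(100).reshape(10, 10)
    rows = np.array([0, 2, 3])      # three rows of the bicluster
    columns = np.array([1, 4])      # two columns of the bicluster

    # the induced bicluster is the (3, 2) submatrix at those rows/columns
    submatrix = data[rows][:, columns]
    print(submatrix.shape)          # (3, 2)
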

doc/modules/clustering.rst

Lines changed: 50 additions & 45 deletions
@@ -8,10 +8,10 @@ Clustering
 unlabeled data can be performed with the module :mod:`sklearn.cluster`.

 Each clustering algorithm comes in two variants: a class, that implements
-the `fit` method to learn the clusters on train data, and a function,
+the ``fit`` method to learn the clusters on train data, and a function,
 that, given train data, returns an array of integer labels corresponding
 to the different clusters. For the class, the labels over the training
-data can be found in the `labels_` attribute.
+data can be found in the ``labels_`` attribute.

 .. currentmodule:: sklearn.cluster

@@ -53,7 +53,7 @@ Overview of clustering methods

 * - :ref:`K-Means <k_means>`
 - number of clusters
-- Very large `n_samples`, medium `n_clusters` with
+- Very large ``n_samples``, medium ``n_clusters`` with
 :ref:`MiniBatch code <mini_batch_kmeans>`
 - General-purpose, even cluster size, flat geometry, not too many clusters
 - Distances between points
@@ -66,32 +66,32 @@ Overview of clustering methods

 * - :ref:`Mean-shift <mean_shift>`
 - bandwidth
-- Not scalable with n_samples
+- Not scalable with ``n_samples``
 - Many clusters, uneven cluster size, non-flat geometry
 - Distances between points

 * - :ref:`Spectral clustering <spectral_clustering>`
 - number of clusters
-- Medium `n_samples`, small `n_clusters`
+- Medium ``n_samples``, small ``n_clusters``
 - Few clusters, even cluster size, non-flat geometry
 - Graph distance (e.g. nearest-neighbor graph)

 * - :ref:`Ward hierarchical clustering <hierarchical_clustering>`
 - number of clusters
-- Large `n_samples` and `n_clusters`
+- Large ``n_samples`` and ``n_clusters``
 - Many clusters, possibly connectivity constraints
 - Distances between points

 * - :ref:`Agglomerative clustering <hierarchical_clustering>`
 - number of clusters, linkage type, distance
-- Large `n_samples` and `n_clusters`
+- Large ``n_samples`` and ``n_clusters``
 - Many clusters, possibly connectivity constraints, non Euclidean
 distances
 - Any pairwise distance

 * - :ref:`DBSCAN <dbscan>`
 - neighborhood size
-- Very large `n_samples`, medium `n_clusters`
+- Very large ``n_samples``, medium ``n_clusters``
 - Non-flat geometry, uneven cluster sizes
 - Distances between nearest points

@@ -118,12 +118,12 @@ K-means

 The :class:`KMeans` algorithm clusters data by trying to separate samples
 in n groups of equal variance, minimizing a criterion known as the
-`inertia<inertia>` or within-cluster sum-of-squares.
+`inertia <inertia>` or within-cluster sum-of-squares.
 This algorithm requires the number of clusters to be specified.
 It scales well to large number of samples and has been used
 across a large range of application areas in many different fields.

-The k-means algorithm divides a set of :math:`N` samples :math:`X`:
+The k-means algorithm divides a set of :math:`N` samples :math:`X`
 into :math:`K` disjoint clusters :math:`C`,
 each described by the mean :math:`\mu_j` of the samples in the cluster.
 The means are commonly called the cluster "centroids";
@@ -146,7 +146,7 @@ It suffers from various drawbacks:
 better and zero is optimal. But in very high-dimensional spaces, Euclidean
 distances tend to become inflated
 (this is an instance of the so-called "curse of dimensionality").
-Running a dimensionality reduction algorithm such as `PCA<PCA>`
+Running a dimensionality reduction algorithm such as `PCA <PCA>`
 prior to k-means clustering can alleviate this problem
 and speed up the computations.

@@ -189,7 +189,7 @@ k-means++ initialization scheme, which has been implemented in scikit-learn
 random initialization, as shown in the reference.

 A parameter can be given to allow K-means to be run in parallel, called
-`n_jobs`. Giving this parameter a positive value uses that many processors
+``n_jobs``. Giving this parameter a positive value uses that many processors
 (default: 1). A value of -1 uses all available processors, with -2 using one
 less, and so on. Parallelization generally speeds up computation at the cost of
 memory (in this case, multiple copies of centroids need to be stored, one for
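
As a sketch of the ``n_jobs`` behaviour described in this hunk (the parameter existed on :class:`KMeans` at the time of this commit; later scikit-learn releases removed it, so treat this as illustrative of that era's API)::

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.RandomState(0).rand(1000, 2)

    # n_jobs=-1 ran the n_init random initializations on all available processors
    km = KMeans(n_clusters=3, n_jobs=-1, random_state=0).fit(X)
    print(km.labels_[:10], km.inertia_)
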
@@ -232,7 +232,7 @@ k-means, mini-batch k-means produces results that are generally only slightly
 worse than the standard algorithm.

 The algorithm iterates between two major steps, similar to vanilla k-means.
-In the first step, `b` samples are drawn randomly from the dataset, to form
+In the first step, :math:`b` samples are drawn randomly from the dataset, to form
 a mini-batch. These are then assigned to the nearest centroid. In the second
 step, the centroids are updated. In contrast to k-means, this is done on a
 per-sample basis. For each sample in the mini-batch, the assigned centroid
@@ -291,12 +291,12 @@ is given.

 Affinity Propagation can be interesting as it chooses the number of
 clusters based on the data provided. For this purpose, the two important
-parameters are the `preference`, which controls how many exemplars are
-used, and the `damping` factor.
+parameters are the *preference*, which controls how many exemplars are
+used, and the *damping factor*.

 The main drawback of Affinity Propagation is its complexity. The
-algorithm has a time complexity of the order :math:`O(N^2 T)`, where `N`
-is the number of samples and `T` is the number of iterations until
+algorithm has a time complexity of the order :math:`O(N^2 T)`, where :math:`N`
+is the number of samples and :math:`T` is the number of iterations until
 convergence. Further, the memory complexity is of the order
 :math:`O(N^2)` if a dense similarity matrix is used, but reducible if a
 sparse similarity matrix is used. This makes Affinity Propagation most
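
A minimal usage sketch for the two parameters named in this hunk; the ``preference`` and ``damping`` values below are arbitrary choices for illustration, not recommendations::

    from sklearn.cluster import AffinityPropagation
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

    # preference controls how many exemplars (clusters) emerge;
    # damping trades off convergence speed against oscillation
    af = AffinityPropagation(preference=-50, damping=0.9).fit(X)
    print(len(af.cluster_centers_indices_), "clusters")
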
@@ -312,30 +312,34 @@ appropriate for small to medium sized datasets.

 **Algorithm description:**
 The messages sent between points belong to one of two categories. The first is
-the responsibility `r(i, k)`, which is the accumulated evidence that sample `k`
-should be the exemplar for sample `i`. The second is the availability `a(i, k)`
-which is the accumulated evidence that sample `i` should choose sample `k` to
-be its exemplar, and considers the values for all other samples that `k` should
+the responsibility :math:`r(i, k)`,
+which is the accumulated evidence that sample :math:`k`
+should be the exemplar for sample :math:`i`.
+The second is the availability :math:`a(i, k)`
+which is the accumulated evidence that sample :math:`i`
+should choose sample :math:`k` to be its exemplar,
+and considers the values for all other samples that :math:`k` should
 be an exemplar. In this way, exemplars are chosen by samples if they are (1)
 similar enough to many samples and (2) chosen by many samples to be
 representative of themselves.

-More formally, the responsibility of a sample `k` to be the exemplar of sample
-`i` is given by:
+More formally, the responsibility of a sample :math:`k`
+to be the exemplar of sample :math:`i` is given by:

 .. math::

 r(i, k) \leftarrow s(i, k) - max [ a(i, \acute{k}) + s(i, \acute{k}) \forall \acute{k} \neq k ]

-Where :math:`s(i, k)` is the similarity between samples `i` and `k`. The
-availability of sample `k` to be the exemplar of sample `i` is given by:
+Where :math:`s(i, k)` is the similarity between samples :math:`i` and :math:`k`.
+The availability of sample :math:`k`
+to be the exemplar of sample :math:`i` is given by:

 .. math::

 a(i, k) \leftarrow min [0, r(k, k) + \sum_{\acute{i}~s.t.~\acute{i} \notin \{i, k\}}{r(\acute{i}, k)}]

-To begin with, all values for `r` and `a` are set to zero, and the calculation
-of each iterates until convergence.
+To begin with, all values for :math:`r` and :math:`a` are set to zero,
+and the calculation of each iterates until convergence.

 .. _mean_shift:

@@ -367,9 +371,9 @@ the mean of the samples within its neighborhood:
 m(x_i) = \frac{\sum_{x_j \in N(x_i)}K(x_j - x_i)x_j}{\sum_{x_j \in N(x_i)}K(x_j - x_i)}

 The algorithm automatically sets the number of clusters, instead of relying on a
-parameter `bandwidth`, which dictates the size of the region to search through.
+parameter ``bandwidth``, which dictates the size of the region to search through.
 This parameter can be set manually, but can be estimated using the provided
-`estimate_bandwidth` function, which is called if the bandwidth is not set.
+``estimate_bandwidth`` function, which is called if the bandwidth is not set.

 The algorithm is not highly scalable, as it requires multiple nearest neighbor
 searches during the execution of the algorithm. The algorithm is guaranteed to
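
For context, ``estimate_bandwidth`` and :class:`MeanShift` fit together roughly like this (the ``quantile`` value and the toy data are illustrative)::

    from sklearn.cluster import MeanShift, estimate_bandwidth
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # estimate_bandwidth is also what MeanShift falls back to internally
    # when no bandwidth is passed
    bandwidth = estimate_bandwidth(X, quantile=0.2)
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    print(ms.cluster_centers_.shape)
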
@@ -463,16 +467,16 @@ Different label assignment strategies
 ---------------------------------------

 Different label assignment strategies can be used, corresponding to the
-`assign_labels` parameter of :class:`SpectralClustering`.
-The `kmeans` strategy can match finer details of the data, but it can be
-more unstable. In particular, unless you control the `random_state`, it
+``assign_labels`` parameter of :class:`SpectralClustering`.
+The ``"kmeans"`` strategy can match finer details of the data, but it can be
+more unstable. In particular, unless you control the ``random_state``, it
 may not be reproducible from run-to-run, as it depends on a random
-initialization. On the other hand, the `discretize` strategy is 100%
+initialization. On the other hand, the ``"discretize"`` strategy is 100%
 reproducible, but it tends to create parcels of fairly even and
 geometrical shape.

 ===================================== =====================================
-`assign_labels="kmeans"` `assign_labels="discretize"`
+``assign_labels="kmeans"` ``assign_labels="discretize"``
 ===================================== =====================================
 |lena_kmeans| |lena_discretize|
 ===================================== =====================================
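
A short sketch of the ``assign_labels`` option discussed above (data and parameter values are illustrative)::

    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=100, centers=2, random_state=0)

    # "discretize" gives run-to-run reproducible labels,
    # while "kmeans" depends on random_state
    model = SpectralClustering(n_clusters=2, assign_labels="discretize",
                               random_state=0).fit(X)
    print(model.labels_[:10])
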
@@ -697,13 +701,14 @@ cluster is therefore a set of core samples, each close to each other
 (measured by some distance measure)
 and a set of non-core samples that are close to a core sample (but are not
 themselves core samples). There are two parameters to the algorithm,
-`min_samples` and `eps`, which define formally what we mean when we say *dense*.
-A higher `min_samples` or lower `eps` indicate higher density necessary to form
-a cluster.
+``min_samples`` and ``eps``,
+which define formally what we mean when we say *dense*.
+Higher ``min_samples`` or lower ``eps``
+indicate higher density necessary to form a cluster.

 More formally, we define a core sample as being a sample in the dataset such
-that there exist `min_samples` other samples within a distance of
-`eps`, which are defined as *neighbors* of the core sample. This tells
+that there exist ``min_samples`` other samples within a distance of
+``eps``, which are defined as *neighbors* of the core sample. This tells
 us that the core sample is in a dense area of the vector space. A cluster
 is a set of core samples, that can be built by recursively by taking a core
 sample, finding all of its neighbors that are core samples, finding all of
@@ -713,9 +718,9 @@ in the cluster but are not themselves core samples. Intuitively, these samples
 are on the fringes of a cluster.

 Any core sample is part of a cluster, by definition. Further, any cluster has
-at least `min_samples` points in it, following the definition of a core
+at least ``min_samples`` points in it, following the definition of a core
 sample. For any sample that is not a core sample, and does have a
-distance higher than `eps` to any core sample, it is considered an outlier by
+distance higher than ``eps`` to any core sample, it is considered an outlier by
 the algorithm.

 In the figure below, the color indicates cluster membership, with large circles
@@ -739,9 +744,9 @@ by black points below.
 always belong to the same clusters (although the labels may be
 different). The non-determinism comes from deciding to which cluster a
 non-core sample belongs. A non-core sample can have a distance lower
-than `eps` to two core samples in different clusters. By the
+than ``eps`` to two core samples in different clusters. By the
 triangular inequality, those two core samples must be more distant than
-`eps` from each other, or they would be in the same cluster. The non-core
+``eps`` from each other, or they would be in the same cluster. The non-core
 sample is assigned to whichever cluster is generated first, where
 the order is determined randomly. Other than the ordering of
 the dataset, the algorithm is deterministic, making the results relatively
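
Putting the ``eps`` / ``min_samples`` discussion above into a minimal sketch (parameter values and toy data are arbitrary)::

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

    # higher min_samples or lower eps means a denser region is required
    # to form a cluster; samples labelled -1 are treated as outliers
    db = DBSCAN(eps=0.5, min_samples=5).fit(X)
    print(np.unique(db.labels_))
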
@@ -798,7 +803,7 @@ chance normalization**::
 >>> metrics.adjusted_rand_score(labels_true, labels_pred) # doctest: +ELLIPSIS
 0.24...

-One can permute 0 and 1 in the predicted labels and rename `2` by `3` and get
+One can permute 0 and 1 in the predicted labels, rename 2 to 3, and get
 the same score::

 >>> labels_pred = [1, 1, 0, 0, 3, 3]
@@ -921,7 +926,7 @@ proposed more recently and is **normalized against chance**::
 >>> metrics.adjusted_mutual_info_score(labels_true, labels_pred) # doctest: +ELLIPSIS
 0.22504...

-One can permute 0 and 1 in the predicted labels and rename `2` by `3` and get
+One can permute 0 and 1 in the predicted labels, rename 2 to 3 and get
 the same score::

 >>> labels_pred = [1, 1, 0, 0, 3, 3]
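
Both scores depend only on which samples are grouped together, not on the label values themselves; a small sketch of the permutation argument made in these two hunks (label vectors are illustrative, not the ones from the surrounding doctest)::

    from sklearn import metrics

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]
    permuted    = [1, 1, 0, 0, 3, 3]   # swap 0/1, rename 2 to 3

    # each pair of printed values is identical
    print(metrics.adjusted_rand_score(labels_true, labels_pred),
          metrics.adjusted_rand_score(labels_true, permuted))
    print(metrics.adjusted_mutual_info_score(labels_true, labels_pred),
          metrics.adjusted_mutual_info_score(labels_true, permuted))
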
