
Commit 6541f3f

Merge pull request scikit-learn#5594 from betatim/isolationtree-docs

[MRG] English language fixes for the IsolationForest docs

2 parents d591ac4 + be2499e

File tree

4 files changed: +31, -33 lines

doc/README

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@ Documentation for scikit-learn
 -------------------------------
 
 This section contains the full manual and web page as displayed in
-http://scikit-learn.sf.net. To generate the full web page, including
+http://scikit-learn.org. To generate the full web page, including
 the example gallery (this might take a while):
 
     make html

doc/modules/outlier_detection.rst

Lines changed: 8 additions & 9 deletions

@@ -197,21 +197,20 @@ Isolation Forest
 
 One efficient way of performing outlier detection in high-dimensional datasets
 is to use random forests.
-:class:`ensemble.IsolationForest` consists in 'isolating' the observations
-by randomly selecting a feature and then randomly selecting a split value
-between the maximum and minimum values of the selected feature.
+The :class:`ensemble.IsolationForest` 'isolates' observations by randomly selecting
+a feature and then randomly selecting a split value between the maximum and
+minimum values of the selected feature.
 
 Since recursive partitioning can be represented by a tree structure, the
-number of splitting required to isolate a point is equivalent to the path
-length from the root node to a terminating node.
+number of splittings required to isolate a sample is equivalent to the path
+length from the root node to the terminating node.
 
-This path length, averaged among a forest of such random trees, is a
+This path length, averaged over a forest of such random trees, is a
 measure of abnormality and our decision function.
 
-Indeed random partitioning produces noticeable shorter paths for anomalies.
+Random partitioning produces noticeably shorter paths for anomalies.
 Hence, when a forest of random trees collectively produce shorter path
-lengths for some particular points, then they are highly likely to be
-anomalies.
+lengths for particular samples, they are highly likely to be anomalies.
 
 This strategy is illustrated below.
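
As context for this description, a minimal usage sketch with hypothetical toy data (note that in the API snapshot of this commit, predict itself returns the anomaly score, whereas later scikit-learn releases expose the score through decision_function):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = 0.3 * rng.randn(100, 2)               # dense "normal" cluster
X_outliers = rng.uniform(-4, 4, size=(20, 2))   # scattered anomalies

clf = IsolationForest(n_estimators=100, max_samples=100, random_state=rng)
clf.fit(X_train)

# Anomalies need fewer random splits to isolate, so their average path
# length across the forest is shorter and their score is lower.
print(clf.decision_function(X_train).mean())     # closer to the inlier side
print(clf.decision_function(X_outliers).mean())  # noticeably lower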

examples/ensemble/plot_isolation_forest.py

Lines changed: 8 additions & 8 deletions

@@ -5,20 +5,20 @@
 
 An example using IsolationForest for anomaly detection.
 
-IsolationForest consists in 'isolating' the observations by randomly selecting
-a feature and then randomly selecting a split value between the maximum and
-minimum values of the selected feature.
+The IsolationForest 'isolates' observations by randomly selecting a feature
+and then randomly selecting a split value between the maximum and minimum
+values of the selected feature.
 
 Since recursive partitioning can be represented by a tree structure, the
-number of splitting required to isolate a sample is equivalent to the path
-length from the root node to a terminating node.
+number of splittings required to isolate a sample is equivalent to the path
+length from the root node to the terminating node.
 
-This path length, averaged among a forest of such random trees, is a measure
+This path length, averaged over a forest of such random trees, is a measure
 of abnormality and our decision function.
 
-Indeed random partitioning produces noticeable shorter paths for anomalies.
+Random partitioning produces noticeable shorter paths for anomalies.
 Hence, when a forest of random trees collectively produce shorter path lengths
-for some particular samples, then they are highly likely to be anomalies.
+for particular samples, they are highly likely to be anomalies.
 
 .. [1] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation forest."
     Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
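
The body of this example is not part of the diff; a condensed sketch of the kind of figure such an example produces (fit a forest, then contour its score over a grid) could look like the following, assuming a scikit-learn version where decision_function returns the averaged path-length score:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = np.r_[0.3 * rng.randn(100, 2) - 2,          # two inlier clusters
          0.3 * rng.randn(100, 2) + 2,
          rng.uniform(-4, 4, size=(20, 2))]     # uniform anomalies

clf = IsolationForest(max_samples=100, random_state=rng).fit(X)

# Evaluate the score on a grid and draw it as filled contours;
# darker regions correspond to shorter average path lengths.
xx, yy = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
plt.scatter(X[:, 0], X[:, 1], c="white", edgecolor="k", s=20)
plt.title("IsolationForest decision function")
plt.show()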

sklearn/ensemble/iforest.py

Lines changed: 14 additions & 15 deletions

@@ -24,23 +24,22 @@
 class IsolationForest(BaseBagging):
     """Isolation Forest Algorithm
 
-    Return the anomaly score of each sample with the IsolationForest algorithm
+    Return the anomaly score of each sample using the IsolationForest algorithm
 
-    IsolationForest consists in 'isolate' the observations by randomly
-    selecting a feature and then randomly selecting a split value
-    between the maximum and minimum values of the selected feature.
+    The IsolationForest 'isolates' observations by randomly selecting a feature
+    and then randomly selecting a split value between the maximum and minimum
+    values of the selected feature.
 
     Since recursive partitioning can be represented by a tree structure, the
-    number of splitting required to isolate a point is equivalent to the path
-    length from the root node to a terminating node.
+    number of splittings required to isolate a sample is equivalent to the path
+    length from the root node to the terminating node.
 
-    This path length, averaged among a forest of such random trees, is a
+    This path length, averaged over a forest of such random trees, is a
     measure of abnormality and our decision function.
 
-    Indeed random partitioning produces noticeable shorter paths for anomalies.
+    Random partitioning produces noticeably shorter paths for anomalies.
     Hence, when a forest of random trees collectively produce shorter path
-    lengths for some particular points, then they are highly likely to be
-    anomalies.
+    lengths for particular samples, they are highly likely to be anomalies.
 
 
     Parameters
@@ -52,8 +51,8 @@ class IsolationForest(BaseBagging):
         The number of samples to draw from X to train each base estimator.
             - If int, then draw `max_samples` samples.
             - If float, then draw `max_samples * X.shape[0]` samples.
-        If max_samples is larger than number of samples provided,
-        all samples with be used for all trees (no sampling).
+        If max_samples is larger than the number of samples provided,
+        all samples will be used for all trees (no sampling).
 
     max_features : int or float, optional (default=1.0)
         The number of features to draw from X to train each base estimator.
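
A small sketch of the three max_samples cases the corrected docstring distinguishes (toy data; the last call relies on the clipping behaviour described above):

import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).randn(500, 4)

IsolationForest(max_samples=256).fit(X)   # int: 256 samples per tree
IsolationForest(max_samples=0.5).fit(X)   # float: 0.5 * 500 = 250 per tree
IsolationForest(max_samples=1000).fit(X)  # > n_samples: all 500 used (no sampling)
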
@@ -169,12 +168,12 @@ def predict(self, X):
         """Predict anomaly score of X with the IsolationForest algorithm.
 
         The anomaly score of an input sample is computed as
-        the mean anomaly scores of the trees in the forest.
+        the mean anomaly score of the trees in the forest.
 
         The measure of normality of an observation given a tree is the depth
         of the leaf containing this observation, which is equivalent to
-        the number of splitting required to isolate this point. In case of
-        several observations n_left in the leaf, the average length path of
+        the number of splittings required to isolate this point. In case of
+        several observations n_left in the leaf, the average path length of
         a n_left samples isolation tree is added.
 
         Parameters
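
The "average path length of a n_left samples isolation tree" that this docstring credits to populated leaves is the c(n) term from the Isolation Forest paper cited in the example above (Liu et al., 2008). A sketch using the usual harmonic-number approximation (implementations may special-case very small leaves, where the approximation is loose):

import numpy as np

def average_path_length(n):
    """c(n) from Liu et al. (2008): the expected path length of an
    unsuccessful BST search among n samples. A leaf that still holds
    n_left > 1 observations is credited with this extra depth."""
    if n <= 1:
        return 0.0
    harmonic = np.log(n - 1) + np.euler_gamma  # H(n-1) ~ ln(n-1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n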
