Commit 4f3c60c

IshankGulati authored and jnothman committed
[MRG+2] FIX IsolationForest(max_features=0.8).predict(X) fails input validation (scikit-learn#5757)
Fixes scikit-learn#5732
1 parent 48b2d9a commit 4f3c60c

3 files changed: 31 additions and 5 deletions

doc/whats_new.rst

Lines changed: 4 additions & 0 deletions
@@ -136,6 +136,10 @@ Bug fixes
    - Fix estimators to accept a ``sample_weight`` parameter of type
      ``pandas.Series`` in their ``fit`` function. :issue:`7825` by
      `Kathleen Chen`_.
+
+   - Fixed a bug where :class:`sklearn.ensemble.IsolationForest` fails when
+     ``max_features`` is less than 1.
+     :issue:`5732` by :user:`Ishank Gulati <IshankGulati>`.

    - Fix a bug where :class:`sklearn.ensemble.VotingClassifier` raises an error
      when a numpy array is passed in for weights. :issue:`7983` by

sklearn/ensemble/iforest.py

Lines changed: 16 additions & 5 deletions
@@ -248,17 +248,28 @@ def decision_function(self, X):
         """
         # code structure from ForestClassifier/predict_proba
         # Check data
-        X = self.estimators_[0]._validate_X_predict(X, check_input=True)
+        X = check_array(X, accept_sparse='csr')
         n_samples = X.shape[0]

         n_samples_leaf = np.zeros((n_samples, self.n_estimators), order="f")
         depths = np.zeros((n_samples, self.n_estimators), order="f")

-        for i, tree in enumerate(self.estimators_):
-            leaves_index = tree.apply(X)
-            node_indicator = tree.decision_path(X)
+        if self._max_features == X.shape[1]:
+            subsample_features = False
+        else:
+            subsample_features = True
+
+        for i, (tree, features) in enumerate(zip(self.estimators_,
+                                                 self.estimators_features_)):
+            if subsample_features:
+                X_subset = X[:, features]
+            else:
+                X_subset = X
+            leaves_index = tree.apply(X_subset)
+            node_indicator = tree.decision_path(X_subset)
             n_samples_leaf[:, i] = tree.tree_.n_node_samples[leaves_index]
-            depths[:, i] = np.asarray(node_indicator.sum(axis=1)).reshape(-1) - 1
+            depths[:, i] = np.ravel(node_indicator.sum(axis=1))
+            depths[:, i] -= 1

         depths += _average_path_length(n_samples_leaf)
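
The two ideas in this hunk can be sketched outside scikit-learn: each tree receives only the columns it was fit on, and the path depth is the number of visited nodes minus one. This is a minimal NumPy-only illustration, not the library code. The helper names `select_columns` and `depths_from_indicator`, the feature indices, and the hand-written node indicator are all hypothetical; `np.asmatrix` stands in for the matrix that a sparse row sum would return.

```python
import numpy as np


def select_columns(X, features, n_fit_features):
    """Return the columns a tree was fit on, or X itself if it saw every column."""
    if n_fit_features == X.shape[1]:
        return X
    return X[:, features]


def depths_from_indicator(node_indicator):
    """Edges on each sample's path: number of visited nodes minus one."""
    # np.ravel flattens the (n_samples, 1) row sums into a 1-D array.
    return np.ravel(node_indicator.sum(axis=1)) - 1


X = np.arange(20, dtype=float).reshape(4, 5)
features = np.array([0, 2, 3])  # hypothetical per-tree feature indices
X_subset = select_columns(X, features, n_fit_features=3)

# Hypothetical decision path: 4 samples, 6 nodes, 1 marks a visited node.
node_indicator = np.asmatrix([[1, 1, 0, 1, 0, 0],
                              [1, 0, 1, 0, 0, 0],
                              [1, 1, 0, 0, 1, 1],
                              [1, 0, 1, 0, 0, 0]])
depths = depths_from_indicator(node_indicator)
```

Here `X_subset` keeps 3 of the 5 columns, which matches the feature count the hypothetical tree was fit with, so per-tree input validation would no longer see a mismatch.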

sklearn/ensemble/tests/test_iforest.py

Lines changed: 11 additions & 0 deletions
@@ -200,3 +200,14 @@ def test_max_samples_consistency():
     X = iris.data
     clf = IsolationForest().fit(X)
     assert_equal(clf.max_samples_, clf._max_samples)
+
+
+def test_iforest_subsampled_features():
+    # It tests non-regression for #5732 which failed at predict.
+    rng = check_random_state(0)
+    X_train, X_test, y_train, y_test = train_test_split(boston.data[:50],
+                                                        boston.target[:50],
+                                                        random_state=rng)
+    clf = IsolationForest(max_features=0.8)
+    clf.fit(X_train, y_train)
+    clf.predict(X_test)
