-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Add support and unit test for PyOD models #34709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Additionally, it includes: - A minor fix for error messages in the `specifiable` module. - Support for scoring offline detectors on a subset of features.
0c1f2fc
to
b838cee
Compare
class PyODModelHandler(ModelHandler[beam.Row, | ||
PredictionResult, | ||
PyODBaseDetector]): | ||
"""Implementation of the ModelHandler interface for PyOD [#]_ Models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does [#]_ mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh does it create a footnote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's for footnote.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise unittest.SkipTest('PyOD dependencies are not installed') | ||
|
||
|
||
class PyODIForestTest(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests are skipped because you need to add the dependencies to
Line 485 in 238233d
'ml_test': [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Let me add pyod there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
from apache_beam.ml.inference.base import KeyedModelHandler | ||
from apache_beam.ml.inference.base import ModelHandler | ||
from apache_beam.ml.inference.base import PredictionResult | ||
from apache_beam.ml.inference.base import _PostProcessingModelHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove underscores from these (e.g. _PostProcessingModelHandler -> PostProcessingModelHandler) since they are being imported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one cannot be changed, because in the following code the class will have to be turned into specifiable so that we can create a spec (see https://github.com/apache/beam/blob/release-2.64/sdks/python/apache_beam/ml/anomaly/transforms_test.py#L336) for the whole model handler object without pickling it.
seed = 1234 | ||
model = IForest(random_state=seed) | ||
model.fit(self.get_train_data()) | ||
self.tmp_fn = os.path.join(self.tmp_dir, 'iforest_pickled') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename tmp_fn pickled_model_uri?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
assert self._offline_detector._features is not None | ||
k, v = elem | ||
row_dict = v._asdict() | ||
return k, beam.Row(**{k: row_dict[k] for k in self._offline_detector._features}) # pylint: disable=line-too-long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we cant split this to two lines? Also here it looks a bit strange to access _features (with underscore) on _offline_detector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow the formatter does not split the original line. Let me see if this can be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a pair of brackets seems to make the formatter render it differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here it looks a bit strange to access _features (with underscore) on _offline_detector.
Ack. I think it is fine as it is referred in the same package.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
047dac4
to
8f1f965
Compare
@claudevdm: Tests are all good now. Could you please take another look? Thanks! |
Assigning reviewers. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Additionally, it includes:
specifiable
module.