
Commit d1f383f

Update documentation to add the latest features
1 parent aaebc2f

File tree: 15 files changed (+301 additions, -411 deletions)


CONTRIBUTING.rst

Lines changed: 3 additions & 3 deletions
@@ -69,8 +69,7 @@ Ready to contribute? Here's how to set up `MLBlocks` for local development.
 
     $ mkvirtualenv MLBlocks
     $ cd MLBlocks/
-    $ pip install -e .
-    $ pip install -r requirements_dev.txt
+    $ make install-develop
 
 4. Create a branch for local development::
 
@@ -88,7 +87,8 @@ Ready to contribute? Here's how to set up `MLBlocks` for local development.
 6. When you're done making changes, check that your changes pass flake8 and the
    tests, including testing other Python versions with tox::
 
-    $ make test-all
+    $ make lint      # Check code styling
+    $ make test-all  # Execute tests on all python versions
 
 7. Make also sure to include the necessary documentation in the code as docstrings following
    the `google docstring`_ style.

Makefile

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ view-docs: docs ## view docs in browser
 
 .PHONY: serve-docs
 serve-docs: view-docs ## compile the docs watching for changes
-	watchmedo shell-command -W -R -D -p '*.rst;*.md' -c '$(MAKE) -C docs html' .
+	watchmedo shell-command -W -R -D -p '*.rst;*.md' -c '$(MAKE) -C docs html' docs
 
 
 # RELEASE TARGETS
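The ``serve-docs`` target delegates file watching to ``watchmedo`` (from the third-party ``watchdog`` package); the fix points it at the ``docs`` directory instead of the repository root, so only documentation sources trigger a rebuild. As a rough, stdlib-only illustration of the underlying idea, the sketch below detects changed ``*.rst``/``*.md`` files by comparing modification-time snapshots (hypothetical helper names, not part of MLBlocks or watchdog):

```python
import fnmatch
import os


def snapshot(root, patterns=('*.rst', '*.md')):
    """Map each file under ``root`` matching ``patterns`` to its mtime."""
    mtimes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def changed_files(before, after):
    """Return paths that are new or whose mtime changed between snapshots."""
    return sorted(
        path for path, mtime in after.items()
        if before.get(path) != mtime
    )
```

A watcher loop would call ``snapshot`` periodically and trigger ``make -C docs html`` whenever ``changed_files`` is non-empty; ``watchmedo`` does the same job with OS-level file-system events instead of polling.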

docs/advanced_usage/pipelines.rst

Lines changed: 96 additions & 14 deletions
@@ -154,11 +154,11 @@ call is issued would be:
 
         X -> X1;
         X1 -> b1 [constraint=false];
-        b1 -> X2 [label=modified];
-        X2 -> b2 [constraint=false]
-        b2 -> X3 [label=modified];
-        X3 -> b3 [constraint=false]
-        b3 -> y
+        b1 -> X2;
+        X2 -> b2 [constraint=false];
+        b2 -> X3;
+        X3 -> b3 [constraint=false];
+        b3 -> y;
     }
 
 Another schema with some more complexity would be one where there is one primitive that
@@ -191,9 +191,10 @@ of actions would be:
         b1 -> b2 [style=invis];
 
         subgraph cluster_1 {
-            X1 [label=X];
-            f1 [label=features];
-            X2 [label=X];
+            {rank=same X1 f1}
+            X1 [label=X group=c];
+            f1 [label=features group=c];
+            X2 [label=X group=c];
             f1 -> X1 [style=invis];
             X1 -> X2 [style=dashed];
             label = "Context";
@@ -204,8 +205,9 @@ of actions would be:
         {rank=same X features}
         features -> f1;
         X -> X1;
-        {X1 f1} -> b1 [constraint=false];
-        b1 -> X2 [label=encoded];
+        X1 -> b1 [constraint=false];
+        f1 -> b1 [constraint=false];
+        b1 -> X2;
         X2 -> b2 [constraint=false]
         b2 -> y
     }
@@ -242,9 +244,9 @@ do its job:
         b0 -> b1 -> b2 [style=invis];
 
         subgraph cluster_1 {
-            X1 [label=X];
-            f1 [label=features];
-            X2 [label=X];
+            X1 [label=X group=c];
+            f1 [label=features group=c];
+            X2 [label=X group=c];
             X1 -> f1 -> X2 [style=invis];
             X1 -> X2 [style=dashed];
             label = "Context";
@@ -256,12 +258,92 @@ do its job:
         X1 -> b0 [constraint=false];
         b0 -> f1;
         {X1 f1} -> b1 [constraint=false];
-        b1 -> X2 [label=encoded];
+        b1 -> X2;
         X2 -> b2 [constraint=false]
         b2 -> y
     }
 
 
+JSON Annotations
+----------------
+
+Like primitives, Pipelines can also be annotated and stored as dicts or JSON files that contain
+the different arguments expected by the ``MLPipeline`` class, as well as the set hyperparameters
+and tunable hyperparameters.
+
+Representing a Pipeline as a dict
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The dict representation of a Pipeline can be obtained directly from an ``MLPipeline`` instance,
+by calling its ``to_dict`` method.
+
+.. ipython:: python
+
+    pipeline.to_dict()
+
+Notice how the dict includes all the arguments that we used when creating the ``MLPipeline``,
+as well as the hyperparameters that the pipeline is currently using and the complete specification
+of the tunable hyperparameters.
+
+If we want to directly store the dict as a JSON file, we can do so by calling the ``save`` method
+with the path of the JSON file to create.
+
+.. ipython:: python
+
+    pipeline.save('pipeline.json')
+
+Loading a Pipeline from a dict
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Similarly, once we have a dict specification, we can load the Pipeline directly from it
+by calling the ``MLPipeline.from_dict`` method.
+
+Bear in mind that the hyperparameter values and tunable ranges will be taken from the dict.
+This means that if we want to tweak the tunable hyperparameters to adjust them to a specific
+problem or dataset, we can do that directly on our dict representation.
+
+.. ipython:: python
+
+    pipeline_dict = {
+        "primitives": [
+            "sklearn.preprocessing.StandardScaler",
+            "sklearn.ensemble.RandomForestClassifier"
+        ],
+        "hyperparameters": {
+            "sklearn.ensemble.RandomForestClassifier#1": {
+                "n_jobs": -1,
+                "n_estimators": 100,
+                "max_depth": 5,
+            }
+        },
+        "tunable_hyperparameters": {
+            "sklearn.ensemble.RandomForestClassifier#1": {
+                "max_depth": {
+                    "type": "int",
+                    "default": 10,
+                    "range": [
+                        1,
+                        30
+                    ]
+                }
+            }
+        }
+    }
+    pipeline = MLPipeline.from_dict(pipeline_dict)
+    pipeline.get_hyperparameters()
+    pipeline.get_tunable_hyperparameters()
+
+.. note:: Notice how we skipped many items in this last dict representation and only included
+    the parts that we want to be different from the default values. MLBlocks will figure out
+    the rest of the elements directly from the primitive annotations on its own!
+
+Like with the ``save`` method, the **MLPipeline** class offers a convenience ``load`` method
+that allows loading the pipeline directly from a JSON file:
+
+.. ipython:: python
+
+    pipeline = MLPipeline.load('pipeline.json')
+
 .. _API Reference: ../api_reference.html
 .. _primitives: ../primitives.html
 .. _mlblocks.MLPipeline: ../api_reference.html#mlblocks.MLPipeline
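The ``save``/``load`` round trip added above is, at its core, plain JSON serialization of the pipeline specification dict. A minimal stdlib sketch of that round trip (hypothetical helper names; the real methods are ``MLPipeline.to_dict``, ``save``, ``from_dict`` and ``load``):

```python
import json


def save_pipeline(pipeline_dict, path):
    """Store a pipeline specification dict as a JSON file."""
    with open(path, 'w') as f:
        json.dump(pipeline_dict, f, indent=4)


def load_pipeline(path):
    """Load a pipeline specification dict back from a JSON file."""
    with open(path) as f:
        return json.load(f)
```

Because the specification is a plain dict of strings, numbers and lists, the round trip is lossless, which is what makes tweaking the ``tunable_hyperparameters`` ranges directly in the JSON file safe.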

docs/conf.py

Lines changed: 3 additions & 3 deletions
@@ -25,11 +25,11 @@
 from recommonmark.parser import CommonMarkParser
 # from recommonmark.transform import AutoStructify
 
-sys.path.insert(0, os.path.abspath('..'))
+# sys.path.insert(0, os.path.abspath('..'))
 
 import mlblocks
-
-mlblocks.add_primitives_path('../mlblocks_primitives')
+#
+# mlblocks.add_primitives_path('../mlblocks_primitives')
 
 # -- General configuration ---------------------------------------------

docs/getting_started/install.rst

Lines changed: 6 additions & 33 deletions
@@ -27,50 +27,23 @@ You can either clone the public repository:
 
 .. code-block:: console
 
-    git clone git://github.com/HDI-Project/mlblocks
+    git clone git://github.com/HDI-Project/MLBlocks
 
 Or download the `tarball`_:
 
 .. code-block:: console
 
-    curl -OL https://github.com/HDI-Project/mlblocks/tarball/master
+    curl -OL https://github.com/HDI-Project/MLBlocks/tarball/master
 
 Once you have a copy of the source, you can install it running the next command inside the
 project folder:
 
 .. code-block:: console
 
-    $ pip install .
+    $ make install
 
-.. _Github repo: https://github.com/HDI-Project/mlblocks
-.. _tarball: https://github.com/HDI-Project/mlblocks/tarball/master
-
-Additional Dependencies
------------------------
-
-The previous commands install the bare minimum requirements to make MLBlocks work, but
-additional dependencies should be installed in order to run the `quickstart`_ and various
-examples found in the documentation.
-
-The most important of these dependencies is the related project `MLPrimitives`_, which
-includes a huge list of primitives ready to be used by **MLBlocks**.
-
-Installing these additional dependencies can be achieved by running the command:
-
-.. code-block:: console
-
-    pip install mlblocks[demo]
-
-if **MLBlocks** was installed from PyPi, or:
-
-.. code-block:: console
-
-    pip install .[demo]
-
-if you installed **MLBlocks** from sources.
-
-.. _quickstart: quickstart.html
-.. _MLPrimitives: https://github.com/HDI-Project/MLPrimitives
+.. _Github repo: https://github.com/HDI-Project/MLBlocks
+.. _tarball: https://github.com/HDI-Project/MLBlocks/tarball/master
 
 Development
 -----------
@@ -81,4 +54,4 @@ order to be able to run the tests and build the documentation:
 
 .. code-block:: console
 
-    pip install -e .[dev]
+    make install-develop

docs/getting_started/quickstart.rst

Lines changed: 9 additions & 9 deletions
@@ -24,8 +24,8 @@ them to the `MLPipeline class`_:
 
     from mlblocks import MLPipeline
     primitives = [
-        'sklearn.preprocessing.StandardScaler',
-        'xgboost.XGBClassifier'
+        'mlprimitives.feature_extraction.StringVectorizer',
+        'sklearn.ensemble.RandomForestClassifier',
     ]
     pipeline = MLPipeline(primitives)
 
@@ -34,8 +34,8 @@ Optionally, specific `hyperparameters`_ can be also set by specifying them in a
 
 .. ipython:: python
 
     hyperparameters = {
-        'xgboost.XGBClassifier': {
-            'learning_rate': 0.1
+        'sklearn.ensemble.RandomForestClassifier': {
+            'n_estimators': 100
         }
     }
     pipeline = MLPipeline(primitives, hyperparameters)
@@ -80,13 +80,13 @@ other ones will remain unmodified.
 
 .. ipython:: python
 
     new_hyperparameters = {
-        'xgboost.XGBClassifier#1': {
-            'max_depth': 10
+        'sklearn.ensemble.RandomForestClassifier#1': {
+            'max_depth': 15
         }
     }
     pipeline.set_hyperparameters(new_hyperparameters)
     hyperparameters = pipeline.get_hyperparameters()
-    hyperparameters['xgboost.XGBClassifier#1']['max_depth']
+    hyperparameters['sklearn.ensemble.RandomForestClassifier#1']['max_depth']
 
 Making predictions
 ------------------
@@ -99,8 +99,8 @@ labels.
 
 .. ipython:: python
 
-    from mlblocks.datasets import load_iris
-    dataset = load_iris()
+    from mlblocks.datasets import load_personae
+    dataset = load_personae()
     X_train, X_test, y_train, y_test = dataset.get_splits(1)
     pipeline.fit(X_train, y_train)
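Note the ``#1`` suffix in the hyperparameter keys above: it numbers each occurrence of a primitive, so a pipeline that uses the same primitive twice can still address each copy individually. A minimal sketch of how such counted names can be derived from a primitives list (hypothetical helper; not the actual MLBlocks implementation):

```python
from collections import Counter


def count_primitive_names(primitives):
    """Append an occurrence index to each primitive name, so repeated
    primitives get distinct keys like 'name#1', 'name#2', ..."""
    counts = Counter()
    names = []
    for primitive in primitives:
        counts[primitive] += 1
        names.append('{}#{}'.format(primitive, counts[primitive]))
    return names
```

With this convention, hyperparameter dicts keyed by the counted name stay unambiguous regardless of how many times a primitive appears in the pipeline.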

docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -62,7 +62,8 @@ integrate with deep learning libraries.
     :caption: Pipeline Examples
     :maxdepth: 1
 
-    pipeline_examples/tabular
+    pipeline_examples/single_table
+    pipeline_examples/multi_table
     pipeline_examples/text
     pipeline_examples/image
     pipeline_examples/graph
