Commit e3ca51e

[DOC] Update documentation
1 parent 1fe3033 commit e3ca51e

14 files changed (+693, -1 lines)

CHANGELOG.rst

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
Changelog
=========

+---------------+-----------------------------------------------------------+
| Badge         | Meaning                                                   |
+===============+===========================================================+
| |Feature|     | Add something that could not be achieved before.          |
+---------------+-----------------------------------------------------------+
| |Efficiency|  | Improve computational or memory efficiency.               |
+---------------+-----------------------------------------------------------+
| |Enhancement| | Miscellaneous minor improvements.                         |
+---------------+-----------------------------------------------------------+
| |Fix|         | Fix something that did not work as expected.              |
+---------------+-----------------------------------------------------------+
| |API|         | You will need to change the code to have the same effect. |
+---------------+-----------------------------------------------------------+

Version 0.1.*
-------------

.. role:: raw-html(raw)
   :format: html

.. role:: raw-latex(raw)
   :format: latex

.. |MajorFeature| replace:: :raw-html:`<span class="badge badge-success">Major Feature</span>` :raw-latex:`{\small\sc [Major Feature]}`
.. |Feature| replace:: :raw-html:`<span class="badge badge-success">Feature</span>` :raw-latex:`{\small\sc [Feature]}`
.. |Efficiency| replace:: :raw-html:`<span class="badge badge-info">Efficiency</span>` :raw-latex:`{\small\sc [Efficiency]}`
.. |Enhancement| replace:: :raw-html:`<span class="badge badge-info">Enhancement</span>` :raw-latex:`{\small\sc [Enhancement]}`
.. |Fix| replace:: :raw-html:`<span class="badge badge-danger">Fix</span>` :raw-latex:`{\small\sc [Fix]}`
.. |API| replace:: :raw-html:`<span class="badge badge-warning">API Change</span>` :raw-latex:`{\small\sc [API Change]}`

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/_static/custom.css

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.wy-nav-content {
    max-width: 60%;
}

docs/api_reference.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
API Reference
=============

Below is the class and function reference for :mod:`deepforest`. Note that the package is under active development, and some features may not be stable yet.

CascadeForestClassifier
-----------------------

.. autoclass:: deepforest.CascadeForestClassifier
    :members:
    :inherited-members:
    :show-inheritance:
    :no-undoc-members:
    :member-order: bysource

docs/changelog.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.. _changelog:

.. include:: ../CHANGELOG.rst

docs/conf.py

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

import os
import sys

sys.path.insert(0, os.path.abspath('..'))


# -- Project information -----------------------------------------------------

project = 'Deep Forest'
copyright = '2021, LAMDA Group, Nanjing University'
author = 'Yi-Xuan Xu'

# The master toctree document.
master_doc = 'index'


# -- General configuration ---------------------------------------------------

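# autodoc_mock_imports expects importable module names, so scikit-learn
# is listed under its import name ``sklearn``.
autodoc_mock_imports = ["joblib", "sklearn"]
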
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.todo',
    'sphinx.ext.napoleon',
    'sphinx_panels',
    'sphinx_copybutton'
]

autoapi_dirs = ['../deepforest']

autodoc_member_order = 'bysource'

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "default"

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.

html_theme = "sphinx_rtd_theme"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

html_css_files = [
    'custom.css',
]

docs/experiments.rst

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
Experiments
===========

Baseline
********
For all experiments, we used 5 popular tree-based ensemble methods as baselines. Details on the baselines are listed in the following table:

+------------------+--------------------------------------------------------------+
| Name             | Description                                                  |
+==================+==============================================================+
| `Random Forest`_ | An efficient implementation of Random Forest in Scikit-Learn |
+------------------+--------------------------------------------------------------+
| `HGBDT`_         | Histogram-based GBDT in Scikit-Learn                         |
+------------------+--------------------------------------------------------------+
| `XGBoost EXACT`_ | The vanilla version of XGBoost                               |
+------------------+--------------------------------------------------------------+
| `XGBoost HIST`_  | The histogram-optimized version of XGBoost                   |
+------------------+--------------------------------------------------------------+
| `LightGBM`_      | Light Gradient Boosting Machine                              |
+------------------+--------------------------------------------------------------+

Environment
***********
For all experiments, we used a single Linux server. Details on the specifications are listed in the table below. All processors were used for training and evaluation.

+------------------+--------------+--------+
| OS               | CPU          | Memory |
+==================+==============+========+
| Ubuntu 18.04 LTS | Xeon E-2288G | 128 GB |
+------------------+--------------+--------+

Setting
*******
We kept the number of decision trees the same across all baselines, while the remaining hyper-parameters were set to their default values. Scripts for reproducing all experimental results are available in this `Repo`_.

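The setting above can be sketched as a small configuration helper in which only the parameter that controls ensemble size is set, and everything else is left at the library defaults. The parameter names (``n_estimators``, ``max_iter``, ``tree_method``) are taken from the public Scikit-Learn, XGBoost, and LightGBM APIs; the helper name ``baseline_configs`` and the value of 100 trees are assumptions made for illustration and are not taken from the benchmark scripts.

```python
def baseline_configs(n_trees=100):
    """Return keyword arguments per baseline; only the ensemble size is set.

    Hypothetical helper: the value of ``n_trees`` is an assumption.
    """
    return {
        "Random Forest": {"n_estimators": n_trees},
        # HGBDT sizes its ensemble through the number of boosting iterations.
        "HGBDT": {"max_iter": n_trees},
        "XGBoost EXACT": {"n_estimators": n_trees, "tree_method": "exact"},
        "XGBoost HIST": {"n_estimators": n_trees, "tree_method": "hist"},
        "LightGBM": {"n_estimators": n_trees},
    }


configs = baseline_configs(n_trees=100)
for name, kwargs in configs.items():
    print(f"{name}: {kwargs}")
```

Each dictionary would be unpacked into the corresponding estimator constructor, so any hyper-parameter not listed keeps its default value.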
Dataset
*******

We have collected a number of datasets for both binary and multi-class classification, as listed in the table below. They were selected based on the following criteria:

- Publicly available and easy to use;
- Cover different application areas;
- Reflect high diversity in terms of the number of samples, features, and classes.

As a result, some baselines may fail on datasets with too many samples or features. Such cases are indicated by ``N/A`` in all tables below.

+------------------+------------+-----------+------------+-----------+
| Name             | # Training | # Testing | # Features | # Classes |
+==================+============+===========+============+===========+
| `ijcnn1`_        | 49,990     | 91,701    | 22         | 2         |
+------------------+------------+-----------+------------+-----------+
| `pendigits`_     | 7,494      | 3,498     | 16         | 10        |
+------------------+------------+-----------+------------+-----------+
| `letter`_        | 15,000     | 5,000     | 16         | 26        |
+------------------+------------+-----------+------------+-----------+
| `connect-4`_     | 67,557     | 20,267    | 126        | 3         |
+------------------+------------+-----------+------------+-----------+
| `sector`_        | 6,412      | 3,207     | 55,197     | 105       |
+------------------+------------+-----------+------------+-----------+
| `covtype`_       | 406,708    | 174,304   | 54         | 7         |
+------------------+------------+-----------+------------+-----------+
| `susy`_          | 4,500,000  | 500,000   | 18         | 2         |
+------------------+------------+-----------+------------+-----------+
| `higgs`_         | 10,500,000 | 500,000   | 28         | 2         |
+------------------+------------+-----------+------------+-----------+
| `usps`_          | 7,291      | 2,007     | 256        | 10        |
+------------------+------------+-----------+------------+-----------+
| `mnist`_         | 60,000     | 10,000    | 784        | 10        |
+------------------+------------+-----------+------------+-----------+
| `fashion mnist`_ | 60,000     | 10,000    | 784        | 10        |
+------------------+------------+-----------+------------+-----------+

Classification Accuracy
***********************

The table below shows the testing accuracy (in percent) of each method, with the best result on each dataset shown in **bold**. Each experiment was conducted over 5 independent trials, and the average result is reported.

+---------------+-------+-------+-----------+-----------+-----------+-------------+
| Name          | RF    | HGBDT | XGB EXACT | XGB HIST  | LightGBM  | Deep Forest |
+===============+=======+=======+===========+===========+===========+=============+
| ijcnn1        | 98.07 | 98.43 | 98.20     | 98.23     | **98.61** | 98.16       |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| pendigits     | 96.54 | 96.34 | 96.60     | 96.60     | 96.17     | **97.50**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| letter        | 95.39 | 91.56 | 90.80     | 90.82     | 88.94     | **95.92**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| connect-4     | 70.18 | 70.88 | 71.57     | 71.57     | 70.31     | **72.05**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| sector        | 85.62 | N/A   | 66.01     | 65.61     | 63.24     | **86.74**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| covtype       | 73.73 | 64.22 | 66.15     | 66.70     | 65.00     | **74.27**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| susy          | 80.19 | 80.31 | 80.32     | **80.35** | 80.33     | 80.18       |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| higgs         | N/A   | 74.95 | 75.85     | 76.00     | 74.97     | **76.46**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| usps          | 93.79 | 94.32 | 93.77     | 93.37     | 93.97     | **94.67**   |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| mnist         | 97.20 | 98.35 | 98.07     | 98.14     | **98.42** | 98.11       |
+---------------+-------+-------+-----------+-----------+-----------+-------------+
| fashion mnist | 87.87 | 87.02 | 90.74     | 90.80     | **90.81** | 89.66       |
+---------------+-------+-------+-----------+-----------+-----------+-------------+

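The reporting protocol for the accuracy table can be sketched as follows: average the accuracy over independent trials, then pick the best method per dataset while skipping ``N/A`` entries. The function names and the per-trial values are hypothetical; only the ``sector`` row is copied from the table.

```python
from statistics import mean


def average_accuracy(trial_accuracies):
    """Average the test accuracy (in percent) over independent trials."""
    return round(mean(trial_accuracies), 2)


def best_method(row):
    """Return the method with the highest accuracy, skipping N/A (None) entries."""
    valid = {name: acc for name, acc in row.items() if acc is not None}
    return max(valid, key=valid.get)


# Hypothetical per-trial accuracies for one method on one dataset.
trials = [97.4, 97.6, 97.5, 97.5, 97.5]
averaged = average_accuracy(trials)
print(averaged)

# The "sector" row from the accuracy table; the HGBDT N/A entry is stored as None.
sector = {
    "RF": 85.62,
    "HGBDT": None,
    "XGB EXACT": 66.01,
    "XGB HIST": 65.61,
    "LightGBM": 63.24,
    "Deep Forest": 86.74,
}
winner = best_method(sector)
print(winner)
```

Representing a failed run as ``None`` keeps it out of the comparison instead of treating it as an accuracy of zero.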
Runtime
*******

The runtime in seconds reported in the table below covers both the training and evaluation stages.

+---------------+---------+--------+-----------+----------+----------+-------------+
| Name          | RF      | HGBDT  | XGB EXACT | XGB HIST | LightGBM | Deep Forest |
+===============+=========+========+===========+==========+==========+=============+
| ijcnn1        | 9.60    | 6.84   | 11.24     | 1.90     | 1.99     | 8.37        |
+---------------+---------+--------+-----------+----------+----------+-------------+
| pendigits     | 1.26    | 5.12   | 0.39      | 0.26     | 0.46     | 2.21        |
+---------------+---------+--------+-----------+----------+----------+-------------+
| letter        | 0.76    | 1.30   | 0.34      | 0.17     | 0.19     | 2.84        |
+---------------+---------+--------+-----------+----------+----------+-------------+
| connect-4     | 5.17    | 7.54   | 13.26     | 3.19     | 1.12     | 10.73       |
+---------------+---------+--------+-----------+----------+----------+-------------+
| sector        | 292.15  | N/A    | 632.27    | 593.35   | 18.83    | 521.68      |
+---------------+---------+--------+-----------+----------+----------+-------------+
| covtype       | 84.00   | 2.56   | 58.43     | 11.62    | 3.96     | 164.18      |
+---------------+---------+--------+-----------+----------+----------+-------------+
| susy          | 1429.85 | 59.09  | 1051.54   | 44.85    | 34.40    | 1866.48     |
+---------------+---------+--------+-----------+----------+----------+-------------+
| higgs         | N/A     | 523.74 | 7532.70   | 267.64   | 209.65   | 7307.44     |
+---------------+---------+--------+-----------+----------+----------+-------------+
| usps          | 9.28    | 8.73   | 9.43      | 5.78     | 9.81     | 6.08        |
+---------------+---------+--------+-----------+----------+----------+-------------+
| mnist         | 590.81  | 229.91 | 1156.64   | 762.40   | 233.94   | 599.55      |
+---------------+---------+--------+-----------+----------+----------+-------------+
| fashion mnist | 735.47  | 32.86  | 1403.44   | 2061.80  | 428.37   | 661.05      |
+---------------+---------+--------+-----------+----------+----------+-------------+

Some observations are listed as follows:

* Histogram-based GBDT implementations (e.g., :class:`HGBDT`, :class:`XGB HIST`, :class:`LightGBM`) are typically faster, mainly because decision trees in GBDT tend to have a much smaller depth;
* As the number of input dimensions increases (e.g., on mnist and fashion mnist), random forest and deep forest can be faster.

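A runtime figure that covers both stages, as in the runtime table, can be measured with a single timer wrapped around training and prediction. The sketch below is an assumption about how such a measurement might look, not the benchmark's own timing code; ``timed_run`` and the majority-label stand-in model are hypothetical.

```python
import time


def timed_run(fit, predict, X_train, y_train, X_test):
    """Return predictions and wall-clock seconds spent on fit plus predict."""
    start = time.perf_counter()
    model = fit(X_train, y_train)
    predictions = predict(model, X_test)
    elapsed = time.perf_counter() - start
    return predictions, elapsed


# Stand-in "model": predicts the majority training label for every test sample.
def fit_majority(X, y):
    return max(set(y), key=y.count)


def predict_majority(model, X):
    return [model for _ in X]


preds, seconds = timed_run(
    fit_majority, predict_majority, [[0], [1], [2]], [1, 1, 0], [[3], [4]]
)
print(preds, f"{seconds:.6f} s")
```

Passing ``fit`` and ``predict`` as callables keeps the timer independent of any particular library, so the same wrapper could be applied to each baseline.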
.. _`Random Forest`: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

.. _`HGBDT`: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html

.. _`XGBoost EXACT`: https://xgboost.readthedocs.io/en/latest/index.html

.. _`XGBoost HIST`: https://xgboost.readthedocs.io/en/latest/index.html

.. _`LightGBM`: https://lightgbm.readthedocs.io/en/latest/

.. _`Repo`: https://github.com/xuyxu/deep_forest_benchmarks

.. _`ijcnn1`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#ijcnn1

.. _`pendigits`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#pendigits

.. _`letter`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#letter

.. _`connect-4`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#connect-4

.. _`sector`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#sector

.. _`covtype`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#covtype

.. _`susy`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#SUSY

.. _`higgs`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#HIGGS

.. _`usps`: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#usps

.. _`mnist`: https://keras.io/api/datasets/mnist/

.. _`fashion mnist`: https://keras.io/api/datasets/fashion_mnist/
