Cookiecutter Data Science Industrialization

This project was initially started as a fork of the Cookiecutter Data Science project, which describes itself as "a logical, reasonably standardized, but flexible project structure for doing and sharing data science work".

It was then extended to include many of what are considered best practices for industrializing data science projects, such as unit and integration testing, CI/CD, workflow-as-code, and packaging.

Requirements to use the cookiecutter template:

Python 3
Cookiecutter Python package: this can be installed with pip or conda, depending on how you manage your Python packages:

$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter
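
As a quick sanity check before generating a project, the two requirements above can be verified from Python. This is a minimal sketch using only the standard library; the function name `check_requirements` is hypothetical and not part of the template.

```python
import importlib.util
import sys


def check_requirements():
    """Report whether the two requirements listed above are met."""
    return {
        # The template requires Python 3.
        "python3": sys.version_info.major >= 3,
        # find_spec returns None when the package cannot be found.
        "cookiecutter": importlib.util.find_spec("cookiecutter") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```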

To start a new project, run:

$ cookiecutter https://github.com/Caffeinside/cookiecutter-data-science-indus

The resulting directory structure

The directory structure of your new project looks like this:

├── LICENSE
├── Makefile                        <- Makefile with useful commands
├── README.md                       <- The top-level README for developers using this project
├── airflow                         <- Target folder to generate Airflow DAGs
├── config.py                       <- The top-level config file for this project
│
├── data
│   ├── external                    <- Data from third party sources
│   ├── interim                     <- Intermediate data that has been transformed
│   ├── processed                   <- Final outputs of the workflows
│   ├── raw                         <- The original, immutable data dump
│   └── reference                   <- Reference or mapping data
│
├── deploy
│   ├── azure-cd-pipeline.yml       <- CD pipeline to retrieve Docker images and deploy the app on a remote server
│   └── azure-ci-pipeline.yml       <- CI pipeline to run tests, build and push Docker images to a registry
│
├── docker
│   ├── Dockerfile                  <- Simple Dockerfile for the app
│   ├── docker-compose-dev.yml      <- Used to launch your services locally / in dev
│   └── docker-compose-prod.yml     <- Used to launch your services in production
│
├── docs                            <- A default Sphinx project; see sphinx-doc.org for details
│
├── models                          <- Trained and serialized models
│
├── notebooks                       <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                      the creator's initials, and a short `-` delimited description, e.g.
│                                      `1.0-jqp-initial-data-exploration`
│
├── pipeline
│   ├── predict.py                  <- ML prediction workflow
│   └── train.py                    <- ML training workflow
│
├── scripts                         <- Stand-alone scripts to perform specific tasks
│
├── setup.py                        <- Makes project pip installable (pip install -e .[tests]) so src can be imported 
│                                      and dependencies installed
│
├── src                             <- Source code for use in this project
│   ├── __init__.py                 <- Makes src a Python module
│   └── example.py
│
└── tests                           <- Unit and integration tests
    └── test_example.py
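
The notebook naming convention described in the tree above can be checked automatically. The sketch below is one possible interpretation: the regular expression (dotted number, lowercase initials, lowercase `-` delimited description) is an assumption inferred from the single example `1.0-jqp-initial-data-exploration`, and `parse_notebook_name` is a hypothetical helper, not something shipped with the template.

```python
import re

# Assumed pattern: "<number>-<initials>-<dash-delimited-description>",
# e.g. "1.0-jqp-initial-data-exploration". The exact character rules are a guess.
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(filename):
    """Return the parts of a notebook name, or None if it breaks the convention."""
    # Drop the Jupyter extension if present so both forms are accepted.
    stem = filename[: -len(".ipynb")] if filename.endswith(".ipynb") else filename
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None
```

For instance, `parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb")` yields the order `1.0`, the initials `jqp`, and the description `initial-data-exploration`, while a default name such as `Untitled.ipynb` yields `None`.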

Preliminary steps to set up your CI/CD

The template includes scripts that allow you to set up green CI and CD pipelines in minutes using Azure DevOps. Just follow these steps:

Create or log in to your personal Azure account
Create a new project in Azure DevOps
Connect your Docker registry: in Project Settings >> Service Connections, create the connection. Use the name of this connection as the docker_registry_service_connection variable when you set up your template.
Connect your remote deployment server: in Project Settings >> Service Connections, create the connection. Use the name of this connection as the deploy_server_service_connection variable when you set up your template.
In Pipelines, create your CI and CD pipelines, pointing them to the deploy/azure-ci-pipeline.yml and deploy/azure-cd-pipeline.yml files respectively.
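
Once the two service connections exist, their names can also be passed to the template without interactive prompts, through Cookiecutter's Python API. The following is a sketch under assumptions: it presumes that `docker_registry_service_connection` and `deploy_server_service_connection` are keys in the template's cookiecutter.json, that every other variable can keep its default, and the helper names and example connection names are hypothetical.

```python
TEMPLATE_URL = "https://github.com/Caffeinside/cookiecutter-data-science-indus"


def build_extra_context(docker_connection, deploy_connection):
    """Map the Azure DevOps service connection names to the template variables."""
    if not docker_connection or not deploy_connection:
        raise ValueError("both service connection names are required")
    return {
        "docker_registry_service_connection": docker_connection,
        "deploy_server_service_connection": deploy_connection,
    }


def generate_project(docker_connection, deploy_connection):
    """Generate a project without prompts, keeping defaults for other variables."""
    # Imported here so build_extra_context works without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,  # skip the interactive questions
        extra_context=build_extra_context(docker_connection, deploy_connection),
    )


if __name__ == "__main__":
    # Hypothetical connection names; replace them with the ones you created.
    generate_project("my-docker-registry", "my-deploy-server")
```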
