P.S. Looking for the code? Available on GitHub: papermill-mlflow
Your company (e.g., an e-commerce platform across several countries) is starting a new project on fraud detection. You begin by building a basic machine learning pipeline for a single country in a Jupyter notebook. The evaluation metrics for your basic, single-country pipeline look good.
Next, you want to apply the same pipeline to the other countries—since the data format is identical—and run multiple experiments (e.g., feature selection, parameter tuning).
Ideally, each experiment’s output should be a self-contained Jupyter notebook for easy reference. You also want to store and access artifacts (e.g., visualizations, trained models) in a single location.
How should you do it?
If you’re like me, you might duplicate that basic pipeline (basic.ipynb) and rename it (e.g., basic_sg.ipynb, basic_vn.ipynb, etc.). One notebook per country.
However, as you experiment, you find some new features that improve results and want to replicate them across the countries. So you copy-paste code across multiple notebooks. This violates the DRY principle and is tedious.
To log the results in a single location, you output evaluation metrics for each experiment in a CSV. Visualizations (e.g., ROC curves, precision-recall curves) and trained model binaries are also stored in a single directory.
However, trying to match the visualizations (in the directory) to the experiment results (in the CSV) for reference is time-consuming. Ditto for the model binary.
Is there a simpler, more effortless way?
After some trial and error, I’ve settled on a workflow for simpler, faster experimentation. No more duplication of notebooks. All metrics, visualizations, and model binaries in a single UI.
There are three main components:
jupyter: Quick, iterative development and visualization of code and output
papermill: Running one notebook with different parameters; output into separate notebooks
mlflow: Logging of metrics and artifacts within a single UI
To demonstrate this, we’ll do the following:
Let’s get started.
In this notebook we have a basic pipeline doing some analysis, visualizations, feature engineering, and machine learning. At a high level, it:
The pipeline is simple: running it end-to-end takes 3.5 seconds. This allows for rapid, iterative experimentation.
More experiments = more learning = higher probability of success.
Here’s what the notebook looks like: base.ipynb
To scale our single notebook (running the S&P 500 pipeline) to multiple other stock indices, we enlist the help of papermill.
Papermill allows you to parameterize and execute Jupyter notebooks. In this demo, we’ll set different parameters (e.g., INDEX = GOLD, INDEX = NIKKEI) for the same basic.ipynb notebook to run our pipeline on different stock indices. Each experiment is also saved to its own notebook (e.g., basic_SNP.ipynb, basic_GOLD.ipynb).
Using papermill to do this is easy: just two simple steps.
First, add a parameters tag to the cell in your notebook:
Enter parameters in the text box on the top right of the cell
Here, we’ve parameterized the third cell so we can set INDEX externally via papermill.
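As a rough sketch, the tagged cell might hold nothing more than a default value. The default of 'SNP' is an assumption for illustration and is not taken from the notebook itself. When papermill runs the notebook, it adds a new cell tagged injected-parameters right after the parameters cell, so the value passed in overrides the default.

```python
# Contents of the cell tagged `parameters` in basic.ipynb.
# The default value below is an assumption; it is what the notebook
# would use when run by hand, without papermill.
INDEX = "SNP"

# When papermill is given parameters={'INDEX': 'GOLD'}, it inserts a new
# cell (tagged `injected-parameters`) directly after this one, containing
# the equivalent of:
#
#     INDEX = "GOLD"
#
# Every later cell then sees the injected value instead of the default.
```

Keeping a sensible default in the cell means the notebook still runs end-to-end on its own during development.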
Next, we create a notebook (runner.ipynb) to run basic.ipynb with different parameters. Here’s a code snippet on how to do this, where we:
Execute basic.ipynb with different indices (via the parameters argument)
Save each run to its own notebook (e.g., basic_SNP.ipynb)

    import logging

    import papermill as pm

    logger = logging.getLogger(__name__)

    # Run basic.ipynb once per index and save each run as its own notebook.
    for index in ['SNP', 'GOLD', 'SSE', 'HANGSENG', 'NIKKEI']:
        logger.info('Running notebook for: {}'.format(index))
        pm.execute_notebook(input_path='basic.ipynb',
                            parameters={'INDEX': index},
                            output_path='../artifact_dir/notebooks/basic_{}.ipynb'.format(index))
Now, wasn’t that simple?
With this approach, we minimize code duplication. Also, each experiment’s code, visualizations, and evaluation metrics are logged in a notebook, allowing easy reference and replication.
Nonetheless, how can we get an overall view of experiment results without going through each notebook? Is there a way to group and store each experiment’s evaluation metrics, visualizations, and trained models? Can this then be consolidated and accessed from a single location?
Each notebook trains five ML models and evaluates them on four metrics. For each model, we produce two graphs (i.e., ROC curve, precision-recall curve) and a model binary.
That’s 20 metrics, 10 graphs, and 5 model binaries per experiment run—this can quickly get out of hand.
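To make the scale concrete, here is a small tally using only the counts stated above and the five indices looped over in runner.ipynb. The variable names are purely illustrative.

```python
# Counts from the text: 5 models per notebook, 4 metrics and 2 graphs per
# model, and 1 binary per model. The indices are those used in runner.ipynb.
MODELS, METRICS_PER_MODEL, GRAPHS_PER_MODEL = 5, 4, 2
INDICES = ["SNP", "GOLD", "SSE", "HANGSENG", "NIKKEI"]

# Outputs produced by a single notebook run.
per_run = {
    "metrics": MODELS * METRICS_PER_MODEL,
    "graphs": MODELS * GRAPHS_PER_MODEL,
    "binaries": MODELS,
}

# Outputs produced by one pass over all indices.
total = {kind: count * len(INDICES) for kind, count in per_run.items()}

print(per_run)  # {'metrics': 20, 'graphs': 10, 'binaries': 5}
print(total)    # {'metrics': 100, 'graphs': 50, 'binaries': 25}
```

And that is for a single pass; each new feature or tuning experiment repeats it.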
To get this under control, we can use mlflow.
MLflow is a framework that helps with tracking experiments and ensuring reproducible workflows for deployment. It has three components (tracking, projects, models). This walkthrough will focus on the first, which has an API and UI for logging parameters, metrics, artifacts, etc.
Automatically logging each experiment is easy with this snippet of code:
In the above, with every ML model trained, we log parameters (e.g., stock index, model name, secret sauce), metrics (e.g., AUC, precision), and artifacts (e.g., visualizations, model binaries). This is pretty basic and you can do it yourself with a bit of Python code.
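The logging described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function name log_model_run, the parameter keys index and model, and the run naming scheme are all assumptions. It relies only on the MLflow tracking calls start_run, log_params, log_metrics, and log_artifact. The mlflow module is passed in as tracker so the function can also be dry-run with a stand-in object.

```python
def log_model_run(tracker, index, model_name, metrics, artifact_paths,
                  extra_params=None):
    """Log one trained model as a single MLflow run.

    `tracker` is expected to be the `mlflow` module. It is passed in as an
    argument (an assumption of this sketch) so the function can be tried
    out with a stand-in object that exposes the same four calls.
    """
    # Hypothetical parameter keys; use whatever names suit your project.
    params = {"index": index, "model": model_name}
    params.update(extra_params or {})

    # One run per (index, model) pair; everything logged inside the
    # `with` block is attached to that run.
    with tracker.start_run(run_name=f"{index}_{model_name}"):
        tracker.log_params(params)      # e.g., stock index, model name
        tracker.log_metrics(metrics)    # dict of metric name -> float
        for path in artifact_paths:     # e.g., ROC curve image, model binary
            tracker.log_artifact(path)
    return params
```

With MLflow installed, a call inside the notebook might look like log_model_run(mlflow, INDEX, 'Logistic Regression', {'auc': auc, 'precision': precision}, ['roc.png', 'model.pkl']) after import mlflow, where auc, precision, and the file names are placeholders for values the pipeline has already produced. Calling mlflow.set_tracking_uri with the server address beforehand sends the runs to a tracking server instead of the local directory.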
Where mlflow shines is its server and UI. Starting up the server (mlflow server) and navigating to the dashboard (127.0.0.1:5000) is easy. (The server can also be hosted centrally and users can push their metrics and artifacts to it.)
The MLflow UI consolidates all your parameters and metrics on a single homepage for easy viewing.
You can sort experiments by metrics or parameters. Here, we sort them by AUC in descending order.
It also allows custom filters via parameters and metrics. Here, we filter for Logistic Regression.
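The same sorting and filtering can also be done from code via MLflow’s search_runs call. The sketch below is illustrative only: the helper name best_runs is made up, and the parameter key model and metric key auc are assumptions that must match whatever names were used when logging. As before, the mlflow module is passed in as tracker.

```python
def best_runs(tracker, model_name=None, metric="auc", max_results=10):
    """Return runs sorted by `metric` (descending), optionally for one model.

    `tracker` is expected to be the `mlflow` module. The keys `model` and
    `auc` are assumptions of this sketch, not names from the notebooks.
    """
    # MLflow filter strings address logged parameters as `params.<key>`.
    filter_string = f"params.model = '{model_name}'" if model_name else ""
    return tracker.search_runs(
        filter_string=filter_string,
        # Logged metrics are addressed as `metrics.<key>`.
        order_by=[f"metrics.{metric} DESC"],
        max_results=max_results,
    )
```

With the real mlflow module, best_runs(mlflow, model_name='Logistic Regression') returns a pandas DataFrame with one row per run, which can be handy for further analysis in a notebook.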
But where are the artifacts from each experiment? Click on the experiment and you’ll be brought to the experiment page where you can view all your artifacts and download them.
In the walkthrough above, we saw how we could run multiple experiments from a single notebook (with papermill), and log the results and artifacts in a single UI (with mlflow).
Here’s the git repo to the notebooks used in the walkthrough. Clone it and try it. Give it a star if you found it useful. Actively try to integrate papermill and mlflow into your work.
Automate your experimentation workflow to minimize effort and iterate faster, increasing your chances of success.
Thanks to Gabriel Chuan and Michael Ng for reading drafts of this.
If you found this useful, please cite this write-up as:
Yan, Ziyou. (Mar 2020). Simpler Experimentation with Jupyter, Papermill, and MLflow. eugeneyan.com. https://eugeneyan.com/writing/experimentation-workflow-with-jupyter-papermill-mlflow/.
or
@article{yan2020automate,
title = {Simpler Experimentation with Jupyter, Papermill, and MLflow},
author = {Yan, Ziyou},
journal = {eugeneyan.com},
year = {2020},
month = {Mar},
url = {https://eugeneyan.com/writing/experimentation-workflow-with-jupyter-papermill-mlflow/}
}