Sharing reproducible data science pipelines using Binder


  • Have a recipe to share a dynamic and reproducible visualization pipeline

Instructor note

  • 20 min presentation and discussion

[this lesson is adapted from]

Sharing dynamic notebooks using Binder

In an earlier episode we created this notebook:

import pandas as pd
import seaborn as sns

# read dataset 1
url = ""
data_tromso = pd.read_csv(url, delimiter=";", decimal=",")

# read dataset 2
url = ""
data_oslo = pd.read_csv(url, delimiter=";", decimal=",")

# combine data
data = pd.concat([data_tromso, data_oslo], axis=0)

# convert the date column to datetime format
data["date"] = pd.to_datetime(data["date"], format="%d.%m.%Y")

# convert to numeric; non-numeric entries such as "-" become NaN
data['snow depth'] = pd.to_numeric(data['snow depth'], errors='coerce')

# finally the plot
plot = sns.relplot(x="date",
                   y="snow depth",
                   data=data)
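
A note on the snow depth conversion in the notebook: with errors='coerce', pd.to_numeric turns entries it cannot parse (such as "-") into NaN rather than 0. The following is a minimal sketch of that behavior, using made-up values that are not taken from the datasets; the fillna(0) step is an optional addition and not part of the notebook:

```python
import pandas as pd

# Made-up sample values for illustration; "-" stands for a missing reading.
raw = pd.Series(["12", "-", "30"], name="snow depth")

# errors="coerce" converts unparseable entries to NaN, not to 0.
coerced = pd.to_numeric(raw, errors="coerce")

# Optional extra step, only if missing readings should really count as 0.
filled = coerced.fillna(0)

print(coerced.tolist())
print(filled.tolist())
```

Whether NaN or 0 is the right choice depends on what "-" means in the dataset.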

We will first share it “statically” via GitHub, and then dynamically using Binder.

Exercise/demo: Making your notebooks reproducible by anyone (15 min)

Instructor demonstrates this:

  • Creates a GitHub repository

  • Uploads the notebook file

  • Then we look at the statically rendered version of the notebook on GitHub

  • Create a requirements.txt file which contains:

  • Commit and push this file to your notebook repository as well.

  • Visit

    Screenshot from user interface
  • Check that your notebook repository now has a “launch binder” badge in your file on GitHub.

  • Try clicking the button and see how your repository is launched on Binder (this can take a minute or two). Your notebooks can now be explored and executed in the cloud.

  • Enjoy being fully reproducible!
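
The requirements.txt step above can also be sketched in Python. Binder reads this file to decide which packages to install, and pinning versions helps others get the same environment. The sketch assumes the notebook needs only pandas and seaborn (the two packages it imports); the helper name pinned_requirements is hypothetical, and the versions written are whatever happens to be installed locally:

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the given installed packages."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed locally: fall back to an unpinned entry.
            lines.append(name)
    return lines


# Assumption: these are the only packages the notebook imports.
requirements = pinned_requirements(["pandas", "seaborn"])

# Write one requirement per line, as pip expects.
with open("requirements.txt", "w") as handle:
    handle.write("\n".join(requirements) + "\n")

print("\n".join(requirements))
```

Running this in the environment where the notebook works gives a file that can be committed next to the notebook.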

How to get a digital object identifier (DOI)

  • You can get a DOI for any GitHub “release” using Zenodo.

  • Binder can also run notebooks from Zenodo.

  • It is often a good idea to publish your visualization pipeline on GitHub, Zenodo, and Binder, and to refer to its DOI in the supporting information of your paper.
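
Binder launch links follow a regular pattern, so they can be assembled by hand or with a few lines of Python. The sketch below is illustrative only: the user name, repository name, branch, and DOI are placeholders rather than real projects, the function names are hypothetical, and the URL patterns are those used by mybinder.org as far as we know, so it is worth comparing the result with the link generated by the form on the mybinder.org front page.

```python
BINDER = "https://mybinder.org"


def github_launch_url(user, repo, ref="main"):
    """Launch URL for a GitHub repository at a branch, tag, or commit."""
    return f"{BINDER}/v2/gh/{user}/{repo}/{ref}"


def zenodo_launch_url(doi):
    """Launch URL for a Zenodo record identified by its DOI."""
    return f"{BINDER}/v2/zenodo/{doi}/"


def badge_markdown(launch_url):
    """Markdown snippet for a 'launch binder' badge linking to launch_url."""
    return f"[![Binder]({BINDER}/badge_logo.svg)]({launch_url})"


# Placeholder values; replace them with your own repository and DOI.
gh_url = github_launch_url("your-username", "your-repository")
zenodo_url = zenodo_launch_url("10.5281/zenodo.0000000")

print(badge_markdown(gh_url))
print(zenodo_url)
```

Pointing the badge at a tagged release or a DOI rather than a moving branch means readers launch exactly the version that the paper refers to.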