Sharing reproducible data science pipelines using Binder
Objectives
Have a recipe to share a dynamic and reproducible visualization pipeline
Instructor note
20 min presentation and discussion
[This lesson is adapted from https://coderefinery.github.io/jupyter/06-sharing/]
Sharing dynamic notebooks using Binder
In an earlier episode we created this notebook:
import pandas as pd
import seaborn as sns
# read dataset 1
url = "https://raw.githubusercontent.com/coderefinery/data-visualization-python/main/data/tromso.csv"
data_tromso = pd.read_csv(url, delimiter=";", decimal=",")
# read dataset 2
url = "https://raw.githubusercontent.com/coderefinery/data-visualization-python/main/data/oslo.csv"
data_oslo = pd.read_csv(url, delimiter=";", decimal=",")
# combine data
data = pd.concat([data_tromso, data_oslo], axis=0)
# convert dd.mm.yyyy strings to datetime objects
data["date"] = pd.to_datetime(data["date"], format="%d.%m.%Y")
# convert snow depth to numbers; non-numeric entries such as "-" become NaN
data['snow depth'] = pd.to_numeric(data['snow depth'], errors='coerce')
# finally the plot
plot = sns.relplot(x="date",
                   y="snow depth",
                   hue="name",
                   col="name",
                   kind="scatter",
                   data=data)
We will first share it “statically” via GitHub, then dynamically using Binder.
Exercise/demo: Making your notebooks reproducible by anyone (15 min)
Instructor demonstrates this:
Creates a GitHub repository
Uploads the notebook file
Then we look at the statically rendered version of the notebook on GitHub
Create a requirements.txt file which contains:
pandas==1.2.3
seaborn==0.11.1
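Binder installs exactly the versions pinned in requirements.txt, so it can help to check locally that the notebook actually runs with those versions before pushing. The following is a minimal sketch using only the Python standard library (importlib.metadata); it assumes the file contains only simple "name==version" lines, and the helper names parse_pins, installed_version and mismatches are hypothetical, not part of any tool.

```python
"""Compare pinned versions in requirements.txt with what is installed.

A minimal sketch: it only understands simple "name==version" lines.
"""
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Return {package: version} for every 'name==version' line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for pins that are not satisfied."""
    result = {}
    for name, pinned in pins.items():
        found = get_version(name)
        if found != pinned:
            result[name] = (pinned, found)
    return result


if __name__ == "__main__":
    # the same pins as in the requirements.txt above
    example = "pandas==1.2.3\nseaborn==0.11.1\n"
    for name, (pinned, found) in mismatches(parse_pins(example)).items():
        print(f"{name}: pinned {pinned}, installed {found}")
```

An empty output means your local environment matches the pins; otherwise each line lists a package whose installed version (or None, if missing) differs from the one Binder will install.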
Commit and push this file to your notebook repository as well.
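Visit https://mybinder.org, enter the URL of your GitHub repository, and copy the Markdown snippet for the “launch binder” badge into a README.md file in your repository.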
Check that your notebook repository now has a “launch binder” badge in your README.md file on GitHub. Try clicking the button and see how your repository is launched on Binder (this can take a minute or two). Your notebooks can now be explored and executed in the cloud.
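The badge you paste into README.md is just a Markdown image wrapped in a link to a Binder launch URL. The sketch below builds that snippet by hand to show its structure; it assumes the mybinder.org URL scheme https://mybinder.org/v2/gh/<user>/<repo>/<ref>, the badge image https://mybinder.org/badge_logo.svg, and the optional labpath query parameter for opening a specific notebook in JupyterLab. The user, repository and notebook names are hypothetical placeholders, and the form on mybinder.org remains the authoritative way to generate the badge.

```python
"""Build a "launch binder" Markdown badge for a GitHub repository.

A sketch assuming the mybinder.org URL scheme
https://mybinder.org/v2/gh/<user>/<repo>/<ref>.
"""
from typing import Optional
from urllib.parse import quote

BADGE_SVG = "https://mybinder.org/badge_logo.svg"


def binder_url(user: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Return the Binder launch URL, optionally opening one notebook in JupyterLab."""
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if notebook:
        # labpath names the file to open; percent-encode it, including "/"
        url += "?labpath=" + quote(notebook, safe="")
    return url


def binder_badge(user: str, repo: str, ref: str = "HEAD",
                 notebook: Optional[str] = None) -> str:
    """Return a Markdown badge that links to the Binder launch URL."""
    return f"[![Binder]({BADGE_SVG})]({binder_url(user, repo, ref, notebook)})"


if __name__ == "__main__":
    # "your-username", "your-repo" and "snow.ipynb" are hypothetical placeholders
    print(binder_badge("your-username", "your-repo", notebook="snow.ipynb"))
```

Using "HEAD" as the ref makes the badge follow the repository's default branch, so it keeps working if you later push new commits.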
Enjoy being fully reproducible!