Sharing plots and notebooks
Objectives
Know about good practices for notebooks to make them reusable
Have a recipe to share a dynamic and reproducible visualization pipeline
[this lesson is adapted after https://coderefinery.github.io/jupyter/sharing/]
Document dependencies
If you import libraries into your notebook, note down their versions.
In Python, it is customary to do this either in a requirements.txt
file (example):
jupyterlab
altair == 5.5.0
vega_datasets
pandas == 2.2.3
numpy == 2.1.2
… or in an environment.yml
file (example):
name: data-viz
channels:
- conda-forge
dependencies:
- python <= 3.12
- jupyterlab
- altair-all = 5.5.0
- vega_datasets
- pandas = 2.2.3
- numpy = 2.1.2
By the way, this is almost the same environment.yml
file that we used to install the local software
environment in the Software install instructions (the latter did not pin versions).
Place either requirements.txt
or environment.yml
in the same folder as the notebook(s).
This is not only useful for people who will try to rerun this in future, it is also understood by some tools (e.g. Binder) which we will see later.
Different ways to share a Vega-Altair plot
Save it in SVG format (vector graphics, “maximum resolution”)
Save it in PNG format (raster graphics)
Share it as notebook (more about it below)
Save it a web page with
chart.save("chart.html")
and share the HTML fileYou can also get a shareable URL to a chart (example)
With sensitive data, you need to be careful with sharing (see next section)
Vega-Altair and notebooks containing sensitive data
If you plot sensitive data in a notebook with Vega-Altair, you need to be careful.
The author of Vega-Altair provided a good summary in this GitHub comment:
“Standard Altair rendering requires the entire dataset to be accessible to the viewer’s browser: this is a fundamental design decision in Vega/Vega-Lite, in which a chart is equivalent to a dataset plus a specification of how to render it. In general, you should assume that the entire contents of any dataframe you pass to the alt.Chart() object will be saved in the notebook and be inspectable by the viewer.”
“One way to get around this would be to render the chart server-side, export a PNG, and display this png instead of the live chart. Incidentally, in the Jupyter notebook you can do this by running:”
alt.renderers.enable('png')
“This sets up Altair such that charts will be rendered to PNG within the kernel, and only that PNG rendering will be embedded in the notebook. Note this requires some extra dependencies, described here.”
“But even here, I wouldn’t call your data “private” (for example, if you save a scatter plot to PNG, a user can straightforwardly read the data values off the chart!) So this makes me think you’re actually doing some sort of aggregation of your data before plotting (e.g. showing a histogram). If this is the case, I would suggest doing those aggregations outside of Altair using e.g. pandas, and then passing the aggregated dataset to the chart. Then you get the normal interactive display of the Altair chart, and your data is just as private as it would have been in the equivalent static rendering – the user can only see the aggregated values you supplied to the chart.”
Different ways to share a notebook
We need to learn how to share notebooks. At the minimum we need to share them with our future selves (backup and reproducibility).
You can enter a URL, GitHub repo or username, or GIST ID in nbviewer and view a rendered Jupyter notebook
Read the Docs can render Jupyter Notebooks via the nbsphinx package
Binder creates live notebooks based on a GitHub repository
EGI Notebooks (see also https://egi-notebooks.readthedocs.io)
JupyterLab supports sharing and collaborative editing of notebooks via Google Drive. Recently it also added support for Shared editing with collaborative notebook model.
JupyterLite creates a Jupyterlab environment in the browser and can be hosted as a GitHub page.
Notedown, Jupinx and DocOnce can take Markdown or Sphinx files and generate Jupyter Notebooks
Voilà allows you to convert a Jupyter Notebook into an interactive dashboard
The
jupyter nbconvert
tool can convert a (.ipynb
) notebook file to:python code (
.py
file)an HTML file
a LaTeX file
a PDF file
a slide-show in the browser
The following platforms can be used free of charge but have paid subscriptions for faster access to cloud resources:
CoCalc (formerly SageMathCloud) allows collaborative editing of notebooks in the cloud
Google Colab lets you work on notebooks in the cloud, and you can read and write to notebook files on Drive
Microsoft Azure Notebooks also offers free notebooks in the cloud
Deepnote allows real-time collaboration
Sharing dynamic notebooks using Binder
Exercise/demo: Making your notebooks reproducible by anyone (15 min)
Instructor demonstrates this:
Instructor creates a GitHub repository.
Uploads a notebook file that we created in earlier episodes.
Then we look at the statically rendered version of the notebook on GitHub and also nbviewer.
Add a file
requirements.txt
which contains:altair == 5.5.0 vega_datasets pandas == 2.2.3 numpy == 2.1.2
Visit https://mybinder.org:
Check that your notebook repository now has a “launch binder” badge in your
README.md
file on GitHub.Try clicking the button and see how your repository is launched on Binder (can take a minute or two). Your notebooks can now be explored and executed in the cloud.
Enjoy being fully reproducible!
Also please see how we share the notebooks from this lesson in the Episode overview.
How to get a digital object identifier (DOI)
Zenodo is a great service to get a DOI for a notebook (but first practice with the Zenodo sandbox).
Binder can also run notebooks from Zenodo.
In the supporting information of your paper you can refer to its DOI.