Notebooks and version control
Objectives
Demonstrate two tools which make version control of notebooks easier.
Instructor note
10 min teaching
5 min demo
Jupyter Notebooks are stored in JSON format. Because this format mixes code, outputs, and metadata in one file, it can be difficult to compare and merge changes that are introduced through the notebook interface.
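To get a feeling for what is stored on disk, the following minimal sketch builds a one-cell notebook by hand and prints it as JSON. The cell contents are made up for illustration, and the structure is a simplified version of the version 4 notebook format, not a validated notebook.

```python
import json

# A hand-written, simplified notebook structure (illustrative values only).
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"name": "stdout", "output_type": "stream", "text": ["hello\n"]}
            ],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize the notebook as indented JSON, similar to what ends up in the file.
serialized = json.dumps(notebook, indent=1, sort_keys=True)
line_count = len(serialized.splitlines())

print(serialized)
print(f"{line_count} lines of JSON for 1 line of code")
```

Even a single line of code is surrounded by outputs, counters, and metadata, and Git tracks all of these lines.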
Packages and JupyterLab extensions to simplify version control
Several packages and JupyterLab extensions have been developed to make it easier to interact with Git and GitHub:
nbdime (notebook “diff” and “merge”) provides “content-aware” diffing and merging.
Adds a Git button to the notebook interface.
The git diff and git merge shell commands can use nbdime’s diff and merge for notebook files, but leave Git’s behavior unchanged for non-notebook files.
jupyterlab-git is a JupyterLab extension for version control using Git.
Adds a Git tab to the left-side menu bar for version control inside JupyterLab.
JupyterLab GitHub is a JupyterLab extension for accessing GitHub repositories.
Adds a GitHub tab to the left-side menu bar where you can browse and open notebooks from your GitHub repositories.
All three extensions can be used from within the JupyterLab interface, and our Conda environment provides jupyterlab-git and nbdime. To install additional extensions, please consult the official documentation about installing and managing JupyterLab extensions.
Comparing Jupyter Notebooks on GitHub
To compare notebooks on GitHub, we recommend enabling Rich Jupyter Notebook Diffs:
On GitHub click on your avatar/image (top right).
Click on “Feature preview”.
Enable “Rich Jupyter Notebook Diffs”.
To demonstrate the difference, we have created a small change. You can compare the effect yourself by enabling and disabling the feature: https://github.com/coderefinery/jupyter/compare/5ff55b8..fce21e6
Here is the diff without “Rich Jupyter Notebook Diffs”:
Here is the same change, but this time with “Rich Jupyter Notebook Diffs” enabled:
Comparing changes locally without jupyterlab-git/nbdime
Instructor note
Create a new folder
Initialize a new Git repository (which is good to demonstrate anyway)
Copy the “darts” notebook into it (from the previous episode)
Add .ipynb_checkpoints/ to .gitignore
Stage and commit the file before trying the changes below
Instructor demonstrates a plain git diff
To understand the problem, the instructor first shows the example notebook and then the source code in JSON format.
Then we introduce a simple change to the example notebook, for instance changing colors (change “red” and “blue” to something else) and also changing dimensions in fig.set_size_inches(6.0, 6.0). Run all cells.
We save the change (save icon) and in the JupyterLab terminal try a “normal” git diff and see that this is not very useful. Discuss why.
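One possible starting point for the discussion is the sketch below. It imitates a line-based diff on the raw JSON of two notebook versions, using Python's difflib instead of Git. The notebook contents, the plot(points, ...) call, and the placeholder image strings are hypothetical; only the color was edited by hand, while the execution counter and the embedded image changed as a side effect of re-running the cell.

```python
import difflib
import json


def make_notebook(color, execution_count, image_data):
    """Build a simplified one-cell notebook; all values are illustrative."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "data": {"image/png": image_data},
                        "metadata": {},
                        "output_type": "display_data",
                    }
                ],
                "source": [f"plot(points, color='{color}')"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def to_lines(notebook):
    """Serialize a notebook to JSON lines, as a line-based diff would see it."""
    return json.dumps(notebook, indent=1, sort_keys=True).splitlines()


before = make_notebook("red", 3, "FAKE_BASE64_IMAGE_BEFORE")
after = make_notebook("green", 7, "FAKE_BASE64_IMAGE_AFTER")

diff = list(
    difflib.unified_diff(
        to_lines(before), to_lines(after), "before.ipynb", "after.ipynb", lineterm=""
    )
)

# Keep only added/removed lines, skipping the two file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
source_changes = [line for line in changed if "color=" in line]

print("\n".join(diff))
print(f"{len(changed)} changed lines, {len(source_changes)} of them in the source")
```

In a real notebook the embedded image is a very long base64 string, so the part of the diff that reflects the actual edit is even harder to spot.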
Comparing changes with jupyterlab-git/nbdime
Let us inspect the same changes using jupyterlab-git (which uses nbdime). This is more convenient since it highlights only the changes that we have made.
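The idea behind a content-aware comparison can be illustrated with a small sketch that compares only the source of code cells and ignores outputs and execution counters. This is not how nbdime is implemented; it is a simplified illustration using Python's difflib, and the notebook contents (including the plot(points, ...) call and the placeholder outputs) are hypothetical.

```python
import difflib


def code_cell(source, execution_count, output_placeholder):
    """Build a simplified code cell; the output is only a placeholder."""
    return {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [{"output_type": "display_data", "data": output_placeholder}],
        "source": source,
    }


def source_lines(notebook):
    """Collect source lines of all code cells, ignoring outputs and counters."""
    lines = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            lines.extend("".join(cell["source"]).splitlines())
    return lines


before = {
    "cells": [
        code_cell(
            ["fig.set_size_inches(6.0, 6.0)\n", "plot(points, color='red')"],
            3,
            "PLACEHOLDER_OUTPUT_A",
        )
    ]
}
after = {
    "cells": [
        code_cell(
            ["fig.set_size_inches(8.0, 8.0)\n", "plot(points, color='green')"],
            7,
            "PLACEHOLDER_OUTPUT_B",
        )
    ]
}

diff = list(
    difflib.unified_diff(
        source_lines(before), source_lines(after), "before", "after", lineterm=""
    )
)

# Keep only added/removed lines, skipping the two file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(diff))
```

Only the edited source lines show up in this diff; the changed execution counter and outputs do not appear.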
Using nbdime on the command line
You can configure your (command line) Git to always use nbdime when comparing and merging notebooks:
$ nbdime config-git --enable --global
Now when you do git diff or git merge with notebooks, you should see a nice diff view. For more information please see the corresponding documentation.
See also
nbdev, developed by fast.ai, is a notebook-driven development platform which includes support for Git-friendly Jupyter notebooks.
Verdant is a JupyterLab extension that automatically records the history of all experiments you run in a Jupyter notebook and stores it in an .ipyhistory JSON file.