Organizing projects
Objectives
Learn how to move common reusable functions into modules and import them
Learn good practices that make notebooks reusable
Instructor note
20 min presentation and discussion
[this lesson is adapted from https://coderefinery.github.io/jupyter/06-sharing/]
It all started with a short and simple notebook, but how do we stay organized as projects and notebooks grow?
Avoiding repetitive code
Let’s imagine we wrote this function fancy_plot for a hexagonal 2D histogram plot (please try it in your notebook):
import seaborn as sns
# to get some random numbers
from numpy.random import default_rng
# this one is simple but let us imagine something very lengthy
def fancy_plot(x_values, y_values, color):
    """
    Fancy function creating fancy plots.
    """
    sns.jointplot(x=x_values, y=y_values, kind="hex", color=color)
rng = default_rng()
x_values = rng.standard_normal(500)
y_values = rng.standard_normal(500)
# call our function
fancy_plot(x_values, y_values, "#4cb391")
other_x_values = rng.standard_normal(500)
other_y_values = rng.standard_normal(500)
# call our function again, this time with other data
fancy_plot(other_x_values, other_y_values, "#fc9272")
Now we would like to use this function in five other notebooks without copying it into each of them (imagine the function is very lengthy).
It can be useful to create a Python file myplotfunctions.py in the same folder as the notebooks (you can change the name) and place this code into myplotfunctions.py:
import seaborn as sns
# this one is simple but let us imagine something very lengthy
def fancy_plot(x_values, y_values, color):
    """
    Fancy function creating fancy plots.
    """
    sns.jointplot(x=x_values, y=y_values, kind="hex", color=color)
Now we can simplify our notebook:
# to get some random numbers
from numpy.random import default_rng
from myplotfunctions import fancy_plot
rng = default_rng()
x_values = rng.standard_normal(500)
y_values = rng.standard_normal(500)
# call our function
fancy_plot(x_values, y_values, "#4cb391")
other_x_values = rng.standard_normal(500)
other_y_values = rng.standard_normal(500)
# call our function again, this time with other data
fancy_plot(other_x_values, other_y_values, "#fc9272")
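One thing to keep in mind when working this way: a running notebook kernel imports a module only once, so later edits to the file are not picked up automatically. The following is a minimal sketch of reloading with the standard library. It uses a throwaway module in a temporary folder as a stand-in for myplotfunctions.py; the names demo_helpers and label are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for myplotfunctions.py: a throwaway module in a temporary folder.
# The module name "demo_helpers" and the function "label" are hypothetical.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_helpers.py"
module_file.write_text("def label():\n    return 'first version'\n")

# Make the folder importable, then import the module as a notebook would.
sys.path.insert(0, str(module_dir))
import demo_helpers

first = demo_helpers.label()

# Edit the file, as you would in an editor while the kernel keeps running.
module_file.write_text("def label():\n    return 'second version, edited'\n")

# Without a reload, the kernel still holds the old definition.
stale = demo_helpers.label()

# Reloading picks up the edited source without restarting the kernel.
importlib.reload(demo_helpers)
fresh = demo_helpers.label()

print(first, "|", stale, "|", fresh)
```

In a notebook you would typically call importlib.reload on the imported module object (for example after `import myplotfunctions`). IPython also ships an autoreload extension (`%load_ext autoreload` followed by `%autoreload 2`) that may be more convenient during active development.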
Document dependencies
If you import libraries into your notebook, note down their versions.
It is customary to do this either in a requirements.txt file (example):
pandas==1.2.3
seaborn==0.11.1
… or in an environment.yml file (example):
name: example-environment
channels:
  - conda-forge
dependencies:
  - pandas==1.2.3
  - seaborn==0.11.1
Place either requirements.txt or environment.yml in the same folder as the notebook(s).
This is useful not only for people who try to rerun the notebook in the future; some tools (e.g. Binder), which we will see later, also understand these files.
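If you are unsure which versions you currently have installed, you can look them up from within the notebook. The following is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or later); the helper name pinned_requirements is made up for this example, and packages that are not installed are simply skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(package_names):
    """Return 'name==version' lines for those packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment: leave it out of the list.
            continue
    return lines


# Print lines that can be pasted into requirements.txt.
print("\n".join(pinned_requirements(["pandas", "seaborn"])))
```

The printed lines reflect whatever is installed in the environment the kernel runs in, so check that this is the environment you actually want to document.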
Recommendations for longer notebooks
Create a table of contents at the top
You can do that using Markdown. This produces a nice overview for longer notebooks. Example: https://stackoverflow.com/a/39817243
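For a notebook with many sections, writing the list by hand gets tedious. Here is a small sketch that builds the Markdown for a table of contents from a list of headings. The function name markdown_toc is hypothetical, and the sketch assumes that the notebook interface creates heading anchors by replacing spaces with hyphens; check this in your own setup, since anchor rules can differ between interfaces.

```python
def markdown_toc(headings):
    """Build a nested Markdown list linking to (level, title) headings."""
    lines = []
    for level, title in headings:
        # Assumption: anchors are the heading text with spaces replaced by hyphens.
        anchor = title.strip().replace(" ", "-")
        indent = "  " * (level - 1)
        lines.append(f"{indent}- [{title}](#{anchor})")
    return "\n".join(lines)


toc = markdown_toc([(1, "Load data"), (2, "Clean data"), (1, "Plot results")])
print(toc)
```

Paste the printed text into a Markdown cell at the top of the notebook.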
How to let readers toggle the code on and off
It is possible to hide all the code and only show the output. This can be nice for notebook readers who don’t need/want to see the code:
from IPython.display import HTML
# Note: this relies on jQuery ($) and the "div.input" class of the classic
# Jupyter Notebook interface; other interfaces may need a different approach.
HTML('''<script>
code_show = true;
function code_toggle() {
  if (code_show) {
    $('div.input').hide();
  } else {
    $('div.input').show();
  }
  code_show = !code_show;
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')