Starting with example code in a notebook

Data

The file weather_data.csv (raw csv file) contains hourly weather measurements from the observation station “Vantaa Helsinki-Vantaan lentoasema” (Helsinki airport) during 2024.

Origin of the data

Data obtained from https://en.ilmatieteenlaitos.fi/download-observations#!/ on 2025-09-18.

Data has been provided by the Finnish Meteorological Institute under the Creative Commons Attribution 4.0 International license (CC BY 4.0): https://en.ilmatieteenlaitos.fi/open-data-licence

Our initial goal

Our initial goal for this exercise is to plot time series of temperature and precipitation for January and to compute and plot the mean temperature over the month. We imagine that we assemble a working script from internet searches and AI chat recommendations and arrive at:

import pandas as pd
import matplotlib.pyplot as plt


# read data
data = pd.read_csv("weather_data.csv")

# combine 'date' and 'time' into a single datetime column
data["datetime"] = pd.to_datetime(data["date"] + " " + data["time"])

# set datetime as index for convenience
data = data.set_index("datetime")

# keep only january data
january = data.loc["2024-01"]

fig, ax = plt.subplots()

# temperature time series
ax.plot(
    january.index,
    january["air_temperature_celsius"],
    label="air temperature (C)",
    color="red",
)

ax.set_title("air temperature (C) at Helsinki airport")
ax.set_xlabel("date and time")
ax.set_ylabel("air temperature (C)")
ax.legend()
ax.grid(True)

# format x-axis for better date display
fig.autofmt_xdate()

fig.savefig("2024-01-temperature.png")

This example is in Python, but we will try to see “through” the code, focus on the bigger picture, and hopefully manage to imagine other languages in its place. For the Python experts: this will not be the most elegant Python.
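The script above draws only the temperature series. One way the monthly mean from the initial goal could be added is sketched below. To keep the sketch runnable without the CSV file, it replaces the January selection with made-up hourly values; only the column name air_temperature_celsius is taken from the script, and the line style is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# made-up stand-in for the January rows of weather_data.csv (not real measurements)
index = pd.date_range("2024-01-01", "2024-01-31 23:00", freq="h")
january = pd.DataFrame(
    {"air_temperature_celsius": np.linspace(-10.0, 2.0, len(index))},
    index=index,
)

# arithmetic mean over all hourly values of the month
mean_temperature = january["air_temperature_celsius"].mean()

fig, ax = plt.subplots()
ax.plot(
    january.index,
    january["air_temperature_celsius"],
    label="air temperature (C)",
    color="red",
)
# horizontal line marking the monthly mean
ax.axhline(mean_temperature, label="monthly mean (C)", color="blue", linestyle="--")
ax.legend()
fig.autofmt_xdate()
# saving the figure works as in the script above: fig.savefig(...)
```

With the real data, the synthetic `january` frame would simply be the `data.loc["2024-01"]` selection from the script.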

Further goals

  • Once we get this working for January, our task changes to also plot February and March in two additional plots.

  • Later, we wish to generalize the code so that a user can compute and plot this for any month, without changing the code (with a command line interface).
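As a rough sketch of where these goals might lead (not the solution developed in the workshop), the script could be split into functions and given a small command line interface with argparse from the Python standard library. The function names and the options --month, --data-file and --output-file are hypothetical choices made for this illustration; the column names are those used in the script above.

```python
import argparse

import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def read_data(file_name):
    """Read the CSV file and index it by a combined datetime column."""
    data = pd.read_csv(file_name)
    data["datetime"] = pd.to_datetime(data["date"] + " " + data["time"])
    return data.set_index("datetime")


def plot_temperature(data, month, file_name):
    """Plot the temperature series of one month (e.g. "2024-02") and save it."""
    selection = data.loc[month]
    fig, ax = plt.subplots()
    ax.plot(selection.index, selection["air_temperature_celsius"], color="red")
    ax.set_title(f"air temperature (C) at Helsinki airport, {month}")
    ax.set_xlabel("date and time")
    ax.set_ylabel("air temperature (C)")
    ax.grid(True)
    fig.autofmt_xdate()
    fig.savefig(file_name)
    plt.close(fig)


def parse_arguments(argv=None):
    """Define the (hypothetical) command line options and parse them."""
    parser = argparse.ArgumentParser(description="Plot monthly air temperature.")
    parser.add_argument("--month", required=True, help="month to plot, e.g. 2024-02")
    parser.add_argument("--data-file", default="weather_data.csv")
    parser.add_argument("--output-file", default=None)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_arguments(argv)
    # default output name follows the pattern used in the original script
    output_file = args.output_file or f"{args.month}-temperature.png"
    plot_temperature(read_data(args.data_file), args.month, output_file)
```

Calling `main(["--month", "2024-02"])` would then produce the February plot and `main(["--month", "2024-03"])` the March plot, without touching the code. In a script file, one would typically call `main()` under an `if __name__ == "__main__":` guard so that the month can be passed on the command line.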

How we plan to solve it

Before we attempt to do this, we discuss with workshop participants how they would tackle this problem.

Together we improve the code based on suggestions from learners towards more modularity and re-usability.

Instructor note

Participants give suggestions and ask questions via the collaborative document, and the instructor(s) try to follow and answer. They can also roughly follow the ideas and steps in “One possible solution”.

It is OK and good if mistakes happen and it is fun if the instructor(s) can convey a bit of “improv” feel to this lesson.

Learning outcomes

  • Know about pure functions (functions without side effects, which given the same input always return the same output).

  • Learn why and how to limit side effects of functions.

  • Discuss why and how to limit side effects of data. Also discuss when mutable data may be preferable.

  • The Zen of Python

  • Discuss why single-purpose functions are often preferred over multi-purpose functions.

  • Split-apply-combine, which lets you more easily parallelize. Make your code modular in a way that lets you split the steps and parallelize.

  • Think about global vs local data structures. It is not easy to separate them right.

  • Understand how a command line interface to a program can improve usability and also make the code more versatile (so that it can be combined with workflow management tools).

  • Connect modular code development to the remaining lessons (version control, testing, documentation, reproducibility).
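To make the outcomes about pure functions and split-apply-combine a little more concrete, here is a minimal sketch using pandas. The four temperature values and both function names are made up for illustration; only the column name comes from the example script.

```python
import pandas as pd

# small made-up data set: two values each for January and February
data = pd.DataFrame(
    {"air_temperature_celsius": [-5.0, -3.0, 1.0, 3.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-02-01", "2024-02-02"]),
)


def add_anomaly_in_place(frame):
    """Impure: modifies the caller's data frame as a side effect."""
    temperatures = frame["air_temperature_celsius"]
    frame["anomaly"] = temperatures - temperatures.mean()


def anomaly(temperatures):
    """Pure: returns a new series and leaves its input untouched."""
    return temperatures - temperatures.mean()


# the pure function does not change `data`
result = anomaly(data["air_temperature_celsius"])

# split by month, apply the mean to each group, combine into one series
monthly_means = data["air_temperature_celsius"].groupby(data.index.month).mean()
```

The pure version is easier to test and to reuse because its result depends only on its argument. Since each month's mean is computed independently of the others, the grouped computation is also the kind of step that could be distributed across workers.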