Data visualization using Python

In this introductory-level workshop, we will learn to produce reproducible data visualization pipelines using the Python programming language.

Who is the course for?

This course covers reading data from a file, processing it, and plotting the result, all in a reproducible way. Many of us need to do exactly this in our research. The content is general and should be relevant to anyone working in science.

Typical audience:

  • Somebody starting with Python or curious about Python.

  • Somebody who needs to read, process, and plot data for their work or studies and would like to try it out with Python.

  • People who already use Python for this but want to learn about libraries that simplify common tasks, and about how to share their workflow reproducibly.

Prerequisites

Preparations for you:

Preparations for your computer:

About the course

We will work in Jupyter notebooks and start with the Python basics needed to read data from Excel sheets and comma-separated values (CSV) files. We will introduce the pandas library for “data wrangling” (reading, writing, sorting, and filtering data).
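As a preview of the pandas “data wrangling” described above, the following minimal sketch reads a small CSV table, filters and sorts it, and writes the result back out as CSV. The station/temperature data are made up for illustration, and the table is read from an in-memory string so the example runs on its own. With a real file you would pass its path to pd.read_csv instead; Excel sheets are read with pd.read_excel, which additionally needs a reader package such as openpyxl.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path or URL to read_csv
csv_text = """station,month,temperature
A,1,-2.5
A,2,-1.0
B,1,0.5
B,2,1.5
"""

df = pd.read_csv(io.StringIO(csv_text))

# Filtering: keep only the rows measured at station "A"
station_a = df[df["station"] == "A"]

# Sorting: order all rows by temperature, warmest first
warmest_first = df.sort_values("temperature", ascending=False)

# Writing: produce CSV text (pass a file name instead to write to disk)
csv_output = warmest_first.to_csv(index=False)
print(csv_output)
```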

We will learn how to process data and compute simple statistics, error bars, and regression approximations using Python and its libraries.
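A minimal sketch of this kind of processing, assuming pandas and NumPy are installed: it computes the mean and sample standard deviation of repeated measurements for each x value (the standard deviation is one common choice for error bars) and fits a straight line by least squares with numpy.polyfit. The measurements are hypothetical, and the course material may well use other functions for the same steps.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: three repeats for each of three x values
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "y": [2.0, 2.2, 1.8, 4.1, 3.9, 4.0, 6.2, 5.8, 6.0],
})

# Mean and sample standard deviation (ddof=1) of y for each x value;
# the standard deviation can be used as the size of an error bar
summary = df.groupby("x")["y"].agg(["mean", "std"])
print(summary)

# Least-squares straight-line fit: y ≈ slope * x + intercept
# (polyfit returns coefficients from the highest degree down)
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```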

Finally, we will learn how to produce reproducible plots with the Matplotlib, Seaborn, and Altair libraries (you can then choose your favorite). We will practice sharing these visualization pipelines with Binder, via a GitHub repository.
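To give a flavor of the plotting part, this sketch draws points with error bars using Matplotlib and saves the figure to a file. The values and the file name example-plot.png are hypothetical; Seaborn and Altair offer their own, higher-level ways of producing similar plots.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running as a script; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical summary values: mean and standard deviation per x value
x = [1, 2, 3]
mean = [2.0, 4.0, 6.0]
std = [0.2, 0.1, 0.2]

fig, ax = plt.subplots()
# Plot the means as markers, with vertical error bars of length std
ax.errorbar(x, mean, yerr=std, fmt="o", capsize=4, label="mean ± std")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save the figure from code so the same script can regenerate it later
output_path = Path("example-plot.png")
fig.savefig(output_path, dpi=150)
```

Writing the figure from code, rather than exporting it by hand, is what lets the same notebook regenerate the plot whenever the data change.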

What is not taught?

  • Version control. Although very useful, it is outside the scope of this workshop.

  • Python outside a Jupyter notebook.

  • Python sets and tuples are only mentioned.

  • File input/output is only done through libraries; writing your “own” file I/O code is covered only in optional material.

  • How to choose the right visualization format for the data at hand.

  • Object-oriented design in Python.

  • Python packaging.

  • NumPy arrays.

  • Managing environments and installing Python packages.

Progression

  • Getting used to the Jupyter environment

  • Getting started with Python

  • Producing a first plot

  • Reading data from file and web

  • Dealing with “messy” data

  • Improving the plots

  • Organizing projects as they grow

  • Sharing reproducible plots