Motivation

Objectives

  • Make sure nobody leaves the workshop without starting to use some form of version control.

  • Discuss the reasons why we advocate distributed version control.

Instructor note

  • 15 min teaching/demonstration

The essence of version control

  • System which records snapshots of a project

  • Implements branching:

    • You can work on several feature branches and switch between them

    • Different people can work on the same code/project without interfering

    • You can experiment with an idea and discard it if it turns out to be a bad idea

  • Implements merging:

    • Person A and B’s simultaneous work can be easily combined
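The snapshot and branching ideas above can be sketched with a toy model. This is only an illustration, not how Git is implemented: the class `ToyRepo` and its methods are hypothetical names invented here. Snapshots are stored once and never changed, and a branch is just a name that points at a snapshot.

```python
import hashlib
import json


class ToyRepo:
    """Toy model (hypothetical, not Git's implementation):
    immutable snapshots plus movable branch names."""

    def __init__(self):
        self.snapshots = {}             # snapshot id -> {"files": ..., "parent": ...}
        self.branches = {"main": None}  # branch name -> snapshot id (None = no snapshot yet)
        self.current = "main"           # name of the branch we are working on

    def commit(self, files):
        """Record a snapshot of all files and move the current branch to it."""
        parent = self.branches[self.current]
        payload = json.dumps({"files": files, "parent": parent}, sort_keys=True)
        snapshot_id = hashlib.sha256(payload.encode()).hexdigest()[:8]
        self.snapshots[snapshot_id] = {"files": dict(files), "parent": parent}
        self.branches[self.current] = snapshot_id
        return snapshot_id

    def branch(self, name):
        """Create a new branch pointing at the current snapshot."""
        self.branches[name] = self.branches[self.current]

    def switch(self, name):
        """Switch to an existing branch."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name

    def files(self):
        """Return the files of the snapshot the current branch points at."""
        snapshot_id = self.branches[self.current]
        return dict(self.snapshots[snapshot_id]["files"]) if snapshot_id else {}


repo = ToyRepo()
repo.commit({"analysis.py": "v1"})

# Try out an idea on a separate branch
repo.branch("experiment")
repo.switch("experiment")
repo.commit({"analysis.py": "v1 + risky idea"})

# Back on main, the experiment has left no trace
repo.switch("main")
print(repo.files())
```

Discarding the experiment amounts to forgetting the branch name; the snapshot recorded on `main` was never modified.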

What we typically like to snapshot

  • Software (this is how it started but Git/GitHub can track a lot more)

  • Scripts

  • Documents (plain text files are much better suited than Word documents)

  • Manuscripts (Git is great for collaborating/sharing LaTeX or Quarto manuscripts)

  • Configuration files

  • Website sources

  • Data

Discussion

Discuss the following directory listing. What possible problems do you anticipate with this kind of “version control”:

mylib-1.2.4_18.3.07.tgz         somecode_CP_10.8.07.tgz
mylib-1.2.4_27.7.07.tgz         somecode_CP_17.5.07.tgz
mylib-1.2.4_29.4.08.tgz         somecode_CP_23.8.07_final.tgz
mylib-1.2.4_6.10.07.tgz         somecode_CP_24.5.07.tgz
mylib-1.2.5_23.4.08.tgz         somecode_CP_25.5.07.tgz
mylib-1.2.5_25.5.07.tgz         somecode_CP_29.5.07.tgz
mylib-1.2.5_6.6.07.tgz          somecode_CP_30.5.07.tgz
mylib-1.2.5_bexc.tgz            somecode_CP_6.10.07.tgz
mylib-1.2.5_d0.tgz              somecode_CP_6.6.07.tgz
mylib-1.3.0_4.4.08.tgz          somecode_CP_8.6.07.tgz
mylib-1.3.1_4.4.08.tgz          somecode_KT.tgz
mylib-1.3.2_22.4.08.tgz         somecode_PI1_2007.tgz
mylib-1.3.2_4.4.08.tgz          somecode_PI_2007.tgz
mylib-1.3.2_5.4.08.tgz          somecode_PI2_2007.tgz
mylib-1.3.3_1.5.08.tgz          somecode_PI_CP_18.3.07.tgz
mylib-1.3.3_20.5.08.tgz         somecode_11.5.08.tgz
mylib-1.3.3_tstrm_27.6.08.tgz   somecode_15.4.08.tgz
mylib-1.3.3_wk_10.8.08.tgz      somecode_17.6.09_unfinished.tgz
mylib-1.3.3_wk_11.8.08.tgz      somecode_19.7.09.tgz
mylib-1.3.3_wk_13.8.08.tgz      somecode-20.7.09.tgz
...

Why version control

Roll-back functionality

  • Mistakes happen - without recorded snapshots you cannot easily undo mistakes and go back to a working version.

Branching

  • Often you need to work on several issues/features in one codebase - without branching this can be messy and confusing.

  • You can simulate branching by copying the entire code to multiple places, but this too will be messy and confusing.

Collaboration

With version control, none of these are needed anymore (or have much simpler answers):

  • “I will just finish my work and then you can start with your changes.”

  • “Can you please send me the latest version?”

  • “You never got the code I sent by email? Maybe the spam filter marked it as malicious?”

  • “Where is the latest version?”

  • “Which version are you using?”

  • “Which version did the authors use in the paper I am trying to reproduce?”

Reproducibility

  • How do you indicate which version of your code you have used in your paper?

  • When you find a bug, how do you know precisely when it was introduced? (Are published results affected? Do you need to inform collaborators or users of your code?)
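For the first question, one common approach is to store the commit identifier together with the results. The following is a minimal sketch, assuming Git is installed and the script runs inside a Git repository; the helper name `current_commit` is made up for this example. It relies on the `git rev-parse HEAD` command, which prints the hash of the commit that is currently checked out.

```python
import subprocess


def current_commit(path="."):
    """Return the hash of the commit checked out at `path`, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=path,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, `path` does not exist,
        # or `path` is not inside a Git repository with at least one commit
        return None
    return result.stdout.strip()


commit = current_commit()
print(f"results produced with commit: {commit or 'unknown'}")
```

Writing this line into a log file or the header of an output file makes it possible to find the exact code version behind a result later.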

Compare with Dropbox or Google Drive

  • Document/code is in one place, no need to email snapshots.

  • How can you use an old version? It is possible to retrieve old versions, but in a much less useful way: you get snapshots of individual files, not of whole directories.

  • What if you want to work on multiple versions at the same time? Do you make a copy? How do you merge copies?

  • What if you don’t have internet?

Why Git?

We will use Git to record snapshots of our work:

  • Easy to set up: no server needed.

  • Very popular: chances are high you will need to contribute to somebody else’s code which is tracked with Git.

  • Distributed: good backup, no single point of failure, you can track and clean up changes offline, simplifies collaboration model for open-source projects.

  • Important platforms such as GitHub, GitLab, and Bitbucket build on top of Git.

However, any version control is better than no version control, and it is OK to prefer a tool other than Git.

Other tools:

Interesting newcomer:

Difficulties of version control

Despite the benefits, let’s be honest, there are some difficulties:

  • One more thing to learn (it’s probably worth it and will save you more time in the long run; basic career skill).

  • Difficult if some people don’t want to use it (in the worst case, you can version control on your side and send them versions).

  • Advanced things can be difficult, and there are a few too many gotchas (the basics are often enough; ask others for help when needed).

A real-life example

Before we create a new repository from scratch and learn how to record changes and create and merge branches, let us explore an existing Git repository on GitHub. The goal here is not to teach GitHub yet (we will explain some of the concepts later), but rather to get a glimpse of the wider picture and see the social aspect to know what our end goal is.

As an example we can explore a famous Git repository which was used to produce the Event Horizon Telescope images: achael/eht-imaging.

While some of these are GitHub features, it all can be done on other sites, or by yourself without GitHub at all.