Motivation

Objectives

  • Make sure nobody leaves the workshop without starting to use some form of version control.

  • Discuss the reasons why we advocate distributed version control.

Instructor note

  • 15 min discussion/demonstration

Git is all about keeping track of changes

We will learn how to keep track of changes first in the web browser. Below are screenshots of tracked changes with Git (from this example repository):

Screenshot of a git log on GitHub

Web browser, GitHub view

Later we will also use the terminal or the editor (with the same example repository):

Screenshot of a git log in terminal

The same as above, but the terminal view

Why do we need to keep track of versions?

Problem: If you have to identify and find your code from 17 days ago, can you?

Version control is an answer to the following questions (do you recognize some of them?):

  • “It broke … hopefully I have a working version somewhere?”

  • “Can you please send me the latest version?”

  • “Where is the latest version?”

  • “Which version are you using?”

  • “Which version have the authors used in the paper I am trying to reproduce?”

  • “Found a bug! Since when was it there?”

  • “I am sure it used to work. When did it change?”

  • “My laptop is gone. Is my thesis now gone?”

Features: roll-back, branching, merging, collaboration

Problem: Your code worked two days ago, but is giving an error now. You don’t know what you changed.

Problem: You and your colleague want to work on the same code at the same time.

  • Roll-back: you can always go back to a previous version and compare

  • Branching and merging:

    • Work on different ideas at the same time

    • Different people can work on the same code/project without interfering

    • You can experiment with an idea and discard it if it turns out to be a bad idea

Branching explained with a gopher

Image created using https://gopherize.me/ (inspiration).

Reproducibility

Problem: Someone asks you about your results from 5 years ago. Can you get the same results now?

  • How do you indicate which version of your code you have used in your paper?

  • When you find a bug, how do you know precisely when it was introduced? (Are published results affected? Do you need to inform collaborators or users of your code?)

With version control we can “annotate” code (browse this example online):


Example of a git-annotated code with code and history side-by-side.

Talking about code

Problem: You want to show someone a few lines from one of your projects.

Which of these two is more practical?

  • “Clone the code, go to the file ‘src/util.rs’, and search for ‘time_iso8601’. Oh! But make sure you use the version from August 2023.”

  • Or I can send you a permalink:

Screen-shot of a code portion

Permalink that points to a code portion.
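To make the idea of a permalink concrete, here is a minimal sketch of how such a link is put together on GitHub. The helper function `github_permalink` and all values passed to it (user, repository, commit hash, line numbers) are hypothetical and only for illustration. The key point is that the link contains a commit hash instead of a branch name, so it keeps pointing at the same lines even if the file changes later.

```python
def github_permalink(owner, repo, commit, path, first_line, last_line=None):
    """Build a GitHub permalink to one line or a line range of a file.

    The commit hash pins the link to one exact version of the file.
    """
    url = f"https://github.com/{owner}/{repo}/blob/{commit}/{path}#L{first_line}"
    # Append the end of the range only when more than one line is selected.
    if last_line is not None and last_line != first_line:
        url += f"-L{last_line}"
    return url


# Hypothetical values, for illustration only.
link = github_permalink(
    owner="example-user",
    repo="example-project",
    commit="0123456789abcdef0123456789abcdef01234567",
    path="src/util.rs",
    first_line=10,
    last_line=14,
)
print(link)
```

In practice you rarely build these links by hand: in the GitHub web view you can select the lines and copy a permalink from there.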

What we typically like to snapshot

  • Software (this is how it started but Git/GitHub can track a lot more)

  • Scripts

  • Documents (plain text files are much better suited than Word documents)

  • Manuscripts (Git is great for collaborating/sharing LaTeX or Quarto manuscripts)

  • Configuration files

  • Website sources

  • Data

Discussion

In this example, somebody tried to keep track of versions without a version control tool like Git. Discuss the following directory listing. What possible problems do you anticipate with this kind of “version control”?

myproject-2019.zip
myproject-2020-February.zip
myproject-2021-August.zip
myproject-2023-09-19-working.zip
myproject-2023-09-21.zip
myproject-2023-09-21-test.zip
myproject-2023-09-21-myversion.zip
myproject-2023-09-21-newfeature.zip
...
(100 more files like these)

Difficulties of version control

Despite the benefits, let’s be honest, there are some difficulties:

  • One more thing to learn (it is probably worth it and will save you time in the long run; it is a basic career skill).

  • Difficult if your collaborators don’t want to use it (in the worst case, you can use version control on your side and email them versions).

  • Advanced things can be difficult, but basics are often enough (ask others for help when needed).

Why Git and not another tool?

  • Easy to set up: no server needed.

  • Very popular: chances are high you will need to contribute to somebody else’s code that is tracked with Git.

  • Distributed: good backup, no single point of failure, you can track and clean up changes offline, and it simplifies the collaboration model for open-source projects.

  • Important platforms such as GitHub, GitLab, and Bitbucket build on top of Git.

However, any version control is better than no version control, and it is OK to prefer a tool other than Git, such as Subversion, Mercurial, Pijul, or others.