Version control (motivation)
Objectives
Browse commits and branches of a Git repository.
Remember that commits are like snapshots of the repository at a certain point in time.
Know the difference between Git (something that tracks changes) and GitHub/GitLab (a web platform to host Git repositories).
Why do we need to keep track of versions?
Version control is an answer to the following questions (do you recognize some of them?):
“It broke … hopefully I have a working version somewhere?”
“Can you please send me the latest version?”
“Where is the latest version?”
“Which version are you using?”
“Which version have the authors used in the paper I am trying to reproduce?”
“Found a bug! Since when was it there?”
“I am sure it used to work. When did it change?”
“My laptop is gone. Is my thesis now gone?”
Demonstration
Example repository: https://github.com/workshop-material/planets
Commits are like snapshots and if we break something we can go back to a previous snapshot.
Commits carry metadata about changes: author, date, commit message, and a checksum.
Branches are like parallel universes where you can experiment with changes without affecting the default branch: https://github.com/workshop-material/planets/network (“Insights” -> “Network”)
With version control we can annotate code (example).
Collaboration: We can fork (make a copy on GitHub), clone (make a copy to our computer), review, compare, share, and discuss.
Code review: Others can suggest changes using pull requests or merge requests. These can be reviewed and discussed before they are merged. Conceptually, they are similar to “suggesting changes” in Google Docs.
Features: roll-back, branching, merging, collaboration
Roll-back: you can always go back to a previous version and compare
Branching and merging:
Work on different ideas at the same time
You can experiment with an idea and discard it if it turns out to be a bad idea
Different people can work on the same code/project without interfering
Collaboration: review, compare, share, discuss
Talking about code
Which of these two is more practical?
“Clone the code, go to the file ‘simulate.py’, and search for ‘force_between_planets’. Oh! But make sure you use the version from September 2024.”
Or I can send you a permalink: https://github.com/workshop-material/planets/blob/1343ac0/simulate.py#L31C5-L39
What we typically like to snapshot
Software (this is how it started but Git/GitHub can track a lot more)
Scripts
Documents (plain text files much better suitable than Word documents)
Manuscripts (Git is great for collaborating/sharing LaTeX or Quarto manuscripts)
Configuration files
Website sources
Data
Discussion
In this example somebody tried to keep track of versions without a version control system tool like Git. Discuss the following directory listing. What possible problems do you anticipate with this kind of “version control”:
myproject-2019.zip
myproject-2020-february.zip
myproject-2021-august.zip
myproject-2023-09-19-working.zip
myproject-2023-09-21.zip
myproject-2023-09-21-test.zip
myproject-2023-09-21-myversion.zip
myproject-2023-09-21-newfeature.zip
...
(100 more files like these)
Solution
Giving a version to a collaborator and merging changes later with own changes sounds like lots of work.
What if you discover a bug and want to know since when the bug existed?
Where to learn more
CodeRefinery lessons with focus on collaboration and not only for the command line:
Introduction to version control (day 1-2): Why we want to track versions and how to go back in time to a working version. This lesson brings you from zero to using Git and GitHub for own projects.
Collaborative distributed version control (day 3): This lesson builds on “Introduction to version control” and we apply branching and learn about pull requests (merge requests), forks, and collaboration using Git and GitHub.