Overview
Teaching: 15 min
Exercises: 30 min
Questions
- How can we find out when exactly a line of code was changed?
- How can we navigate past versions of the code?
- How can we find out which commit broke or changed a functionality?
Objectives
- Quickly find a line of code, find out why it was introduced and when.
- Quickly find the commit that changed a behavior.
Please make sure that you do not clone repositories inside an already tracked folder:
$ git status
If you are inside an existing Git repository, step out of it. You need to find a different location since we will clone a new repository.
If you see this message, this is good in this case:
fatal: not a git repository (or any of the parent directories): .git
First the instructor demonstrates a few commands on a real-life example repository https://github.com/networkx/networkx (mentioned in the amazing site The Programming Historian). Later we will practice these in groups in an archaeology exercise (below). After demonstrating the command-line tools, the instructor can also demonstrate searching, “show” and “annotate” in the GitHub browser for this example project.
git grep to search through the repository
With git grep you can find all lines in a repository which contain some string or regular expression.
This is useful to find out where in the code some variable is used or some error message is printed:
$ git grep sometext
$ git grep "some text with spaces"
In the networkx repository you can try:
$ git clone https://github.com/networkx/networkx
$ cd networkx
$ git grep -i fixme
git log -S to search through the history of changes
While git grep searches the current state of the repository,
it is also possible to search through all changes for “sometext”:
$ git log -S sometext
In the networkx repository you can try:
$ git log -S test_weakly_connected_component
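To make the search rule more concrete, here is a minimal Python sketch of the idea behind git log -S: a commit is reported when the number of occurrences of the search string differs before and after that commit. The history below is made up for illustration, and the function name pickaxe is our own choice. Real Git applies this rule per file across the whole repository, so treat this only as a toy model.

```python
# Toy model of the rule behind `git log -S`: a commit is reported when the
# number of occurrences of the search string changes. The history is invented.

def pickaxe(history, needle):
    """Return ids of commits that change how often `needle` occurs.

    `history` is a list of (commit_id, file_content) pairs, oldest first.
    """
    hits = []
    previous_count = 0  # before the first commit the file does not exist
    for commit_id, content in history:
        count = content.count(needle)
        if count != previous_count:
            hits.append(commit_id)
        previous_count = count
    return hits


history = [
    ("c1", "def load():\n    pass\n"),
    ("c2", "def load():\n    pass\n\ndef test_load():\n    load()\n"),
    ("c3", "def load():\n    return 1\n\ndef test_load():\n    load()\n"),
    ("c4", "def load():\n    return 1\n"),
]

# c2 introduces "test_load" and c4 removes it; c3 edits the file but keeps
# the number of occurrences unchanged, so it is not reported.
print(pickaxe(history, "test_load"))
```

This also explains why git log -S can find a line that no longer exists in the current code: the commit that removed it changed the count too.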
git show to inspect commits
We have seen this one before already. Using
git show we can inspect an individual commit if
we know its hash:
$ git show somehash
$ git show 759d589bdfa61aff99e0535938f14f67b01c83f7
git annotate to annotate code with commit metadata
Try it out on a file. With git annotate you can see, line by line, who last modified each line and when. It also prints the precise hash of the last change which modified each line. Incredibly useful!
$ git annotate somefile
$ git annotate networkx/convert_matrix.py
If you annotate in a terminal and the file is longer than the screen, Git by default uses the program less to scroll the output. Type /sometext followed by <ENTER> to find “sometext” and you can cycle through the results with n (next) and N (previous). You can also use page up/down to scroll. You can quit with q.
Discuss how these two affect the annotation:
- wrapping long lines of text/code into shorter lines
- running autoformatting tools on the code
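As input for this discussion, here is a rough Python sketch of line-by-line annotation. It uses difflib from the standard library to decide which lines survive unchanged from one version to the next. This is an assumption-laden toy model: Git uses its own diff machinery, and the commit ids and file contents are invented.

```python
import difflib

# Toy line-by-line annotation: every line is attributed to the commit that
# last introduced or changed it. Commit ids and contents are invented.

def annotate(versions):
    """`versions` is a list of (commit_id, lines) pairs, oldest first.

    Returns a list of (commit_id, line) pairs for the newest version.
    """
    attributed = []
    for commit_id, lines in versions:
        old_lines = [line for _, line in attributed]
        matcher = difflib.SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # unchanged lines keep their earlier attribution
                updated.extend(attributed[i1:i2])
            elif tag in ("replace", "insert"):
                # new or modified lines are attributed to this commit
                updated.extend((commit_id, line) for line in lines[j1:j2])
            # "delete": removed lines simply disappear
        attributed = updated
    return attributed


versions = [
    ("a1", ["def area(width, height, scale=1.0, unit='m'):",
            "    return width * height * scale"]),
    # commit b2 only wraps the long first line, the logic is untouched
    ("b2", ["def area(",
            "    width, height, scale=1.0, unit='m'",
            "):",
            "    return width * height * scale"]),
]

for commit_id, line in annotate(versions):
    print(commit_id, line)
```

In this toy history the wrapped lines are now attributed to the formatting commit b2, although a1 is where the function signature was actually written.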
git checkout -b to inspect code in the past
We can create branches pointing to a commit in the past. This is the recommended mechanism to inspect old code:
$ git checkout -b branchname somehash
# create branch called "older-code" from hash 347e6292419b
$ git checkout -b older-code 347e6292419bd0e4bff077fe971f983932d7a0e9
# now you can navigate and inspect the code as it was back then
# ...
# after we are done we can switch back to "master"
$ git checkout master
# if we like we can delete the "older-code" branch
$ git branch -d older-code
On newer Git versions this is the preferred command:
$ git switch --create branchname somehash
Exercise: basic archaeology commands
Let us explore the value of these commands in an exercise. Future exercises do not depend on this, so it is OK if you do not complete it fully.
In-person workshops: We recommend that you do this exercise in groups of two and discuss with your neighbors.
Video workshops: We will group you in breakout rooms of ~4 persons where you can work and discuss together. A helper or instructor will pop in to help out. In the group one participant can share their screen and others in the group help out, discuss, and try to follow along. Please write down questions in the collaborative notes. After 15-20 minutes we will bring you back into the main room and discuss.
- Clone this repository: https://github.com/tidyverse/rvest. Then step into the new directory:
$ git clone https://github.com/tidyverse/rvest.git
$ cd rvest
Then using the above 5 tools try to:
- Find the code line which contains "No links matched that expression".
- Find out when this line was last modified. Find the actual commit which modified that line. If this line got removed after we have created this exercise, find out when it was removed.
- Inspect that commit with git show.
- Create a branch pointing to the past when that commit was created to be able to browse and use the code as it was back then.
- How can you bring the code to the commit precisely before that line was last modified?
- We use git grep:
$ git grep "No links matched that expression"
This gives the output:
R/session.R: stop("No links matched that expression", call. = FALSE)
- We use git annotate:
$ git annotate R/session.R
Then search for “No links matched” by typing “/No links matched” followed by Enter. The last commit that modified it was 5bbeb7df (unless that line changed since). If the line does not exist anymore, search for it using:
$ git log -S "No links matched that expression"
- We use git show:
$ git show 5bbeb7df
- Create a branch pointing to that commit (here we called the branch “past-code”):
$ git branch past-code 5bbeb7df
- This is a compact way to access the first parent of 5bbeb7df (here we called the branch “just-before”):
$ git checkout -b just-before 5bbeb7df~1
But I am sure it used to work! Strange.
How would you solve this?
Before we go on, first discuss how you would solve this problem: you know that it worked 500 commits ago but it does not work now.
- How would you find the commit which changed it?
- Why could it be useful to know the commit that changed it?
Video workshops: Write down ideas on how you would solve it in the collaborative note and we will discuss various approaches.
We will probably arrive at a solution which is similar to git bisect:
$ git bisect start
$ git bisect good f0ea950  # this is a commit that worked
$ git bisect bad master    # last commit is broken
# now compile and/or run
# after that decide whether
$ git bisect good
# or
$ git bisect bad
# now compile and/or run
# after that decide whether
$ git bisect good
# or
$ git bisect bad
# iterate until commit is found
The search can also be automated with git bisect run <script>.
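Why is bisecting so much faster than checking 500 commits one by one? The following Python sketch simulates the idea on a purely linear history: it is a binary search between a known good and a known bad commit. The commit numbering and the is_good function are invented for illustration. Real Git histories can contain branches and merges, which Git handles with more care than this toy version does.

```python
# Toy simulation of bisecting a linear history. Commits are numbered 1..500
# and everything from commit 137 onwards is "broken" (invented numbers).

def bisect(commits, is_good):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest first; the first commit is assumed to be
    good and the last one bad. Returns (first_bad_commit, tests_needed).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1  # one "compile and/or run" step
        if is_good(commits[middle]):
            good = middle
        else:
            bad = middle
    return commits[bad], tests


commits = list(range(1, 501))
first_bad, tests = bisect(commits, lambda commit: commit < 137)
print(f"first bad commit: {first_bad}, found after {tests} tests")
```

Each test halves the remaining range, so a history of 500 commits needs at most 9 tests instead of up to 500.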
Exercise: Git bisect
In this exercise, we use git bisect on an example repository. It is OK if you do not complete this exercise fully.
Begin by cloning https://github.com/coderefinery/git-bisect-exercise.
The motivation for this exercise is to be able to do archaeology with Git on a source code where the bug is difficult to see visually. Finding the offending commit is often more than half the debugging.
get_pi.py approximates pi using terms of the Nilakantha series. It should produce 3.14 but it does not. The script broke at some point and produces 3.57 using the last commit:
$ python get_pi.py
3.57
At some point within the first 500 commits, an error was introduced. The only thing we know is that the first commit worked correctly.
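For orientation, this is roughly what a Nilakantha approximation of pi can look like in Python. It is our own minimal sketch of the series pi = 3 + 4/(2·3·4) - 4/(4·5·6) + 4/(6·7·8) - ..., not the code of get_pi.py from the exercise repository; the function name and the number of terms are assumptions.

```python
# Minimal sketch of the Nilakantha series (not the exercise's get_pi.py):
# pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...

def nilakantha_pi(num_terms):
    """Approximate pi using `num_terms` terms of the Nilakantha series."""
    result = 3.0
    sign = 1.0
    for k in range(1, num_terms + 1):
        n = 2 * k
        result += sign * 4.0 / (n * (n + 1) * (n + 2))
        sign = -sign  # the terms alternate between added and subtracted
    return result


print(f"{nilakantha_pi(100):.2f}")
```

A small slip in a series like this, for instance in a sign or a denominator, is hard to spot by reading diffs, which is what makes the example a good fit for git bisect.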
- Clone this repository and use git bisect to find the commit which broke the computation (solution - spoiler alert!).
- Once you have found the offending commit, also practice navigating to the last good commit.
- Bonus exercise: Write a script that checks for a correct result and use git bisect run to find the offending commit automatically (solution - spoiler alert!).
Video workshops: We will group you in breakout rooms of ~4 persons where you can work and discuss together. A helper or instructor will pop in to help out. Please write down questions in the collaborative notes. After 15-20 minutes we will bring you back into the main room and discuss.
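As a hint for the bonus exercise, here is a sketch of the decision logic such a check script needs. git bisect run interprets the exit code of the script: 0 marks the commit as good, 125 asks Git to skip the commit, and other codes from 1 to 127 mark it as bad. The function name exit_code_for and the choice to skip on empty output are our own assumptions, not part of the exercise repository.

```python
# Decision logic for a check script used with `git bisect run` (a sketch).
# Exit code 0 = good commit, 1 = bad commit, 125 = skip this commit.

GOOD, BAD, SKIP = 0, 1, 125


def exit_code_for(output, expected="3.14"):
    """Map the printed result of the program under test to an exit code."""
    text = output.strip()
    if not text:
        # nothing was printed, so this commit cannot be judged (assumption)
        return SKIP
    return GOOD if text == expected else BAD


for sample in ["3.14\n", "3.57\n", ""]:
    print(repr(sample), "->", exit_code_for(sample))
```

A complete script would additionally run python get_pi.py, pass its output to this function, and exit with the returned code. If that script were saved as check.py (a hypothetical name), it could be used via git bisect run python check.py after marking one good and one bad commit.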
Finding the first commit:
$ git log --oneline | tail -n 1
How to navigate to the parent of a commit with hash somehash:
# create branch pointing to the parent of somehash
$ git checkout -b branchname somehash~1
# instead of a tilde you can also use this
$ git checkout -b branchname somehash^
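The tilde and the caret give the same result here, but they are not the same operation in general. The following Python sketch models this on a small invented history: somehash~n follows the first parent n times, while somehash^n selects the n-th direct parent, which only matters for merge commits. The commit names and helper functions are hypothetical.

```python
# Toy commit graph: each commit maps to its parents, first parent first.
# All commit names are invented for illustration.
parents = {
    "d4": ["c3"],
    "c3": ["b2", "x9"],  # a merge commit with two parents
    "b2": ["a1"],
    "x9": ["a1"],
    "a1": [],            # the root commit has no parent
}


def tilde(commit, generations=1):
    """Model of commit~generations: follow the first parent repeatedly."""
    for _ in range(generations):
        commit = parents[commit][0]
    return commit


def caret(commit, n=1):
    """Model of commit^n: pick the n-th direct parent."""
    return parents[commit][n - 1]


print(tilde("d4"), caret("d4"))        # both are the first parent of d4
print(tilde("d4", 2), caret("c3", 2))  # grandparent vs. second parent of the merge
```

For a commit with a single parent, ~1 and ^ are interchangeable, which is why both forms work in the commands above.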
git log/grep/annotate/show/bisect is a powerful combination when doing archaeology in a project.
git checkout -b <name> <hash> is the recommended mechanism to inspect old code.
On newer Git you can use the more intuitive git switch --create branchname somehash.