Git under the hood
Objectives
Verify that branches are pointers to commits and extremely lightweight.
Instructor note
10 min teaching/type-along
15 min exercise
Down the rabbit hole
In everyday work with Git you never need to go inside .git, but in this exercise we will, in order to learn how branches are implemented.
For this exercise create a new repository and commit a couple of changes.
Now that we’ve made a couple of commits, let us look at what is happening under the hood.
$ cd .git
$ ls -l
drwxr-xr-x - user 25 Aug 15:51 branches
.rw-r--r-- 499 user 25 Aug 15:52 COMMIT_EDITMSG
.rw-r--r-- 92 user 25 Aug 15:51 config
.rw-r--r-- 73 user 25 Aug 15:51 description
.rw-r--r-- 21 user 25 Aug 15:51 HEAD
drwxr-xr-x - user 25 Aug 15:51 hooks
.rw-r--r-- 137 user 25 Aug 15:52 index
drwxr-xr-x - user 25 Aug 15:51 info
drwxr-xr-x - user 25 Aug 15:52 logs
drwxr-xr-x - user 25 Aug 15:52 objects
drwxr-xr-x - user 25 Aug 15:51 refs
Git stores everything under the .git folder in your repository. In fact, the .git directory is the Git repository.
Previously when you wrote the commit messages using your text editor, they were in fact saved to COMMIT_EDITMSG.
Each commit in Git is stored as an object. This object contains information about the author and the commit message. A commit object references a tree object that lists the files present in the directory at the time. Tree objects reference blob objects (that record the state of each file) or other tree objects.
Commits are referenced by a SHA-1 hash (a 40-character hexadecimal string).
Once you have several commits, each commit object also links to the hash of the previous commit(s) (a merge commit has more than one previous commit). The commits form a directed acyclic graph (do not worry if the term is not familiar).
All branches and tags in Git are pointers to commits.
Git is basically a content-addressed storage system.
The content address is the content digest (SHA-1 checksum).
Stored data does not change - so when we modify commits, we always create new commits. Git doesn’t delete these right away, which is why it is very hard to lose data if you commit it once.
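To see content addressing in action, here is a minimal sketch using git hash-object, which computes the ID Git would assign to a blob without storing anything. The sample strings are made up for illustration.

```shell
# Hash the same content twice: the object ID depends only on the bytes.
first=$(printf 'hello\n' | git hash-object --stdin)
second=$(printf 'hello\n' | git hash-object --stdin)
echo "$first"
echo "$second"

# Change a single character and the ID is completely different.
other=$(printf 'hello!\n' | git hash-object --stdin)
echo "$other"
```

The first two lines printed are identical and the third differs, which is why Git can never modify stored data in place: different content always lives at a different address.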
Let us poke a bit into raw objects! Start with:
$ git cat-file -p HEAD
Then explore the tree object, then the file object, etc. recursively using the hashes you see.
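If you would like to see the whole walk in one go, the following sketch builds a throwaway repository (so it does not touch your own work) and follows a commit to its tree and then to a blob. The file name notes.txt, the commit message, and the identity passed to git commit are placeholders chosen for this example.

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'first line' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    -c commit.gpgsign=false commit -q -m 'Add notes'

# Every object has a type: commit, tree, blob (or tag).
git cat-file -t HEAD                 # commit
tree=$(git rev-parse 'HEAD^{tree}')  # hash of the tree the commit references
git cat-file -t "$tree"              # tree
git cat-file -p "$tree"              # one line per entry: mode, type, hash, name

# Follow the tree entry for notes.txt down to its blob and print the content.
blob=$(git rev-parse 'HEAD:notes.txt')
git cat-file -t "$blob"              # blob
git cat-file -p "$blob"              # first line
```

In a repository with more than one commit, git cat-file -p HEAD additionally shows a parent line, which is the link to the previous commit.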
Demonstration: experimenting with branches
Let us lift the hood and create a few branches manually. The goal of this exercise is to create an “Aha!” moment and give us a good understanding of the underlying model.
We start from the main branch and create an idea branch:
$ git status
On branch main
nothing to commit, working tree clean
$ git switch --create idea
Switched to a new branch 'idea'
$ git branch
* idea
main
Now let us go in:
$ cd .git
$ cd refs/heads
$ ls -l
.rw-r--r-- 41 user 25 Aug 15:54 idea
.rw-r--r-- 41 user 25 Aug 15:52 main
Let us check what the idea file looks like (do not worry if the hash is different):
$ cat idea
045e3db14740c60684d745e5fb891ae71e335611
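The hash in the branch file is exactly what Git resolves the branch name to. A minimal sketch that checks this in a throwaway repository follows; it assumes the branch is stored as a plain file under refs/heads, which is the case for a freshly created branch in a default setup (Git may later pack refs into a single packed-refs file). The file name and commit identity are placeholders.

```shell
# Throwaway repository with one commit and an "idea" branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    -c commit.gpgsign=false commit -q -m 'Add notes'
git branch idea

# What Git resolves the branch name to ...
resolved=$(git rev-parse idea)
# ... and what the branch file contains.
stored=$(cat .git/refs/heads/idea)
echo "$resolved"
echo "$stored"

# Size of the branch file in bytes.
# With SHA-1 this is 41: 40 hex characters plus a newline.
wc -c < .git/refs/heads/idea
```

Both printed hashes are the same, and the file size matches the 41 bytes shown in the directory listing above. That is all a branch costs.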
Now let us replicate this file:
$ cp idea idea-2
$ cp idea idea-3
$ cp idea idea-4
$ cp idea idea-5
Let us go up two levels and inspect the file HEAD:
$ cd ../..
$ cat HEAD
ref: refs/heads/idea
Let us open this file and change it to:
ref: refs/heads/idea-3
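Editing HEAD by hand is fine for this exercise. For comparison, here is a sketch of the same two manual steps done with Git commands: git branch creates the pointer file and git symbolic-ref reads or rewrites HEAD. It runs in a throwaway repository; the file name and commit identity are placeholders.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    -c commit.gpgsign=false commit -q -m 'Add notes'

# Equivalent of copying the branch file: a new branch at the current commit.
git branch idea-3

# Read where HEAD points, then repoint it without opening an editor.
git symbolic-ref HEAD
git symbolic-ref HEAD refs/heads/idea-3
git symbolic-ref --short HEAD    # idea-3
cat .git/HEAD                    # ref: refs/heads/idea-3
```

Note that repointing HEAD this way does not update the working tree. It is harmless here only because all branches point to the same commit; in normal work, use git switch.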
Now we are ready for the aha moment! First let us go back to the working area:
$ cd ..
Now - on which branch are we?
$ git branch
idea
idea-2
* idea-3
idea-4
idea-5
main
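To confirm that the copied files really are full branches pointing at one and the same commit, the following sketch repeats the copying in a throwaway repository and lists each branch next to its commit hash with git for-each-ref. The file name and commit identity are placeholders.

```shell
# Throwaway repository with one commit and an "idea" branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    -c commit.gpgsign=false commit -q -m 'Add notes'
git branch idea

# Create branches the manual way, by copying the small pointer file.
for n in 2 3 4 5; do
    cp .git/refs/heads/idea ".git/refs/heads/idea-$n"
done

# List every branch together with the commit it points to.
git for-each-ref --format='%(refname:short) %(objectname)' refs/heads

# Count the distinct commits behind all those branches.
distinct=$(git for-each-ref --format='%(objectname)' refs/heads | sort -u | wc -l)
echo "distinct commits: $distinct"
```

Six branches are listed and they share a single commit, so creating them copied no files and no history.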
Demonstration: If you add it, you don’t lose it (for a while)
A common way to (apparently) lose work is to use git add indiscriminately. You make some changes to a file (let us call this version A), you git add them, then you make some other changes (let us call this version B), and you git add those again. Now version A is apparently lost, and if we realize that we need it back we typically click nervously on the “undo” arrow of our editor.
But fear not! Try this.
Create a file named test-add with the following command:
$ echo 'Once a file has been git added, it is hard to lose!' > test-add
Add it to the staging area (this already stores its content as an object in the repository):
$ git add test-add
Now change the content of the file to be:
Ops
And repeat the add command:
$ git add test-add
Apparently we have lost the previous version of the file. But it is actually still there, stored in a dangling blob object (one that is not referenced by any other object). We can see this with the command git fsck:
$ git fsck
Checking object directories: 100% (256/256), done.
dangling blob dc3b15f60045eea7a87639436ed75021130579e0
We can see the content of that blob by passing its hash (shortened for convenience) to the git cat-file -p command:
$ git cat-file -p dc3b
Once a file has been git added, it is hard to lose!
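Printing the blob is only half of a recovery; usually you want the content back in a file. A minimal sketch of the whole round trip in a throwaway repository follows. The output name test-add.recovered and the awk filter that picks the hash out of the fsck report are choices made for this example, and the sketch assumes there is exactly one dangling blob.

```shell
# Throwaway repository with one unrelated commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    -c commit.gpgsign=false commit -q -m 'Add notes'

# Version A is staged, then overwritten in the index by version B.
echo 'Once a file has been git added, it is hard to lose!' > test-add
git add test-add
echo 'Ops' > test-add
git add test-add

# Pick the dangling blob out of the fsck report and write it to a new file.
lost=$(git fsck --no-progress 2>/dev/null | awk '/dangling blob/ {print $3}')
git cat-file -p "$lost" > test-add.recovered
cat test-add.recovered
```

Redirecting to a new file rather than to test-add keeps version B intact while you compare the two.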
Discussion
Discuss the findings with other course participants.