Git under the hood
Objectives
Verify that branches are pointers to commits and extremely lightweight.
Instructor note
10 min teaching/type-along
15 min exercise
Down the rabbit hole
In everyday work with Git you never need to go inside .git, but in this exercise we will, to learn how branches are implemented.
For this exercise create a new repository and commit a couple of changes.
Now that we’ve made a couple of commits, let us look at what is happening under the hood.
$ cd .git
$ ls -l
drwxr-xr-x - user 25 Aug 15:51 branches
.rw-r--r-- 499 user 25 Aug 15:52 COMMIT_EDITMSG
.rw-r--r-- 92 user 25 Aug 15:51 config
.rw-r--r-- 73 user 25 Aug 15:51 description
.rw-r--r-- 21 user 25 Aug 15:51 HEAD
drwxr-xr-x - user 25 Aug 15:51 hooks
.rw-r--r-- 137 user 25 Aug 15:52 index
drwxr-xr-x - user 25 Aug 15:51 info
drwxr-xr-x - user 25 Aug 15:52 logs
drwxr-xr-x - user 25 Aug 15:52 objects
drwxr-xr-x - user 25 Aug 15:51 refs
Git stores everything under the .git folder in your repository. In fact, the .git directory is the Git repository.
Previously, when you wrote commit messages in your text editor, they were in fact saved to COMMIT_EDITMSG.
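As a quick check, the sketch below makes one commit in a throwaway repository and prints that file. The repository location, file name, identity, and commit message are placeholders chosen for illustration, and the sketch assumes that a message given with -m is recorded in COMMIT_EDITMSG in the same way as one typed in an editor.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Make one commit (file name and message are illustrative).
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

# The message of the most recent commit is left behind in this file.
last_message=$(cat .git/COMMIT_EDITMSG)
echo "$last_message"
```

The file is overwritten by the next commit, so it only ever holds the latest message.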
Each commit in Git is stored as an object. The commit object contains information about the author and the commit message. It references a tree object that lists the files present in the directory at the time, and the tree in turn references blob objects that record the state of each file.
Commits are referenced by a SHA-1 hash (a 40-character hexadecimal string).
Once you have several commits, each commit object also links to the hash of the previous commit (its parent). The commits form a directed acyclic graph (do not worry if the term is not familiar).
All branches and tags in Git are pointers to commits.
Git is basically a content-addressed storage system.
The address of an object is the digest of its content (a SHA-1 checksum).
Stored data does not change, so when we "modify" a commit, we always create a new commit. Git doesn’t delete the old commits right away, which is why it is very hard to lose data once you have committed it.
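A minimal sketch of these two properties, run in a throwaway repository. The file name, messages, and identity are placeholders; git hash-object and git commit --amend are regular Git commands, but the scenario itself is only an illustration.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Content addressing: identical content gives the identical object ID,
# and any change to the content gives a different ID.
id1=$(printf 'hello\n' | git hash-object --stdin)
id2=$(printf 'hello\n' | git hash-object --stdin)
id3=$(printf 'hello!\n' | git hash-object --stdin)
echo "$id1"
echo "$id2"
echo "$id3"

# Immutability: amending does not edit the old commit, it creates a new one.
echo "v1" > file.txt
git add file.txt
git commit -q -m "First version"
old=$(git rev-parse HEAD)
git commit -q --amend -m "First version, reworded"
new=$(git rev-parse HEAD)
echo "before amend: $old"
echo "after amend:  $new"

# The old commit object is still stored, even though no branch points to it.
git cat-file -t "$old"
```

The first two IDs agree and the third differs, and the two commit hashes differ although the file content is the same.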
Let us poke a bit into raw objects! Start with:
$ git cat-file -p HEAD
Then explore the tree object, then the file objects, and so on, recursively, by passing the hashes you see to git cat-file -p.
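The walk from commit to tree to file can be sketched as follows. The repository, the file greeting.txt, and the commit messages are made up for illustration; your own hashes and file names will differ.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Two commits, so that the second one has a parent.
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
echo "hello again" >> greeting.txt
git commit -q -am "Extend greeting"

# 1. The commit object: tree, parent, author, committer, message.
git cat-file -t HEAD
git cat-file -p HEAD

# 2. The tree object: one line per entry (mode, type, hash, name).
tree=$(git rev-parse 'HEAD^{tree}')
git cat-file -p "$tree"

# 3. The blob object: the content of the file at that commit.
blob=$(git rev-parse HEAD:greeting.txt)
git cat-file -p "$blob"

# The parent line in step 1 is the hash of the previous commit.
parent=$(git rev-parse 'HEAD^')
echo "parent commit: $parent"
```

Here git rev-parse is only a shortcut for copying the hashes by hand from the output of the previous step.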
Demonstration: experimenting with branches
Let us lift the hood and create a few branches manually. The goal of this exercise is to hopefully create an “Aha!” moment and give us a good understanding of the underlying model.
Starting from the top level of the repository (not inside .git), we are on the main branch and create an idea branch:
$ git status
On branch main
nothing to commit, working tree clean
$ git switch --create idea
Switched to a new branch 'idea'
$ git branch
* idea
main
Now let us go in:
$ cd .git
$ cd refs/heads
$ ls -l
.rw-r--r-- 41 user 25 Aug 15:54 idea
.rw-r--r-- 41 user 25 Aug 15:52 main
Let us check what the idea file looks like (do not worry if the hash is different):
$ cat idea
045e3db14740c60684d745e5fb891ae71e335611
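That the file holds nothing but the hash of a commit can be cross-checked with git rev-parse. This is a sketch in a throwaway repository with placeholder identity and commit message, and it assumes that branches are stored as individual files under refs/heads, as in the listing above.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch idea

# What the branch file contains ...
from_file=$(cat .git/refs/heads/idea)
# ... and what Git resolves the branch name to.
from_git=$(git rev-parse idea)
echo "$from_file"
echo "$from_git"

# Size of the branch file in bytes: 40 hex characters plus a newline.
wc -c < .git/refs/heads/idea
```

Both lines show the same hash, which is also the commit that main points to.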
Now let us replicate this file:
$ cp idea idea-2
$ cp idea idea-3
$ cp idea idea-4
$ cp idea idea-5
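For comparison, a sketch that creates the same kind of pointer in three ways: by copying the file, with git branch, and with the lower-level git update-ref. It runs in a throwaway repository with placeholder names and assumes branches are stored as individual files under refs/heads. The regular commands may also do extra bookkeeping (such as reflog entries) that a plain copy skips.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch idea

# 1. By hand: copy the 41-byte file.
cp .git/refs/heads/idea .git/refs/heads/idea-2
# 2. The everyday command: new branch idea-3 starting at idea.
git branch idea-3 idea
# 3. The plumbing command: write the hash into the ref directly.
git update-ref refs/heads/idea-4 "$(git rev-parse idea)"

# All four names resolve to the same commit.
git rev-parse idea idea-2 idea-3 idea-4
```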
Let us go up two levels and inspect the file HEAD:
$ cd ../..
$ cat HEAD
ref: refs/heads/idea
Let us open this file and change it to:
ref: refs/heads/idea-3
Now we are ready for the aha moment! First let us go back to the working area:
$ cd ..
Now, which branch are we on?
$ git branch
idea
idea-2
* idea-3
idea-4
idea-5
main
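Editing HEAD by hand has a counterpart among Git's own commands: git symbolic-ref reads and writes the same file. A minimal sketch in a throwaway repository, with placeholder identity and commit message:

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch idea
git branch idea-3

# Read HEAD: prints the ref it currently points to.
git symbolic-ref HEAD

# Write HEAD: same effect as editing the file in a text editor.
git symbolic-ref HEAD refs/heads/idea-3
head_content=$(cat .git/HEAD)
current=$(git symbolic-ref --short HEAD)
echo "$head_content"
echo "$current"
git branch
```

Because all branches here point to the same commit, moving HEAD changes nothing in the working tree. With branches that point to different commits, git switch is the safe way to move, since it also updates the files and the index.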
Discussion
Discuss the findings with other course participants.