We developers use git all the time. Git internals might feel like magic but what git actually does is really simple. Let us peek under the hood to see how git works.

Let's create an empty folder and initialize a git repo in it by running

$ git init

This creates a .git folder inside your empty folder. The structure of this .git folder is as follows

.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

If you open up the HEAD file in your text editor you will see the following text in it

 ref: refs/heads/master

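which means that your current branch is master. (If your git is configured with a different default branch name, such as main, you will see that name instead.)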
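
To make the format of HEAD concrete, here is a minimal Python sketch that pulls the branch name out of that text. The function name `current_branch` is made up for this post; it is not part of git.

```python
def current_branch(head_text):
    """Return the branch name stored in HEAD, or None for a detached HEAD."""
    prefix = "ref: refs/heads/"
    line = head_text.strip()
    if line.startswith(prefix):
        return line[len(prefix):]
    # Otherwise HEAD holds a raw commit hash (a "detached" HEAD)
    return None


print(current_branch("ref: refs/heads/master\n"))  # prints: master
```

In a real repo you would pass in the text read from .git/HEAD.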

Now add a file and make a first commit by doing

 $ echo "Hello" >> README.md
 $ git add .
 $ git commit -m "Initial commit"

Now run $ git log and you will get

commit acde617e8ab39bb157821d3bf84d04e157bff52c (HEAD -> master)
Author: username <test@email.com>
Date:   Wed Aug 05 18:43:48 2020 +0330

    Initial commit

Note:

  • The exact commit hash that you get will differ from what you see here, because it depends on your username, email and the time at which you make the commit.

And if you open up .git/refs/heads/master it will have the text acde617e8ab39bb157821d3bf84d04e157bff52c inside it.

In git each commit is identified by a hash. The content of the file refs/heads/master means that master is pointing to the commit acde617e8ab39bb157821d3bf84d04e157bff52c.
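
The two lookups above (HEAD names a ref file, the ref file holds a commit hash) can be sketched in a few lines of Python. This is only an illustration: `resolve_head` is a hypothetical helper, it runs against a throwaway directory that mimics the layout, and it ignores details such as packed refs.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Follow HEAD to the ref file it names and return the commit hash."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file already holds a hash
    ref_name = head[len("ref: "):]
    with open(os.path.join(git_dir, *ref_name.split("/"))) as f:
        return f.read().strip()


# Throwaway directory that mimics the .git layout shown above
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "refs", "heads"))
with open(os.path.join(demo, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(demo, "refs", "heads", "master"), "w") as f:
    f.write("acde617e8ab39bb157821d3bf84d04e157bff52c\n")

print(resolve_head(demo))
```

Pointing `resolve_head` at the .git folder of a small, freshly created repo should give the same hash that git log shows at the top.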

 // TODO:

 1. Now make a second commit
 2. check `$ git log` and the content of **refs/heads/master**

After you have made your first commit if you inspect the contents of the .git folder again you will see something new

.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags

There is a new file called index, and there are some odd-looking entries inside the objects folder (we will ignore the other new things for now). When you run $ git add . git takes the changes you have made and creates objects for them. The name of each object is determined by running the file content, prefixed with a short header, through the SHA-1 algorithm. SHA-1 takes some input and produces a 160-bit digest, which git writes as a 40-character hexadecimal string.

Lets try to generate SHA1 of the file README.md. You can do that by running

$ git hash-object README.md

which will give you output

e965047ad7c57865823c7d992b1d046ea66edf78
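
You can reproduce this hash without git. The sketch below assumes the blob format described in the git documentation: the word blob, a space, the content length in bytes, a NUL byte, and then the content itself. The function name `blob_hash` is made up for this post.

```python
import hashlib


def blob_hash(content):
    """Hash file content the way `git hash-object` does for a blob."""
    # Git prepends a small header before hashing: b"blob <size>\0"
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo "Hello" writes the five letters plus a trailing newline
print(blob_hash(b"Hello\n"))  # e965047ad7c57865823c7d992b1d046ea66edf78
```

This is also why the hash is not simply the SHA-1 of the file: the header is part of what gets hashed.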

So that is where the content of the file README.md is stored. The first two characters of the hash are used as the folder name and the remaining 38 as the file name. The file 65047ad7c57865823c7d992b1d046ea66edf78 is a zlib-compressed binary file, so to see its content we can run

$ git cat-file -p e965047ad7c57865823c7d992b1d046ea66edf78

which outputs

Hello

Which is the content of your README.md!
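
What cat-file does for a loose object can be approximated in Python: find the file from the hash, decompress it with zlib, and split off the header. This is a rough sketch with made-up helper names (`object_path`, `write_object`, `read_object`), run against a throwaway directory rather than a real repo; it does not handle packed objects.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(git_dir, sha):
    """First two hex characters are the folder, the remaining 38 the file."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def write_object(git_dir, obj_type, body):
    """Store header + body, zlib-compressed, under the SHA-1 of both."""
    raw = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
    sha = hashlib.sha1(raw).hexdigest()
    path = object_path(git_dir, sha)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return sha


def read_object(git_dir, sha):
    """Return (type, body) of a loose object, like cat-file -t and -p."""
    with open(object_path(git_dir, sha), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


demo = tempfile.mkdtemp()
sha = write_object(demo, "blob", b"Hello\n")
print(sha)
print(read_object(demo, sha))
```

On a real repo, calling `read_object(".git", "e965047ad7c57865823c7d992b1d046ea66edf78")` should work the same way as long as the object has not been packed.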

But what are the other two objects?

There are two other objects in the objects directory. What are those? Git has four types of objects: blob, tree, commit and tag. A blob stores the content of a file; the one we just saw is a blob. You can see the type of an object by running

$ git cat-file -t e965047ad7c57865823c7d992b1d046ea66edf78

which will print

blob

When you run

$ git cat-file -p acde617e8ab39bb157821d3bf84d04e157bff52c

you will get

tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
author username <test@email.com> 1597845162 +0530
committer username <test@email.com> 1597845162 +0530

Initial commit

That is our actual commit. It has an author, a committer and something called a tree, which is another git object.
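
The commit text is simple enough to parse by hand: header lines of the form "key value", a blank line, then the message. Here is a small sketch; `parse_commit` is a hypothetical helper, and it collects values in lists because later commits also carry one or more parent lines.

```python
def parse_commit(text):
    """Split commit text into a dict of header fields and the message."""
    head, _, message = text.partition("\n\n")
    fields = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)
    return fields, message


commit_text = (
    "tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f\n"
    "author username <test@email.com> 1597845162 +0530\n"
    "committer username <test@email.com> 1597845162 +0530\n"
    "\n"
    "Initial commit\n"
)

fields, message = parse_commit(commit_text)
print(fields["tree"][0])   # the tree this commit points to
print(message.strip())     # Initial commit
```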

Let's see what that tree object contains by running

$ git cat-file -p dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f

It outputs

100644 blob e965047ad7c57865823c7d992b1d046ea66edf78	README.md

It has the name of the file and the name of the blob that holds the file content. It is essentially how your working directory looked at that commit.
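
Under the hood a tree is stored in a binary form, not as the text that cat-file prints. The sketch below assumes the usual layout of one entry: the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the blob hash. The helper names are made up, and the sketch does not sort entries the way git does, so treat it as an illustration for single-entry trees like ours.

```python
import hashlib


def tree_entry(mode, name, sha_hex):
    """Encode one tree entry: b"<mode> <name>\0" + 20 raw hash bytes."""
    return mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha_hex)


def tree_hash(entries):
    """Hash a list of (mode, name, sha_hex) entries as a tree object."""
    body = b"".join(tree_entry(*entry) for entry in entries)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def parse_tree(body):
    """Decode a tree body back into (mode, name, sha_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha_hex = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha_hex))
        pos = nul + 21
    return entries


entry = ("100644", "README.md", "e965047ad7c57865823c7d992b1d046ea66edf78")
print(parse_tree(tree_entry(*entry)))
print(tree_hash([entry]))
```

A quick way to check the sketch is to compare the printed hash with the tree hash that git reported for your own commit.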

Let's sum up our understanding so far. When you run $ git add, the content of each file is passed through SHA-1 to get a 40-character string, and that string names the blob object in which the content is stored. When you then make a commit, git creates a tree object, which is essentially how your working directory looked at that point in time: the tree says which blobs are associated with which file names. Finally there is a commit object, which points to this tree object and also holds the commit message, the author and committer (with their emails) and the timestamps.

 // TODO

 1. Now make another commit
 2. inspect the contents of the .git folder
 3. See what are the objects that are there in .git folder
 4. Look at content of objects

Branching

Now let's create a branch by running

$ git branch feature-1

Now let's take a look at the content of the .git folder

.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature-1
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature-1
    │   └── master
    └── tags

Now there is a new file called refs/heads/feature-1, and if we take a peek at its content it will be the hash of the commit from which you created the branch.
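
In other words, a branch is just a small text file holding a commit hash. A rough Python imitation of what branch creation leaves on disk might look like this. `create_branch` and `list_branches` are hypothetical helpers working on a throwaway directory; real git also writes a reflog entry and validates the branch name.

```python
import os
import tempfile


def create_branch(git_dir, name, commit_hash):
    """Write the commit hash into refs/heads/<name>."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(commit_hash + "\n")
    return path


def list_branches(git_dir):
    """List the files in refs/heads (flat branch names only)."""
    return sorted(os.listdir(os.path.join(git_dir, "refs", "heads")))


demo = tempfile.mkdtemp()
commit = "acde617e8ab39bb157821d3bf84d04e157bff52c"
create_branch(demo, "master", commit)
create_branch(demo, "feature-1", commit)
print(list_branches(demo))  # ['feature-1', 'master']
```

Both files contain the same hash, which is why creating a branch is so cheap: no objects are copied.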

Now if we check out the feature-1 branch by running

$ git checkout feature-1

the content of our HEAD file changes to

ref: refs/heads/feature-1

 // TODO

 1. Try creating a file .git/refs/heads/feature-2
 2. Run git log
 3. Put the hash inside that file
 4. Try running git branch

Staging area

When you create a file you are creating it in your local file system. When you are done, you add the file to git by running $ git add . which adds the file to the staging area. Then when you make a commit, the commit object is created from the files in the staging area.

So the question now is: where is this staging area? The answer is that it lives in the index file.
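
The index is a binary file, but its basic layout is documented, so we can read it ourselves. The sketch below assumes a version 2 index: a 12-byte header (the signature DIRC, a version number, an entry count), followed by entries that each hold 62 bytes of fixed fields, the path, and NUL padding up to a multiple of 8 bytes. `parse_index` and `demo_index` are made-up helpers, and the demo builds a fake one-entry index in memory instead of reading a real one.

```python
import struct


def parse_index(data):
    """Parse a version 2 index into (mode, sha_hex, stage, path) tuples."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only understands version 2 index files")
    entries = []
    pos = 12
    for _ in range(count):
        # 10 unsigned ints (timestamps, dev, ino, mode, uid, gid, size),
        # then the 20-byte hash and a 2-byte flags field
        fields = struct.unpack(">10I20sH", data[pos:pos + 62])
        mode, sha, flags = fields[6], fields[10], fields[11]
        name_len = flags & 0x0FFF
        stage = (flags >> 12) & 0x3
        path = data[pos + 62:pos + 62 + name_len].decode()
        entries.append((format(mode, "o"), sha.hex(), stage, path))
        # Entries are NUL-padded so that each one is a multiple of 8 bytes
        pos += ((62 + name_len + 8) // 8) * 8
    return entries


def demo_index():
    """Build a fake one-entry index for README.md (timestamps zeroed)."""
    path = b"README.md"
    sha = bytes.fromhex("e965047ad7c57865823c7d992b1d046ea66edf78")
    entry = struct.pack(">10I20sH", 0, 0, 0, 0, 0, 0, 0o100644, 0, 0, 6,
                        sha, len(path)) + path
    entry += b"\0" * (((62 + len(path) + 8) // 8) * 8 - len(entry))
    return struct.pack(">4sII", b"DIRC", 2, 1) + entry


print(parse_index(demo_index()))
```

If your git writes a version 2 index, `parse_index(open(".git/index", "rb").read())` should list the same files that the staging area holds.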

We can see the contents of the staging area by running

$ git ls-files --stage

which gives us

100644 e965047ad7c57865823c7d992b1d046ea66edf78 0 README.md

 // TODO

 1. Create a README2.md file
 2. Run git ls-files --stage and look at its content
 3. Run git add .
 4. Now run git ls-files --stage again

Note:

  1. We have skipped some details like tags, packing…
  2. The number 100644 is the file mode in octal: the leading 100 marks a regular file and 644 is its permission bits
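
If you want to decode a mode like 100644 yourself, Python's stat module understands the same bit layout. This is a small sketch; `describe_mode` is a made-up helper and only distinguishes a few kinds of entry.

```python
import stat


def describe_mode(mode_text):
    """Split an octal mode string into (kind of entry, permission bits)."""
    mode = int(mode_text, 8)
    if stat.S_ISREG(mode):
        kind = "regular file"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISDIR(mode):
        kind = "directory"
    else:
        kind = "other"
    return kind, format(stat.S_IMODE(mode), "o")


print(describe_mode("100644"))        # ('regular file', '644')
print(stat.filemode(0o100644))        # -rw-r--r--
```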

Thanks to Yancy Min for sharing their work on Unsplash

This post is also available on DEV.