Inside the .git Folder: A Deep Dive

In the previous articles, we explored the Git basics. Now let’s look at how Git works internally.

Before we do that, though, let’s recap the main Git commands that developers use every day.

git init - initialize the repository
git status - check the status of tracked and untracked files
git add - add files to the staging area
git commit - commit the files added to the staging area
git push - push the changes to the remote

The .git folder

The folder is created when the user initializes a repository (or clones one from a remote):

git init or git clone repo_url

Everything Git knows about this repository is stored here. (Deleting this folder removes all tracking and Git history from the repository.) This includes the following:

commits
branches
tags
the staging area (index)
your local config
remote tracking info

If we take a look inside the .git folder, we will find files and folders such as these:

COMMIT_EDITMSG → file containing the message of the most recent commit
config → stores the repository-level Git configuration
HEAD → points to the branch we are currently on, e.g. ref: refs/heads/master
index → staging area
logs/ → log history (“where HEAD has been”)
objects/ → where Git stores actual data (files + commits)
refs/ → branch and tag pointers (like master)
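
To make the HEAD entry concrete, here is a minimal Python sketch that reads a HEAD file and extracts the branch name. The helper name current_branch and the throwaway directory are assumptions made for illustration; the sketch imitates the layout in a temporary folder instead of touching a real repository.

```python
import tempfile
from pathlib import Path


def current_branch(git_dir):
    """Return the branch HEAD points to, or None if HEAD is detached."""
    head = (Path(git_dir) / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    return None  # detached HEAD: the file holds a raw commit hash instead


# Demo on a throwaway directory that mimics the layout (not a real repository).
with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    fake_git.mkdir()

    (fake_git / "HEAD").write_text("ref: refs/heads/master\n")
    branch_name = current_branch(fake_git)

    # A made-up hash stands in for a detached HEAD.
    (fake_git / "HEAD").write_text("0123456789abcdef0123456789abcdef01234567\n")
    detached = current_branch(fake_git)

print(branch_name)
print(detached)
```

Pointing current_branch at the .git folder of a real repository should work the same way, since HEAD is a plain text file.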

Git Branches

Branches are literally just files containing commit hashes. A branch is a label that points to a commit. When we commit, Git moves the label forward.
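
The “branch is just a file” idea can be sketched in a few lines of Python. The helper branch_commit and the two fake hashes are hypothetical, and the sketch only looks at loose ref files under refs/heads/; a real repository may also keep refs in .git/packed-refs, which is not handled here.

```python
import tempfile
from pathlib import Path


def branch_commit(git_dir, branch):
    """Return the commit hash stored in refs/heads/<branch>, or None."""
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    if ref_file.is_file():
        return ref_file.read_text().strip()
    return None  # the ref may live in packed-refs instead; not handled here


with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    heads = fake_git / "refs" / "heads"
    heads.mkdir(parents=True)

    old_commit = "1" * 40  # made-up hashes, for illustration only
    new_commit = "2" * 40

    (heads / "master").write_text(old_commit + "\n")
    before = branch_commit(fake_git, "master")

    # "Moving the label forward" boils down to rewriting the file.
    (heads / "master").write_text(new_commit + "\n")
    after = branch_commit(fake_git, "master")

    missing = branch_commit(fake_git, "no-such-branch")

print(before)
print(after)
```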

Git objects

Git stores everything as objects. There are only 4 types:

Blob → Stores the contents of a file, not its name. (Two files with the same content share a single blob.)

Tree → Stores file names, permissions, references to blobs, and pointers to other trees (subdirectories).

Commit → A snapshot with metadata. It points to a tree and to its parent commit(s), and records the author, timestamp, and message.

Tag → An annotated tag object points to another object (usually a commit) and is often used for releases.
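
All four object types share the same on-disk envelope when stored as loose objects: a zlib-compressed byte string of the form "<type> <size>", a NUL byte, then the content. The sketch below parses that envelope. The function name parse_object and the sample blob are assumptions for illustration, and the sample is built in memory instead of being read from a real repository.

```python
import zlib


def parse_object(compressed):
    """Split a loose object into (type, size, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size), body


# Made-up example: a blob holding the six bytes "hello\n".
sample = zlib.compress(b"blob 6\x00hello\n")

obj_type, size, body = parse_object(sample)
print(obj_type, size, body)
```

The same function should also accept the bytes of a file under .git/objects/ in a real repository, as long as the object is loose and not inside a packfile.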

Git Commands

Git add

When the user runs git add, Git creates a blob object from the file’s contents and updates the index file with a pointer to that blob. The index acts like a draft snapshot of the next commit.
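
The blob half of git add can be approximated in Python: hash the header "blob <size>", a NUL byte, and the file contents with SHA-1. The names blob_hash and stage are hypothetical, and the dictionary is only a stand-in for the index; the real index is a binary file that carries more metadata than a path and a hash.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object id Git would give a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def stage(index: dict, path: str, content: bytes) -> None:
    """Toy stand-in for the index: remember which blob each path points to."""
    index[path] = blob_hash(content)


index = {}
stage(index, "a.txt", b"hello\n")
stage(index, "copy-of-a.txt", b"hello\n")  # same content, different name

for path, object_id in index.items():
    print(path, object_id)
```

Both paths end up pointing at the same hash, which is why the file name cannot be part of the blob. On a machine with Git installed, the result can be compared against git hash-object.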

Git commit

When the user runs git commit, Git takes the index and writes tree objects that capture the snapshot’s directory structure. It then creates a commit object that points to the top-level tree and records the parent commit(s), the author, and the commit message. Finally, it updates the current branch’s reference to point to the new commit.
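
The body of a commit object is plain text: a few header lines (tree, parent, author, committer), a blank line, then the message. Here is a rough parser for that layout. The function parse_commit, the hashes, and the author are all made up for illustration, and multi-line headers such as signatures are not handled.

```python
def parse_commit(body: str) -> dict:
    """Parse the text of a commit object into a dictionary."""
    headers, _, message = body.partition("\n\n")
    commit = {"parents": []}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several
        else:
            commit[key] = value
    commit["message"] = message.strip()
    return commit


# Made-up commit body; the hashes and the author are placeholders.
sample = "\n".join([
    "tree " + "a" * 40,
    "parent " + "b" * 40,
    "author Jane Doe <jane@example.com> 1700000000 +0000",
    "committer Jane Doe <jane@example.com> 1700000000 +0000",
    "",
    "Add README",
    "",
])

commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```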

Commits, like all other Git objects, are stored in .git/objects/ as compressed files named after their hashes (not as plain copies of your files). This saves a lot of space and keeps Git fast. Every Git object is identified by a hash (see image). That hash is computed from the content of the object itself. Each hash is 40 hexadecimal characters long (SHA-1, the default). The first 2 characters of the hash become the name of a folder, and the remaining 38 characters become the name of the file inside it. This avoids dumping millions of objects into one directory and makes looking up an object faster: the objects are spread across up to 256 folders (16×16 for 2 hex characters), which reduces the search space by a factor of 256.
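
The folder split is easy to reproduce. This sketch maps a hash to the path where a loose object would live and works out the fan-out factor; the helper name object_path is an assumption, and the hash is just an example value.

```python
from pathlib import PurePosixPath


def object_path(object_hash: str) -> str:
    """Return the loose-object path for a hash: 2-char folder + 38-char file."""
    folder, file_name = object_hash[:2], object_hash[2:]
    return str(PurePosixPath(".git") / "objects" / folder / file_name)


example_hash = "ce013625030ba8dba906f756967f9e9ca394464a"
path = object_path(example_hash)
fan_out = 16 ** 2  # two hex characters, 16 possible values each

print(path)
print(fan_out)
```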

Another advantage of using hashes is that if the data changes, the hash also changes, i.e. any corruption of an object is detectable, because its content no longer matches its hash.
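
A small sketch of that integrity check: recompute the hash of the stored bytes and compare it with the name the object was stored under. The helpers object_id and is_intact are hypothetical names, and the “corruption” is simulated by changing one byte.

```python
import hashlib


def object_id(raw: bytes) -> str:
    """Hash the full object bytes (header included), as Git does."""
    return hashlib.sha1(raw).hexdigest()


def is_intact(expected_id: str, raw: bytes) -> bool:
    """True if the bytes still hash to the id they were stored under."""
    return object_id(raw) == expected_id


original = b"blob 6\x00hello\n"
stored_id = object_id(original)

corrupted = b"blob 6\x00jello\n"  # one byte changed

print(is_intact(stored_id, original))
print(is_intact(stored_id, corrupted))
```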

Git Checkout

This command is used to switch branches. When switching to a branch, Git points HEAD at that branch’s reference. Git then updates the index (staging area) to match the commit’s snapshot. The working directory files are also updated to match the snapshot.
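
Only the first step, repointing HEAD, is simple enough to sketch here. The function switch_head is a hypothetical toy that runs against a throwaway directory; it deliberately leaves out the index and working directory updates that the real command performs.

```python
import tempfile
from pathlib import Path


def switch_head(git_dir, branch):
    """Point HEAD at a branch (toy model: index and files are not touched)."""
    git_dir = Path(git_dir)
    if not (git_dir / "refs" / "heads" / branch).is_file():
        raise ValueError(f"unknown branch: {branch}")
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")


with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    heads = fake_git / "refs" / "heads"
    heads.mkdir(parents=True)

    # Two branches with made-up commit hashes.
    (heads / "master").write_text("1" * 40 + "\n")
    (heads / "feature").write_text("2" * 40 + "\n")
    (fake_git / "HEAD").write_text("ref: refs/heads/master\n")

    switch_head(fake_git, "feature")
    head_after = (fake_git / "HEAD").read_text().strip()

    try:
        switch_head(fake_git, "missing")
        rejected = False
    except ValueError:
        rejected = True

print(head_after)
```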

Miscellaneous

Modern repositories can have millions of objects, huge histories and huge teams. Git should be able to work fast and reliably even in such conditions. Git uses the following strategies to make this happen:

De-duplicating objects: Files with the same content are stored as a single blob. This avoids duplication.
Packed storage (packfiles): Git periodically compresses loose objects into packfiles. This reduces disk space, speeds up push/pull, and makes lookups efficient.
Delta compression in packs: Where it pays off, Git stores some objects as a diff against similar objects to reduce their size.
Smart indexing and fan-out directories: Each packfile comes with an index (.idx) file for fast lookups, and loose objects are spread across the two-character folders described earlier.
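
The first strategy falls out of content addressing almost for free, as this toy object store shows. The class ToyObjectStore is an assumption for illustration: it keeps objects in a dictionary keyed by hash and compresses each one with zlib, and it does not model packfiles or delta compression.

```python
import hashlib
import zlib


class ToyObjectStore:
    """In-memory stand-in for .git/objects: hash -> compressed bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        raw = b"blob %d\x00" % len(content) + content
        object_id = hashlib.sha1(raw).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(object_id, zlib.compress(raw))
        return object_id

    def get(self, object_id: str) -> bytes:
        raw = zlib.decompress(self.objects[object_id])
        return raw.partition(b"\x00")[2]


store = ToyObjectStore()
content = b"same text\n" * 100

first = store.put(content)
second = store.put(content)  # e.g. the same file under another name
stored_size = len(store.objects[first])

print(first == second, len(store.objects))
print(len(content), stored_size)
```

Storing the same content twice leaves a single entry in the store, and the compressed entry is smaller than the repetitive input.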

Conclusion

Git feels hard when it’s treated like a collection of commands you must memorize. But once you understand what’s happening inside .git/, everything becomes predictable.

Exploring Git Internals - Part 3: The .git folder
