Exploring Git Internals - Part 3: The .git folder

In the previous articles, we explored the Git basics. Now let's look at how Git works internally.
Before we do that, though, let's review the main Git commands that developers use every day:
git init - initialize the repository
git status - check the status of tracked and untracked files
git add - add files to the staging area
git commit - commit the files added to the staging area
git push - push the changes to the remote
The .git folder
The folder is created when a user initializes a repository (or clones an existing repository from a remote):
git init or git clone repo_url

Everything Git knows about this repository is stored here. (Deleting this folder removes all tracking and Git history from the repository; the working files themselves stay on disk.) This includes the following:
commits
branches
tags
the staging area (index)
your local config
remote tracking info
If we take a look inside the .git folder, we will find files and folders like these.

Here are some of the most important entries:
COMMIT_EDITMSG → the message of the most recent commit (the file Git opens in your editor when you commit)
config → the repository's Git configuration
HEAD → points to the branch we are currently on, e.g. ref: refs/heads/master
index → the staging area
logs/ → reflog history ("where HEAD and each branch have been")
objects/ → where Git stores the actual data (blobs, trees, commits, tags)
refs/ → branch and tag pointers (like refs/heads/master)
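To make the HEAD entry concrete, here is a minimal Python sketch that interprets the contents of .git/HEAD. It assumes only the two common forms: a symbolic ref such as ref: refs/heads/master, or a bare commit hash when HEAD is "detached". The names parse_head and read_head are our own, not part of Git.

```python
from pathlib import Path

def parse_head(text):
    """Interpret the contents of .git/HEAD."""
    text = text.strip()
    if text.startswith("ref: "):
        # Normal case: HEAD is a symbolic ref naming the current branch
        return ("branch", text[len("ref: "):])
    # Detached HEAD: the file holds a commit hash directly
    return ("detached", text)

def read_head(git_dir=".git"):
    """Read and parse HEAD from a repository's .git directory."""
    return parse_head((Path(git_dir) / "HEAD").read_text())

if __name__ == "__main__":
    print(parse_head("ref: refs/heads/master\n"))  # ('branch', 'refs/heads/master')
```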
Git Branches
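Branches are literally just files containing commit hashes: the branch master, for example, is the file .git/refs/heads/master, which holds the hash of a single commit. A branch is a label that points to a commit. When we commit, Git moves the label forward to the new commit.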
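A minimal Python sketch of "a branch is a file containing a hash", assuming loose refs only: Git can also keep refs in a single .git/packed-refs file, which this ignores, and real Git updates refs through a lock file rather than writing them directly. read_branch and move_branch are hypothetical helper names, and the repeated-letter hashes are placeholders, not real commits.

```python
import tempfile
from pathlib import Path

def read_branch(git_dir, name):
    """Return the commit hash a branch points to (loose refs only)."""
    # Each branch is a file like .git/refs/heads/master holding one hash
    return (Path(git_dir) / "refs" / "heads" / name).read_text().strip()

def move_branch(git_dir, name, new_commit):
    """Point the branch label at a new commit, as a commit on that branch does."""
    ref = Path(git_dir) / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(new_commit + "\n")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        move_branch(git_dir, "master", "a" * 40)  # placeholder "first commit"
        move_branch(git_dir, "master", "b" * 40)  # new commit: label moves forward
        print(read_branch(git_dir, "master"))
```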

Git objects
Git stores everything as objects. There are only 4 types:
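Blob → A blob stores the contents of a file, not its name. (Two files with the same content share a single blob.)
Tree → Represents a directory: it stores file names, file modes (permissions), and references to blobs and to other trees (subdirectories).
Commit → A snapshot with metadata. It points to a root tree and to its parent commit(s), and records the author, committer, timestamps, and message.
Tag → An annotated tag points to another object (usually a commit) and records the tagger, date, and message. Tags are often used to mark releases.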
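Every object type is identified the same way. The sketch below, assuming a repository that uses the default SHA-1 object format, computes an object ID the way git hash-object does: it hashes a small header (the type and the size in bytes) followed by the raw content. Because the file name is not part of a blob, two files with identical contents get the same ID.

```python
import hashlib

def hash_object(obj_type, data):
    """Compute a Git object ID for `data` (default SHA-1 object format)."""
    # Hashed bytes are: "<type> <size in bytes>\0" followed by the raw content
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

if __name__ == "__main__":
    a = hash_object("blob", b"hello\n")
    b = hash_object("blob", b"hello\n")  # same content, e.g. in another file
    print(a == b)  # True
    print(a)       # ce013625030ba8dba906f756967f9e9ca394464a
```

For comparison, echo hello | git hash-object --stdin prints the same ID.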
Git Commands
Git add
When a user runs git add, Git creates a blob object from the file's contents and updates the index file (.git/index) with an entry that maps the file's path to that blob. The index acts like a draft snapshot of the next commit.
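A simplified Python sketch of what git add does, under loud assumptions: the index is modelled as a plain dict mapping a path to (mode, blob ID), whereas the real .git/index is a binary file that also stores timestamps and other file metadata. write_blob and toy_add are hypothetical names. The blob itself is written the way Git writes loose objects: zlib-compressed, under a path derived from its hash.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_blob(git_dir, data):
    """Store `data` as a loose blob object and return its ID (SHA-1 format)."""
    store = f"blob {len(data)}\0".encode() + data
    oid = hashlib.sha1(store).hexdigest()
    path = Path(git_dir) / "objects" / oid[:2] / oid[2:]
    if not path.exists():  # same content already stored: nothing new to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid

def toy_add(git_dir, index, path, data):
    """Hypothetical 'git add': create the blob, then point the index entry at it."""
    oid = write_blob(git_dir, data)
    # Toy index: path -> (mode, blob ID). The real .git/index is a binary file.
    index[path] = ("100644", oid)
    return oid

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        index = {}
        toy_add(git_dir, index, "hello.txt", b"hello\n")
        toy_add(git_dir, index, "copy.txt", b"hello\n")  # same content, same blob
        print(index)
```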
Git commit
When a user runs git commit, Git takes the index and writes it out as tree objects, which record the snapshot's directory structure. It then creates a commit object that points to the root tree and to the parent commit, and records the author, committer, and commit message. Finally, Git updates the current branch's reference so that it points to the new commit.
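The sketch below shows, in Python, roughly how a commit is assembled from the index. It assumes a flat index with no subdirectories (a real tree nests one tree object per directory), the default SHA-1 format, and a fixed +0000 time zone. write_tree and write_commit are hypothetical helpers that only compute IDs rather than storing the objects, and the final step, writing the new commit ID into the current branch's ref file, is not shown.

```python
import hashlib

def hash_object(obj_type, body):
    # Object ID = SHA-1 of "<type> <size>\0" followed by the body
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

def write_tree(index):
    """Return the ID of a flat tree built from a {path: (mode, blob_id)} index."""
    body = b""
    # Plain name order is enough for files only; Git's rule also sorts
    # subdirectory names as if they ended in "/".
    for name in sorted(index):
        mode, oid = index[name]
        # Entry: "<mode> <name>\0" followed by the 20-byte binary object ID
        body += f"{mode} {name}\0".encode() + bytes.fromhex(oid)
    return hash_object("tree", body)

def write_commit(tree_id, parent_id, author, timestamp, message):
    """Return the ID of a commit pointing at tree_id (and parent_id, if any)."""
    lines = [f"tree {tree_id}"]
    if parent_id:  # the very first commit has no parent
        lines.append(f"parent {parent_id}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return hash_object("commit", body.encode())

if __name__ == "__main__":
    index = {"hello.txt": ("100644", "ce013625030ba8dba906f756967f9e9ca394464a")}
    tree_id = write_tree(index)
    first = write_commit(tree_id, None, "A U Thor <author@example.com>", 1700000000, "First commit")
    print(tree_id, first)
```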
Commits, like all other objects, are stored in .git/objects/ as compressed files named after their hashes, not under their original file names. Every Git object is identified by a hash (see image), and that hash is computed from the content of the object itself, so identical content is stored only once; together with compression, this saves a lot of space and keeps Git fast. Each hash is 40 hexadecimal characters long (in the default SHA-1 format).
Git uses the first 2 characters of the hash as a directory name and the remaining 38 characters as the file name. This fan-out layout avoids dumping millions of objects into one directory and makes finding an object faster: 2 hex characters give 16 × 16 = 256 possible directories, so each directory holds only about 1/256 of the loose objects.
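Here is a minimal Python sketch of the fan-out layout and of reading a loose object back. object_path and read_object are our own illustrative names; the sketch assumes a loose (unpacked) object and does not handle packfiles.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def object_path(git_dir, oid):
    """Fan-out layout: first 2 hex chars -> directory, remaining 38 -> file name."""
    return Path(git_dir) / "objects" / oid[:2] / oid[2:]

def read_object(git_dir, oid):
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(object_path(git_dir, oid).read_bytes())
    header, _, content = raw.partition(b"\0")  # "<type> <size>" then the content
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError(f"corrupt object {oid}: size mismatch")
    return obj_type, content

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        store = b"blob 6\0hello\n"
        oid = hashlib.sha1(store).hexdigest()
        path = object_path(git_dir, oid)
        path.parent.mkdir(parents=True)
        path.write_bytes(zlib.compress(store))
        print(read_object(git_dir, oid))  # ('blob', b'hello\n')
```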
Another advantage of using hashes is that if the data changes, the hash changes too. This means any corruption of a stored object is detectable, because its content no longer matches its hash.
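A minimal Python sketch of why corruption is detectable, assuming loose objects stored as a zlib-compressed header plus content: decompress the object, re-hash it, and compare the result with the ID it is filed under. This is the same idea behind checks such as git fsck, which verifies much more than this. verify_loose_object is a hypothetical name.

```python
import hashlib
import zlib

def verify_loose_object(oid, compressed):
    """Return True if the decompressed object still hashes to its ID."""
    raw = zlib.decompress(compressed)
    return hashlib.sha1(raw).hexdigest() == oid

if __name__ == "__main__":
    store = b"blob 6\0hello\n"
    oid = hashlib.sha1(store).hexdigest()
    good = zlib.compress(store)
    bad = zlib.compress(b"blob 6\0jello\n")  # one byte of content changed
    print(verify_loose_object(oid, good))  # True
    print(verify_loose_object(oid, bad))   # False
```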

Git Checkout
This command is used to switch branches. When we switch to a branch, Git simply rewrites HEAD so that it refers to that branch (for example, ref: refs/heads/feature). Git then updates the index (staging area) to match the snapshot of the commit the branch points to, and updates the files in the working directory to match that snapshot as well.
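The three steps of a checkout can be modelled with a hypothetical, in-memory Python sketch. The repo dictionary, its placeholder IDs (c1, b1, ...), and toy_checkout are invented for illustration; a real checkout works on files, and Git refuses to switch when uncommitted local changes would be overwritten, which this sketch ignores.

```python
def toy_checkout(repo, branch):
    """Hypothetical in-memory model of 'git checkout <branch>'."""
    commit_id = repo["branches"][branch]   # the commit the branch label points to
    snapshot = repo["commits"][commit_id]  # {path: blob_id} for that commit
    # 1. HEAD now refers to the branch, not directly to a commit
    repo["HEAD"] = f"ref: refs/heads/{branch}"
    # 2. The index (staging area) is reset to the commit's snapshot
    repo["index"] = dict(snapshot)
    # 3. Working-directory files are rewritten from the blobs in that snapshot
    repo["worktree"] = {path: repo["blobs"][oid] for path, oid in snapshot.items()}

if __name__ == "__main__":
    repo = {
        "HEAD": "ref: refs/heads/master",
        "branches": {"master": "c1", "feature": "c2"},
        "commits": {"c1": {"a.txt": "b1"}, "c2": {"a.txt": "b2", "new.txt": "b1"}},
        "blobs": {"b1": b"v1\n", "b2": b"v2\n"},
        "index": {"a.txt": "b1"},
        "worktree": {"a.txt": b"v1\n"},
    }
    toy_checkout(repo, "feature")
    print(repo["HEAD"])      # ref: refs/heads/feature
    print(repo["worktree"])  # {'a.txt': b'v2\n', 'new.txt': b'v1\n'}
```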
Miscellaneous
Modern repositories can have millions of objects, huge histories and huge teams. Git should be able to work fast and reliably even in such conditions. Git uses the following strategies to make this happen:
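De-duplicating objects: Files with the same content are stored as a single blob, so identical content is never stored twice.
Packed storage (packfiles): Git periodically (for example, during git gc) compresses many loose objects into a single packfile. This reduces disk usage, speeds up push/pull, and keeps object lookups efficient.
Delta compression in packs: Inside a packfile, Git can store an object as a delta (a diff) against a similar object instead of storing it in full, which reduces the size further.
Smart indexing and fan-out directories: Each packfile comes with an index (.idx) file, and loose objects are spread across the two-character fan-out directories, so Git can locate any object without scanning everything.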
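Delta compression can be illustrated with a toy Python sketch. Git's pack deltas are built from "copy these bytes from the base object" and "insert these literal bytes" instructions; the sketch below uses the same two kinds of instructions, but it finds matches with Python's difflib.SequenceMatcher and keeps the instructions as tuples, so it is neither Git's binary delta format nor its matching algorithm. make_delta and apply_delta are hypothetical names.

```python
from difflib import SequenceMatcher

def make_delta(base, target):
    """Encode `target` as copy/insert instructions against `base` (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))  # reuse bytes from the base object
        elif j2 > j1:  # "replace" or "insert": store the new bytes literally
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops

def apply_delta(base, ops):
    """Rebuild the target object from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)

if __name__ == "__main__":
    base = b"".join(b"line %d\n" % i for i in range(100))
    target = base.replace(b"line 50\n", b"line fifty\n")
    ops = make_delta(base, target)
    literal = sum(len(op[1]) for op in ops if op[0] == "insert")
    print(apply_delta(base, ops) == target)  # True
    print(len(target), "bytes in target,", literal, "stored literally")
```

Changing one line of a 100-line object leaves only a handful of literal bytes in the delta; everything else is copied from the base.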
Conclusion
Git feels hard when it’s treated like a collection of commands you must memorize. But once you understand what’s happening inside .git/, everything becomes predictable.

