Git Basics — JAGADEESWARA REDDY P

Snapshots, not diffs

Most people think git stores diffs between versions. It doesn’t. Every commit stores a complete snapshot of every tracked file. Diffs are computed on the fly when you ask for them. This is why checkout is fast and git can compare any two commits without replaying history.

A commit object contains four things:

A pointer to a tree object (the snapshot — a recursive listing of every file and its content hash)
The SHA(s) of its parent commit(s) (one for normal commits, two for merges, zero for the initial commit)
Author and committer with timestamps
The commit message

The tree object doesn’t store diffs either. It maps filenames to blob SHAs (and subdirectory names to other tree SHAs). If a file hasn’t changed between commits, both trees point to the same blob, so nothing is duplicated. Git achieves storage efficiency through content-addressable deduplication and, later, through packfiles, which delta-compress similar objects purely as a storage optimization. Deltas are never the primary representation.

When you run git diff <commit-a> <commit-b>, git reads the two trees, compares blob SHAs, and computes the diff on the spot, skipping any file whose blob SHA is identical on both sides. When you run git log -p, it does the same thing for each commit and its parent. The diff is always derived, never stored as the source of truth.
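The snapshot model can be checked in a scratch repository. The following is a minimal sketch, not something from the original text: the file names (a.txt, b.txt), the temporary directory, and the demo identity are all assumptions. It commits two files, changes one of them, and compares the blob SHAs recorded in the two trees.

```shell
set -eu
# Scratch repo in a temp dir; file names and identity are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "alpha" > a.txt
echo "beta"  > b.txt
git add a.txt b.txt
git commit -q -m "first snapshot"

echo "beta v2" > b.txt
git commit -q -am "change b.txt only"

# <commit>:<path> resolves to the blob SHA recorded in that commit's tree.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"   # unchanged file: both trees name the same blob
echo "b.txt: $b_old -> $b_new"   # changed file: two different blobs

# The commit object itself: tree, parent, author, committer, message.
git cat-file -p HEAD
```

The last command prints the raw commit object, which is a convenient way to see the four parts listed above in one place.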

The three trees

Git moves content between three data structures. Every command makes sense once you understand which trees it reads from and writes to.

Working Directory — the actual files on disk. What you see in your editor.
Index (Staging Area) — a flat file (.git/index) listing every tracked file with its blob SHA and metadata. This is what the next commit will contain.
Repository (HEAD) — the commit graph. HEAD points to the current branch, which points to the latest commit, which points to a tree.

The flow:

git add copies from working dir to index (creates blob objects, updates .git/index)
git commit writes the index as a tree object, creates a commit object pointing to it, advances the branch pointer
git checkout -- <file> copies from the index to the working dir; git checkout <commit> -- <file> copies from that commit to both the index and the working dir
git reset moves HEAD and optionally resets the index and working dir
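One way to watch content move between the three trees is to ask git which pairs currently disagree. This is a sketch under assumptions: notes.txt and the demo identity are invented for illustration. git diff compares working dir against index, and git diff --cached compares index against HEAD.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Edit the file: only the working dir differs from the other two trees.
echo "v2" > notes.txt
unstaged_before=$(git diff --name-only)           # working dir vs index
staged_before=$(git diff --cached --name-only)    # index vs HEAD

# Stage it: the index now matches the working dir but not HEAD.
git add notes.txt
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)

# Commit it: all three trees agree again.
git commit -q -m "update notes"
final_status=$(git status --porcelain)

echo "before add:   unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:    unstaged='$unstaged_after' staged='$staged_after'"
echo "after commit: status='$final_status'"
```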

What each reset mode does

+-------------------+-------+-----------+-------------+
| Mode              | HEAD  | Index     | Working Dir |
+-------------------+-------+-----------+-------------+
| --soft            | Moved | Unchanged | Unchanged   |
+-------------------+-------+-----------+-------------+
| --mixed (default) | Moved | Reset     | Unchanged   |
+-------------------+-------+-----------+-------------+
| --hard            | Moved | Reset     | Reset       |
+-------------------+-------+-----------+-------------+

--soft is useful when you want to re-commit with a different message or combine commits. --mixed is the default: it unstages everything but leaves your files intact. --hard is the nuclear option: it also overwrites the working dir, so uncommitted changes to tracked files are lost.
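The table can be reproduced experimentally. The sketch below assumes a hypothetical helper, try_reset, that builds a throwaway two-commit repository, runs one reset mode, and prints the resulting HEAD message, porcelain status, and file content. In porcelain output, "M " in the first column means a staged change and " M" means an unstaged one.

```shell
set -eu
# try_reset is a made-up helper for this demo, not a git command.
try_reset() {
  mode=$1
  repo=$(mktemp -d)
  (
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/main
    git config user.name "Demo User"
    git config user.email "demo@example.com"
    echo "one" > file.txt
    git add file.txt
    git commit -q -m "first"
    echo "two" > file.txt
    git commit -q -am "second"

    git reset -q "$mode" HEAD~1

    # HEAD's commit message | porcelain status | working dir content
    printf '%s|%s|%s\n' "$(git log -1 --format=%s)" \
      "$(git status --porcelain)" "$(cat file.txt)"
  )
}

soft=$(try_reset --soft)
mixed=$(try_reset --mixed)
hard=$(try_reset --hard)

echo "soft:  $soft"    # HEAD moved, change still staged
echo "mixed: $mixed"   # HEAD moved, change present but unstaged
echo "hard:  $hard"    # HEAD moved, change gone from index and working dir
```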

Branches are pointers

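A branch is a 41-byte file containing a commit SHA (40 hex characters plus a newline, with the default SHA-1 object format). That’s it. Creating a branch is writing a file. Switching branches is changing which file HEAD points to, then updating the index and working dir to match. (Git may later move refs into a single .git/packed-refs file, so the per-branch file is not guaranteed to exist.)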

cat .git/refs/heads/main
# e4a1f2c... (the SHA of the latest commit on main)

cat .git/HEAD
# ref: refs/heads/main

HEAD is a symbolic ref — it points to a branch name, not directly to a commit. When you commit, git advances the branch pointer (the file in refs/heads/) to the new commit’s SHA. Other branches don’t move. That’s why committing on feature doesn’t change where main points.
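The pointer behaviour is easy to confirm in a fresh repository. This sketch assumes the default loose-ref layout under .git/refs/heads (refs that have been packed would not appear as individual files) and uses an invented file name, app.txt, and branch name, feature.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# Creating a branch writes one small ref file.
git branch feature
main_ref_file=$(cat .git/refs/heads/main)
main_sha=$(git rev-parse main)
ref_bytes=$(wc -c < .git/refs/heads/feature | tr -d ' ')

# Switching branches rewrites HEAD to name a different ref.
git checkout -q feature
head_file=$(cat .git/HEAD)

# Committing on feature advances feature only.
echo "work" >> app.txt
git commit -q -am "feature work"
main_after=$(git rev-parse main)
feature_after=$(git rev-parse feature)

echo "ref file size: $ref_bytes bytes"
echo "HEAD file:     $head_file"
echo "main:          $main_sha -> $main_after"
echo "feature:       $main_sha -> $feature_after"
```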

Detached HEAD

Detached HEAD means HEAD points directly to a commit SHA instead of a branch name. This happens when you check out a tag, a specific commit, or a remote-tracking branch.

git checkout abc1234
# HEAD is now at abc1234...

cat .git/HEAD
# abc1234...  (raw SHA, not a ref)

Commits made in detached HEAD have no branch tracking them. Once you switch away, the only reference to those commits is the reflog. After their reflog entries expire (by default 30 days for commits no longer reachable from the current tip, 90 days for reachable ones), they become unreachable and git’s garbage collector can remove them. If you do work in detached HEAD, create a branch before switching away: git switch -c my-branch.
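The rescue described above can be rehearsed safely. In this sketch the file name and commit messages are assumptions, and git symbolic-ref -q HEAD is used as the test for detachment: it fails when HEAD holds a raw SHA rather than a ref name.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# Detach HEAD at the current commit.
git checkout -q --detach HEAD
if git symbolic-ref -q HEAD >/dev/null; then
  state_before=attached
else
  state_before=detached
fi

# This commit is reachable only through HEAD (and the reflog).
echo "experiment" >> app.txt
git commit -q -am "experiment in detached HEAD"
orphan=$(git rev-parse HEAD)

# Rescue: give the commit a branch before switching away.
git switch -q -c my-branch
state_after=$(git symbolic-ref --short HEAD)
git switch -q main
rescued=$(git rev-parse my-branch)

echo "before: $state_before, after: on $state_after"
echo "commit $orphan is kept alive by my-branch ($rescued)"
```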

Essential commands decoded

What each command actually does under the hood:

git add — hashes file contents, creates blob objects in .git/objects, updates the index with the new blob SHAs.
git commit — writes the index as a tree object, creates a commit object with the tree SHA + parent SHA + metadata, updates the branch ref file to the new commit SHA.
git diff — compares working dir blobs against index blobs. git diff --staged compares index blobs against HEAD tree blobs.
git stash — creates two (or three) hidden commits: one for the index state, one for the working dir state, optionally one for untracked files. Resets index and working dir to HEAD. The stash ref points to the working dir commit, which has the index commit and HEAD as parents.
git checkout <branch> / git switch — moves HEAD to point to the new branch, updates the index to match the branch’s latest tree, updates the working dir to match the index. Fails if uncommitted changes would be overwritten.
git cherry-pick <sha> — computes the diff the target commit introduced (target vs its parent), applies that diff to the current HEAD, creates a new commit. The new commit has a different SHA because its parent is different.
git revert <sha> — computes the inverse of the target commit’s diff and applies it as a new commit. Unlike reset, revert is safe for shared branches because it adds history rather than removing it.
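Of the commands above, stash is probably the least obvious, so here is a small sketch that inspects the commits it creates. The file name and edits are invented for the demo. stash^1 and stash^2 are the first and second parents of the stash commit, which should be HEAD and the index commit respectively.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "base"

# One staged edit and one unstaged edit, so index and working dir differ.
echo "staged edit" >> app.txt
git add app.txt
echo "unstaged edit" >> app.txt

git stash push -q

head_sha=$(git rev-parse HEAD)
stash_parent1=$(git rev-parse 'stash^1')    # the commit HEAD was on
index_parent=$(git rev-parse 'stash^2^1')   # parent of the index commit
status_after=$(git status --porcelain)      # empty: both trees back at HEAD

echo "stash^1 = $stash_parent1 (HEAD is $head_sha)"
echo "index commit content:"
git show 'stash^2:app.txt'     # ends with the staged edit
echo "working dir commit content:"
git show 'stash:app.txt'       # ends with the unstaged edit
```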

Merge vs rebase

Merge

Creates a new commit with two parents — the tip of the current branch and the tip of the branch being merged. The merge commit’s tree is the result of combining both sets of changes. Git preserves the full branch topology in history.

Fast-forward: when the current branch has no commits that diverge from the target, git doesn’t create a merge commit. It just moves the branch pointer forward to the target commit. The history looks linear even though a branch existed.

--no-ff forces a merge commit even when fast-forward is possible. This is useful for preserving feature branch context — you can see in the graph where a feature started and where it was integrated.

# Fast-forward (no merge commit):
git merge feature
# main moves to where feature points

# Force merge commit:
git merge --no-ff feature
# creates a merge commit even if fast-forward was possible
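To see the difference in the resulting graph, the following sketch counts the parents of main's tip after each kind of merge. merge_demo is a hypothetical helper written for this illustration, and --no-edit is passed only so the demo never opens an editor.

```shell
set -eu
# merge_demo is a made-up helper: feature is one commit ahead of main,
# then main merges it with the given flag.
merge_demo() {
  flag=$1
  repo=$(mktemp -d)
  (
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/main
    git config user.name "Demo User"
    git config user.email "demo@example.com"
    echo "base" > app.txt
    git add app.txt
    git commit -q -m "base"

    git checkout -q -b feature
    echo "feature" >> app.txt
    git commit -q -am "feature work"

    git checkout -q main
    git merge -q --no-edit "$flag" feature >/dev/null

    # rev-list --parents prints the commit followed by its parents.
    git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }'
  )
}

ff_parents=$(merge_demo --ff)
noff_parents=$(merge_demo --no-ff)

echo "fast-forward: tip has $ff_parents parent(s)"
echo "--no-ff:      tip has $noff_parents parent(s)"
```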

Rebase

Replays your commits onto a new base. For each commit in your branch that isn’t in the target, git computes the diff, applies it on top of the target, and creates a new commit. Every replayed commit gets a new SHA because its parent changed. The result is linear history.

git checkout feature
git rebase main
# feature's commits are replayed on top of main's tip

The golden rule: never rebase commits that exist on a shared/public branch. Rewriting published history forces everyone else to reconcile divergent SHAs. Their local branch and the remote will have different commit objects with the same content, and the resulting merge conflicts are confusing and unnecessary.

Use rebase for local feature branch cleanup before merging. Use merge to integrate into main.
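A short experiment shows both effects of a rebase: the replayed commit gets a new SHA, and the history stays linear. The file names and commit messages below are assumptions made for the sketch.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# feature and main each gain one commit, so the branches diverge.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "main work"

git checkout -q feature
before=$(git rev-parse HEAD)
git rebase -q main
after=$(git rev-parse HEAD)

parent=$(git rev-parse HEAD~1)
main_tip=$(git rev-parse main)
merges=$(git rev-list --merges --count HEAD)

echo "feature tip: $before -> $after"        # same change, new SHA
echo "new parent:  $parent (main is $main_tip)"
echo "merge commits in history: $merges"
```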

Undoing things

Different levels of destruction, from gentle to nuclear:

git reset --soft HEAD~1 — move HEAD back one commit. Changes remain staged. Useful for re-wording a commit message or combining the last two commits.
git reset --mixed HEAD~1 — move HEAD back, unstage changes. Files are still modified in the working dir. This is reset’s default.
git reset --hard HEAD~1 — move HEAD back, reset index, reset working dir. The commit and all changes are gone (from the branch’s perspective).
git checkout -- <file> / git restore <file> — discard working dir changes for a single file, replacing it with the index version.
git revert HEAD — create a new commit that is the exact inverse of the last one. Safe for shared branches because it adds to history.
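The two gentlest options can be tried side by side. In this sketch (config.txt and its contents are invented), a working dir edit is discarded with git checkout -- <file>, and a committed change is undone with git revert, which leaves the bad commit in history.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "good" > config.txt
git add config.txt
git commit -q -m "good config"

# Discard an uncommitted edit: the file is replaced by the index version.
echo "scratch" >> config.txt
git checkout -- config.txt
after_discard=$(git status --porcelain)

# Undo a committed change by adding its inverse as a new commit.
echo "bad" > config.txt
git commit -q -am "bad change"
bad=$(git rev-parse HEAD)
git revert --no-edit HEAD >/dev/null

count=$(git rev-list --count HEAD)
content=$(cat config.txt)

echo "commits on branch: $count"   # good, bad, and the revert
echo "config.txt now:    $content"
```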

The reflog safety net

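Git logs every HEAD movement and keeps the entries for about 90 days by default (30 days for entries pointing to commits that are no longer reachable from the current tip). git reflog shows the history: every commit, reset, rebase, merge, and checkout that moved HEAD.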

Even after reset --hard or a bad rebase, the old commit SHAs still exist in the object store. They’re just unreachable from any branch. The reflog keeps them discoverable:

git reflog
# e4a1f2c HEAD@{0}: reset: moving to HEAD~3
# 7b2c3d4 HEAD@{1}: commit: add payment processing
# a1b2c3d HEAD@{2}: commit: fix validation bug
# ...

# recover by resetting to the old SHA:
git reset --hard 7b2c3d4
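The same recovery can be rehearsed end to end in a throwaway repository. This sketch (file name and messages are assumptions) uses HEAD@{1}, which names the position HEAD held just before its most recent movement, instead of copying a SHA out of the reflog listing.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "change $n" >> log.txt
  git add log.txt
  git commit -q -m "commit $n"
done
lost=$(git rev-parse HEAD)

# Throw away the last two commits.
git reset -q --hard HEAD~2
after_reset=$(git rev-list --count HEAD)

# The reflog still remembers where HEAD was before the reset.
previous=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$previous"
recovered=$(git rev-parse HEAD)

echo "commits after reset: $after_reset"
echo "recovered tip:       $recovered (was $lost)"
```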

Remote workflow

git fetch — downloads objects and refs from the remote. Updates tracking branches (origin/main, origin/feature). Never touches the working dir or index. This is always safe to run.
git pull = fetch + merge (or fetch + rebase with --rebase). It modifies your working dir and can create merge commits.
Remote-tracking branches (origin/main) are local read-only snapshots of remote state. They are updated only when you talk to the remote: on fetch, pull, or a successful push. Between those, they go stale: origin/main shows where the remote was last time you checked, not where it is now.
git push uploads local commits and updates the remote ref. Rejected if the remote has diverged (the remote’s branch has commits you don’t have locally). Solve with fetch + merge, then push again. Do not reach for --force.
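The whole cycle can be simulated on one machine with a bare repository standing in for the server. Everything named here is an assumption for the sketch: remote.git, the two clones alice and bob, and the file names. It shows origin/main going stale, fetch refreshing it without touching the local branch, and a diverged push being rejected until the remote's work is merged.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# alice seeds the remote with a first commit.
git clone -q remote.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "alice"
git config user.email "alice@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"
git push -q origin main
cd ..

# bob clones, then alice pushes again behind his back.
git clone -q remote.git bob
git -C bob config user.name "bob"
git -C bob config user.email "bob@example.com"
cd alice
echo "v2" >> app.txt
git commit -q -am "v2"
git push -q origin main
remote_tip=$(git rev-parse HEAD)

cd ../bob
stale=$(git rev-parse origin/main)    # still where the remote was at clone time
git fetch -q origin
fresh=$(git rev-parse origin/main)    # now matches the remote
local_main=$(git rev-parse main)      # fetch did not move the local branch

# bob commits without integrating first, so his push is rejected.
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "bob's work"
if git push -q origin main 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi

# Integrate the remote's commits, then push again.
git merge -q --no-edit origin/main >/dev/null
git push -q origin main
final_remote=$(git -C ../remote.git rev-parse main)

echo "first push: $push_result"
echo "remote main is now $final_remote"
```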

Rescue patterns

Common “oh no” scenarios and their fixes.

Committed to the wrong branch

# Undo the commit but keep the changes staged:
git reset --soft HEAD~1

# Stash the changes:
git stash

# Switch to the correct branch:
git switch correct-branch

# Apply and commit:
git stash pop
git commit

Need to split a commit

# Undo the commit, keep changes in working dir:
git reset HEAD~1

# Stage selectively:
git add -p       # interactive hunk staging
git commit -m "part 1"

git add -p
git commit -m "part 2"

Undo the last push

# Safe — creates a new reverting commit:
git revert HEAD
git push

# Nuclear — rewrites remote history (personal branches only):
git reset --hard HEAD~1
git push --force-with-lease

“Git is not a series of commands to memorize. It is three trees, a DAG of snapshots, and pointer arithmetic. Once you see that, every command is obvious.”

For what happens under the hood — how objects are stored, how merge actually works, and how to build a commit from plumbing commands — see Git Internals.