
Git is a Database

A content-addressable object store with immutable snapshots, append-only writes, and no query language. Git is a database — just not the one you'd choose.

DATE: APR.27.2026
READ: 10 MIN

Here is a statement that will make database engineers uncomfortable and Git users defensive: Git is a database.

Not metaphorically. Not “sort of like a database if you squint.” It is, by any reasonable definition, a database.

Git has a storage engine: a content-addressable object store that maps SHA-1 (or SHA-256) hashes to zlib-compressed objects. It has indexes: the staging area (literally called “the index”) and pack index files for fast lookups. It has transactions: a commit either lands completely or doesn’t happen, because the ref only moves once every object has been written. It has replication: push, pull, and clone distribute data across nodes. It has garbage collection: git gc packs loose objects and prunes unreachable ones. It has a recovery journal: the reflog records every local ref update, so a commit lost to a bad reset or rebase can be found again. (It is written after the update, so it is closer to an undo log than a true write-ahead log.)

That is a database. The only thing missing is a query language.
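To make “content-addressable” concrete, here is a minimal sketch of the idea in Python. The blob hashing follows Git's documented scheme for SHA-1 repositories (a `blob <size>` header, a NUL byte, then the content). The `ObjectStore` class is a hypothetical in-memory stand-in for `.git/objects`, not how Git lays objects out on disk.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """Hash data the way Git hashes a blob: header, NUL byte, then content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Toy in-memory object store (hypothetical; illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        oid = git_blob_id(data)
        payload = b"blob %d\x00" % len(data) + data
        # Identical content always maps to the same key, so a second
        # write of the same bytes stores nothing new.
        self._objects.setdefault(oid, zlib.compress(payload))
        return oid

    def get(self, oid: str) -> bytes:
        raw = zlib.decompress(self._objects[oid])
        return raw.split(b"\x00", 1)[1]  # drop the header

    def __len__(self):
        return len(self._objects)


store = ObjectStore()
first = store.put(b"hello world\n")
second = store.put(b"hello world\n")
print(first)
print(first == second, len(store))
```

The ID printed here should match what `git hash-object` reports for a file with the same bytes, which is the whole trick: the key is derived from the value, so there is nothing to look up in order to write.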

SQLite, meanwhile, is the database nobody argues about. B-tree storage engine. SQL query language. ACID transactions. WAL mode for concurrent readers. Single-file deployment. Runs on literally billions of devices — every iPhone, every Android phone, every Mac, every major browser.

Both fit in a single directory. Both are local-first. Both are embeddable. Both are used by more people than any distributed database you admire. The question is not whether they are databases. The question is what tradeoffs each chose, and why those tradeoffs produce such radically different tools.

The tradeoff that defines everything

Git chose: history is free, queries are expensive.

Every state of your project is preserved forever. Want to see what your codebase looked like six months ago? One command. Want to find which commit introduced a bug? git bisect binary-searches through history automatically. Want to know who wrote this line and when? git blame. Want to see the exact diff between any two points in time? git diff. History is not an afterthought in Git — it is the entire point. The object model is designed around it.

But try answering a simple question: “How many files in this repo are larger than 1MB?”

++
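# Pass file names as arguments; splicing {} into the script breaks on quotes and spaces
git ls-files -z | xargs -0 sh -c 'for f; do [ "$(wc -c < "$f")" -gt 1048576 ] && echo "$f"; done' sh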
++

That is not a query. That is suffering. “Which modules import this function?” — Git has no answer. “Show me all files that changed more than 10 times in the last month” — pipe git log through awk and pray. Git stores everything but lets you ask nothing.

SQLite chose the opposite: queries are free, history is expensive.

Want every user with more than 100 orders in the last 90 days, joined with their default shipping address? One SQL statement. Want a count grouped by region, filtered by date range, sorted by revenue? Another SQL statement. The query language is the product.

++
SELECT u.name, COUNT(o.id) AS order_count, a.city
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN addresses a ON a.id = u.default_address_id
WHERE o.created_at > date('now', '-90 days')
GROUP BY u.id
HAVING order_count > 100
ORDER BY order_count DESC;
++

Now try: “What did the users table look like last Tuesday?” You cannot. Unless you built an audit table, wrote triggers, implemented temporal columns — all manual, all your problem, all fragile. SQLite stores the present and forgets the past.
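For a sense of what “all manual, all your problem” looks like, here is a minimal sketch of the audit-table-plus-trigger approach using Python's built-in `sqlite3` module. The `users_history` table, the trigger name, and the sample row are all made up for illustration. A real setup would also need to cover INSERT and DELETE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

-- Hypothetical audit table: one row per overwritten version.
CREATE TABLE users_history (
    user_id     INTEGER,
    name        TEXT,
    email       TEXT,
    replaced_at TEXT DEFAULT (datetime('now'))
);

-- Copy the old values aside every time a row is updated.
CREATE TRIGGER users_keep_old AFTER UPDATE ON users
BEGIN
    INSERT INTO users_history (user_id, name, email)
    VALUES (OLD.id, OLD.name, OLD.email);
END;
""")

db.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ada', 'ada@old.example')")
db.execute("UPDATE users SET email = 'ada@new.example' WHERE id = 1")

current = db.execute("SELECT email FROM users WHERE id = 1").fetchall()
history = db.execute("SELECT user_id, email FROM users_history").fetchall()
print(current)
print(history)
```

It works, but notice how much of it is bookkeeping you now own: every new column has to be added in three places, and nothing stops someone from dropping the trigger.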

This is the fundamental split. Everything else follows from it.

Where Git wins

Branching is free. Creating a branch in Git is writing 41 bytes to a file. That is not an exaggeration: in a SHA-1 repository, a branch is a file containing a 40-character hash plus a newline. Try “branching” an entire SQLite database. You would copy the whole file. Yes, you could use savepoints, but that is not the same thing. You cannot have two people working on different branches of a SQLite database simultaneously.

++
# Create a branch by hand. This is the entire operation.
# (git branch my-feature is the supported way and has the same effect.)
git rev-parse HEAD > .git/refs/heads/my-feature
++
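The 41-byte claim is easy to check without touching a real repository. The sketch below writes a ref-shaped file into a temporary directory that imitates the `.git/refs/heads` layout. The directory and the commit ID are placeholders invented for the example, so this creates a file of the right shape, not a usable branch.

```python
import hashlib
import os
import tempfile

# Stand-in for a repository's .git directory (temporary, not a real repo).
git_dir = tempfile.mkdtemp()
heads = os.path.join(git_dir, "refs", "heads")
os.makedirs(heads)

# Any 40-character hex string works for the illustration; a real ref
# would have to name a commit that exists in the object store.
commit_id = hashlib.sha1(b"placeholder commit").hexdigest()

ref_path = os.path.join(heads, "my-feature")
with open(ref_path, "w", newline="\n") as f:
    f.write(commit_id + "\n")

print(os.path.getsize(ref_path))
```

In a repository created with SHA-256 object IDs, the same file would hold a 64-character hash, so the number changes but the point does not.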

Merging is a solved problem, at least for text. Git’s three-way merge compares both sides against their common ancestor and reconciles two divergent histories automatically. Two developers modify different files? Auto-merged. Same file, different sections? Usually auto-merged. Same line? Conflict, but an explicit, well-defined conflict with clear markers. Database merge? You are writing custom conflict resolution logic by hand, and you are probably getting it wrong.
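The core rule of a three-way merge fits in a few lines. The sketch below applies it at whole-file granularity to plain dictionaries mapping path to content. That is a deliberate simplification: Git also applies the same rule to hunks within a file, which is not shown here. The function name and sample data are hypothetical.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge path -> content maps. Returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (or neither touched it)
        elif o == b:
            result = t              # only their side changed it
        elif t == b:
            result = o              # only our side changed it
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.py": "v1", "b.py": "v1", "c.py": "v1"}
ours = {"a.py": "v2", "b.py": "v1", "c.py": "v2"}
theirs = {"a.py": "v1", "b.py": "v2", "c.py": "v3"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

The common ancestor is what makes this tractable: without `base`, there is no way to tell “they changed it” from “we changed it”. Most ad hoc database merges are missing exactly that.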

Distribution is native. Every clone is a full replica. There is no primary/secondary setup. No replication lag. No split brain. No failover configuration. Your laptop dies? Clone from any other copy. GitHub goes down? Your local repo has the entire history. (Merge conflicts are Git’s version of split brain, but at least they are explicit and local.)

Immutability means trust. A commit SHA is a cryptographic hash of its contents: the tree, the parent commits, the author, the timestamp, the message. If the SHA matches, the content has not been tampered with. And because each commit hashes its parents, one trusted SHA vouches for the entire history reachable from it. SQLite rows have no such guarantee. An UPDATE is silent, permanent, and untraceable unless you built auditing yourself.
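Here is a rough sketch of why rewriting an old commit cannot go unnoticed. The record layout loosely mirrors Git's commit format but is not byte-compatible with it, and the tree IDs, author string, and messages are placeholders.

```python
import hashlib


def commit_id(tree: str, parents: list, author: str, message: str) -> str:
    """Hash a commit-like record (simplified layout, not Git's exact bytes)."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def build_history(messages):
    """Chain commits so each one includes its parent's ID in its own hash."""
    ids, parents = [], []
    for i, msg in enumerate(messages):
        cid = commit_id("tree-%d" % i, parents, "dev <dev@example.com>", msg)
        ids.append(cid)
        parents = [cid]
    return ids


original = build_history(["init", "add login", "fix typo"])
tampered = build_history(["init", "add backdoor", "fix typo"])

for before, after in zip(original, tampered):
    print(before[:8], after[:8], "same" if before == after else "DIFFERENT")
```

Changing the second commit changes its ID, and since the third commit embeds that ID, the third one changes too, all the way to the tip.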

Where SQLite wins

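Structured queries. “Find all commits that touched files in src/auth/ between March and April” is about the limit of what Git answers directly: git log --since --until -- src/auth/ handles it only because someone wrote a flag for each of those filters. Ask for anything the flags do not cover, such as joining those commits against ticket data, and you are parsing text output. In a database with the right schema, it is a WHERE clause. The difference is not convenience; it is composability. SQL queries compose. Shell pipelines break.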

++
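-- With history in tables (hypothetical schema), the filter is just a WHERE clause.
-- DISTINCT keeps a commit that touched several matching files from repeating.
SELECT DISTINCT commit_hash, author, message
FROM commits
JOIN file_changes ON file_changes.commit_id = commits.id
WHERE file_changes.path LIKE 'src/auth/%'
  AND commits.date BETWEEN '2026-03-01' AND '2026-04-01';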
++
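To see the composability argument in action, the sketch below loads a few made-up commits into an in-memory SQLite database and runs the query from the text (with DISTINCT, so a commit touching two matching files appears once). The schema, hashes, authors, and dates are all invented. In practice you would populate these tables by parsing `git log` output once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE commits (
    id INTEGER PRIMARY KEY, commit_hash TEXT, author TEXT,
    message TEXT, date TEXT
);
CREATE TABLE file_changes (
    commit_id INTEGER REFERENCES commits(id), path TEXT
);
""")
db.executemany("INSERT INTO commits VALUES (?, ?, ?, ?, ?)", [
    (1, "aaa111", "ana", "add login form", "2026-03-10"),
    (2, "bbb222", "ben", "update readme", "2026-03-12"),
    (3, "ccc333", "ana", "rotate tokens", "2026-04-20"),
])
db.executemany("INSERT INTO file_changes VALUES (?, ?)", [
    (1, "src/auth/login.py"), (1, "src/auth/forms.py"),
    (2, "README.md"),
    (3, "src/auth/tokens.py"),
])

rows = db.execute("""
    SELECT DISTINCT commit_hash, author, message
    FROM commits
    JOIN file_changes ON file_changes.commit_id = commits.id
    WHERE file_changes.path LIKE 'src/auth/%'
      AND commits.date BETWEEN '2026-03-01' AND '2026-04-01'
""").fetchall()

# Same tables, different question: one more clause instead of a new pipeline.
per_author = db.execute("""
    SELECT author, COUNT(DISTINCT commits.id)
    FROM commits
    JOIN file_changes ON file_changes.commit_id = commits.id
    WHERE file_changes.path LIKE 'src/auth/%'
    GROUP BY author
""").fetchall()

print(rows)
print(per_author)
```

The second query is the interesting one. Nothing about the storage changed; the question did, and the cost was a GROUP BY.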

Mutations are natural. In Git, you cannot update a file “in place.” You create a new blob, a new tree, a new commit — an entirely new snapshot of the universe because you changed one line. That is elegant for source code where you want every previous state preserved. It is insane for a shopping cart. SQLite’s UPDATE is what you actually want for mutable state.

Relationships. Git has no concept of foreign keys, joins, or referential integrity. Trees point to blobs, commits point to trees — but that is a fixed schema designed for one purpose. Your data model does not fit into blob/tree/commit? Too bad. SQLite lets you define whatever schema your domain requires.

Concurrency at scale. In WAL mode, SQLite serves many concurrent readers from consistent snapshots while a single writer appends to the log. The writer acquires a lock, writes, releases; readers are never blocked by it. Git handles concurrent writes by making you resolve merge conflicts manually. At scale, that is not a concurrency strategy. It is a prayer.
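The reader/writer behavior is simple to observe with Python's built-in `sqlite3` module. This is a small sketch, assuming a local filesystem where WAL mode is available. The `cart` table and connection names are made up, and `timeout=0` is set so a blocked writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit, so BEGIN/COMMIT below are explicit.
writer = sqlite3.connect(path, isolation_level=None, timeout=0)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE cart (item TEXT)")
writer.execute("INSERT INTO cart VALUES ('book')")

reader = sqlite3.connect(path, isolation_level=None, timeout=0)
second_writer = sqlite3.connect(path, isolation_level=None, timeout=0)

writer.execute("BEGIN IMMEDIATE")  # take the write lock
writer.execute("INSERT INTO cart VALUES ('pen')")

# The reader is not blocked; it sees the last committed snapshot.
seen_during = reader.execute("SELECT COUNT(*) FROM cart").fetchall()[0][0]

# A second writer is refused while the first one holds the lock.
try:
    second_writer.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:
    second_writer_blocked = True

writer.execute("COMMIT")
seen_after = reader.execute("SELECT COUNT(*) FROM cart").fetchall()[0][0]

print(mode, seen_during, second_writer_blocked, seen_after)
```

Many readers, one writer, and the loser of a write race gets an error rather than a merge conflict. For mutable application state, that is usually the trade you want.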

The unholy hybrids

This is where it gets interesting. Some projects looked at the Git-vs-SQLite divide and refused to choose.

Fossil — created by D. Richard Hipp, the same person who created SQLite. It is a version control system where the repository IS a SQLite database. Commits, branches, tickets, wiki pages — all stored in SQL tables, all queryable. The person who built the best embedded database in the world looked at Git and said “we can do better.” That should tell you something.

Dolt — a SQL database with Git semantics. You can branch a database. Diff two schemas. Merge table changes. dolt diff main feature shows you row-level changes between branches. It is unhinged and it works. Want to know what a table looked like before your coworker’s migration? SELECT * FROM users AS OF 'main~3'. Those are time-travel queries on a SQL database, written in Git’s revision syntax.

++
# Branch a database like you branch code
dolt checkout -b experiment
dolt sql -q "ALTER TABLE users ADD COLUMN score INT"
dolt add .
dolt commit -m "add score column"
dolt diff main experiment
++

DVC (Data Version Control) — Git for datasets. It stores metadata and pointers in Git, actual data in S3 or GCS. Someone realized that Git’s object model is perfect for tracking versions but terrible for storing 50GB CSV files. So they kept the version graph and externalized the storage. Pragmatic and slightly horrifying.
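The pointer pattern itself is small enough to sketch. The code below is a generic illustration of “version the pointer, store the bytes elsewhere”, not DVC's actual file format or API. The dictionaries stand in for a Git working tree and a storage bucket, and the `.ptr` suffix and field names are invented.

```python
import hashlib
import json

remote_storage = {}  # stand-in for an S3/GCS bucket (hypothetical)
repo_files = {}      # stand-in for small files tracked in Git (hypothetical)


def track(path: str, data: bytes) -> None:
    """Upload the data under its hash and keep only a tiny pointer in the repo."""
    digest = hashlib.sha256(data).hexdigest()
    remote_storage[digest] = data
    pointer = {"path": path, "sha256": digest, "size": len(data)}
    repo_files[path + ".ptr"] = json.dumps(pointer, sort_keys=True)


def fetch(path: str) -> bytes:
    """Resolve a pointer back to the data and verify it on the way in."""
    pointer = json.loads(repo_files[path + ".ptr"])
    data = remote_storage[pointer["sha256"]]
    if hashlib.sha256(data).hexdigest() != pointer["sha256"]:
        raise ValueError("remote data does not match pointer")
    return data


dataset = b"id,label\n1,cat\n" * 1000
track("train.csv", dataset)

print(len(dataset), len(repo_files["train.csv.ptr"]))
```

Git only ever sees the pointer, which stays roughly the same size no matter how large the dataset gets. The version graph stays in Git; the weight moves out.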

Git-backed CMS — TinaCMS, Decap CMS (formerly Netlify CMS), and others use Git as the content database. Every edit is a commit. Every publish is a merge to main. Branching gives you draft content for free. It works surprisingly well until you need to query across 10,000 posts, at which point you discover that Git’s lack of a query language is not a theoretical problem.

When to use which

Cut the philosophy. Here are the practical rules.

+-----------------------------------+-----------------------------------+
| You need                          | Use                               |
+-----------------------------------+-----------------------------------+
| Version history of text files     | Git                               |
| Structured queries on data        | SQLite                            |
| Branching/merging workflows       | Git                               |
| Mutable state (user data, config) | SQLite                            |
| Distributed collaboration         | Git                               |
| Embedded app storage              | SQLite                            |
| Audit trail without extra work    | Git                               |
| Audit trail with queryable data   | SQLite + triggers                 |
| Both simultaneously               | Fossil, Dolt, or accept your fate |
+-----------------------------------+-----------------------------------+

Most real systems need both. Your application stores user data in SQLite (or Postgres, or whatever). Your source code lives in Git. You already use two databases daily — you just only think of one as a database.

The lesson

Understanding Git as a database makes you better at both Git and databases.

Git teaches you that immutability and content addressing are powerful primitives. When every object is identified by the hash of its contents, you get integrity verification, deduplication, and cacheability for free. Most databases could learn from this.

SQLite teaches you that queryability is not optional for real-world data. Storing everything is pointless if you cannot ask questions about it. Git stores a perfect history and gives you grep to explore it. That is a tragedy.

The best systems will increasingly blur this line. Dolt already ships Git semantics inside a SQL database. Fossil already ships SQL queries inside a VCS. Expect more hybrids — more tools that treat version history and structured queries as complementary features rather than opposing philosophies.

The split was never necessary. It was an accident of history. And the tools that fix it will be the ones that win.

Every developer uses two databases every day — Git and whatever their application talks to. They spend years mastering SQL and weeks memorizing Git commands. The irony is that understanding Git as a database is the fastest way to stop being afraid of it.