Architecture·June 14, 2026·3 min

SourceVault.ai

Why your AI assistant can't tell you why your code changed

Ask an AI coding assistant where a function is defined and it will find it. Ask it what a regex does and it will explain it. Then ask the question you actually have at 2 a.m. during an incident — why is this line here, and when did this behavior change? — and it goes quiet, or worse, it guesses.

Assistants index the snapshot, not the story

Most cloud coding assistants build their index from the current state of your repository: they chunk the files on the default branch, embed those chunks, and retrieve the closest matches at query time. That's genuinely useful for "where" and "what" questions, because the answer is sitting in the code as it exists right now.
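To make "snapshot index" concrete, here is a minimal Python sketch of that chunk, embed, retrieve pipeline. It is not any particular assistant's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the names (`chunk_file`, `build_snapshot_index`, `retrieve`), the 20-line window and the two-file repository are hypothetical choices for illustration. Note what the index takes as input: file contents at one point in time, and nothing else.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # inclusive
    text: str


def chunk_file(path: str, source: str, window: int = 20) -> list[Chunk]:
    """Split one file into fixed-size windows of lines, keeping line ranges."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), window):
        block = lines[start:start + window]
        chunks.append(Chunk(path, start + 1, start + len(block), "\n".join(block)))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_snapshot_index(files: dict[str, str]) -> list[tuple[Chunk, Counter]]:
    """Index the current tree only: commit history is never an input."""
    return [(c, embed(c.text)) for path, src in files.items() for c in chunk_file(path, src)]


def retrieve(index: list[tuple[Chunk, Counter]], query: str, k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical two-file repository snapshot.
files = {
    "lib/app.js": "function listen(port) {\n  return http.createServer(this).listen(port);\n}\n",
    "lib/router.js": "function route(path) {\n  return new Route(path);\n}\n",
}
top = retrieve(build_snapshot_index(files), "where is listen defined", k=1)[0]
print(f"{top.path}:{top.start_line}-{top.end_line}")  # lib/app.js:1-3
```

Each hit carries a path and a line range, which is what a "where" answer cites. There is no field in `Chunk` where a commit, an author or a date could live.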

But "why" and "when" don't live in the current code. They live in the history — the commit that introduced the line, the message explaining the tradeoff, the pull request that referenced the incident, the five-year trail of upgrades that led to the version pinned today. None of that is in a snapshot index, because a snapshot, by definition, has thrown the history away.

You could imagine an assistant ingesting your full git history to close the gap. Almost none do — and there's a good reason beyond engineering effort. Your history is a larger and more revealing artifact than your current code. It contains the security fix and the commit message that describes the vulnerability it patched. It contains who touched what, when, and why. Shipping all of that to a third party to make it searchable is a much bigger ask than shipping the current tree — and for a lot of teams, it's a non-starter.

What it looks like when the history is queryable

The fix is to treat commit history as a first-class retrieval layer, indexed alongside the code but kept distinct from it. To make this concrete, here's a real question answered against the public Express repository — six thousand commits of genuine history, indexed locally:

Q: When was serve-static last changed, and why?

The last change to serve-static was on 2024-09-10, in commit 4c9ddc1 — "upgrade to serve-static@0.16.0." The trail goes back further: 620df0e moved it to 2.0.0-beta.1 in 2022, and 88f9733 took it to 1.14.1 in 2019.

Every claim there carries a commit SHA you can click straight through to the diff. The answer didn't come from a model's training data or a guess about a common dependency — it came from retrieving the actual commits and grounding the answer in them. That's the difference between an assistant that pattern-matches and one that reads your history.
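One way to make "grounded" checkable is to treat every commit hash in an answer as a claim and verify it against the commits retrieval actually returned. The sketch below assumes citations appear as 7 to 40 character lowercase hex SHAs; `cited_shas` and `is_grounded` are hypothetical helpers for illustration, not SourceVault's evaluation code.

```python
import re

# Git SHAs, full or abbreviated: 7 to 40 lowercase hex characters. The lookahead
# requires at least one letter a-f so plain numbers such as years are skipped
# (at the cost of missing the rare all-digit abbreviated SHA).
SHA_PATTERN = re.compile(r"\b(?=[0-9a-f]*[a-f])[0-9a-f]{7,40}\b")


def cited_shas(answer: str) -> set[str]:
    """Every SHA-like token the answer cites."""
    return set(SHA_PATTERN.findall(answer))


def is_grounded(answer: str, retrieved: list[str]) -> bool:
    """True if the answer cites at least one SHA and every cited SHA is a
    prefix of some commit that retrieval actually returned."""
    cites = cited_shas(answer)
    if not cites:
        return False  # a history answer with no citations grounds nothing
    return all(any(sha.startswith(c) for sha in retrieved) for c in cites)


answer = ("The last change was in commit 4c9ddc1; earlier, 620df0e "
          "and 88f9733 moved the version in 2022 and 2019.")
print(is_grounded(answer, ["4c9ddc1", "620df0e", "88f9733"]))  # True
print(is_grounded(answer, ["4c9ddc1", "620df0e"]))             # False
```

An answer that cites a hash retrieval never returned fails the check, which is the case of a model recalling or inventing a plausible commit rather than reading one.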

Mechanically it's not exotic: commit messages and their changed-file lists are embedded into their own index, separate from the code chunks so they don't muddy ordinary code search. A question that reads like "why did X change" or "when did Y break" pulls the most relevant commits into context, and the answer cites them by hash. The same retrieval discipline that grounds a code answer in a file and line range grounds a history answer in a commit.
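The separation described above can be sketched in a few dozen lines of Python. Everything here is an assumption for illustration: the `Commit` records and their SHAs are made up, `embed` is a toy token-count stand-in for a real embedding model, and `is_history_question` is a keyword heuristic where a real system might use a classifier. What the sketch does show is the shape: commits get their own index built from message plus changed files, questions that read like "why" or "when" are routed to it, and the context handed to the model is labeled by hash so the answer can cite it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    message: str
    changed_files: list[str]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CommitIndex:
    """History index, kept separate from the code-chunk index."""

    def __init__(self, commits: list[Commit]):
        # Each commit is embedded from its message plus its changed-file list.
        self.entries = [
            (c, embed(c.message + "\n" + "\n".join(c.changed_files))) for c in commits
        ]

    def search(self, query: str, k: int = 3) -> list[Commit]:
        q = embed(query)
        scored = [(cosine(q, vec), c) for c, vec in self.entries]
        scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated commits
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]


HISTORY_CUES = {"why", "when", "change", "changed", "introduced", "break", "broke"}


def is_history_question(question: str) -> bool:
    """Keyword heuristic (assumption): does the question ask "why" or "when"?"""
    return bool(set(re.findall(r"[a-z]+", question.lower())) & HISTORY_CUES)


def history_context(index: CommitIndex, question: str, k: int = 3) -> list[str]:
    """Context lines labeled by SHA so the model can cite them."""
    return [f"[{c.sha}] {c.message}" for c in index.search(question, k)]


# Hypothetical commits; the SHAs and messages are made up for illustration.
commits = [
    Commit("a1b2c3d", "deps: bump serve-static", ["package.json"]),
    Commit("e4f5a6b", "fix router param decoding", ["lib/router/index.js"]),
]
index = CommitIndex(commits)
question = "Why did serve-static change?"
if is_history_question(question):
    print(history_context(index, question))  # ['[a1b2c3d] deps: bump serve-static']
```

A question that does not trip the heuristic, such as "where is listen defined", would go to the code index instead, so commit messages never crowd out code chunks in ordinary search.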

The part that matters for regulated teams

Here's the turn. Once you accept that history is the missing layer, you also have to accept what history is: the most sensitive narrative your repository holds. The current code shows what your system does. The history shows how you got there — the incidents, the reasoning, the names, the things you fixed quietly. If "we won't send your code to the cloud" matters to your security team, "we won't send your history" should matter more.

Which is why the only safe way to make history queryable is to never move it. Index it where it already lives, on infrastructure you control, and let the answers — commit SHAs and all — be generated locally. Not a policy promise that the data won't be retained; an architecture in which it never leaves. That distinction is verifiable: you can watch the network and confirm nothing egresses.

"Why is this here?" is one of the most common questions a developer asks, and one of the few an AI assistant still can't answer. The reason isn't the model. It's that answering it well means reading the one part of your repository you'd least want to upload — so the answer has to come to the history, not the other way around.

See it grounded

SourceVault indexes your code and its history locally, and answers with file-and-line and commit-level citations — measured at 100% grounded on a public benchmark, with zero source code leaving your infrastructure.
