Your Coding Agent's Other Problem: It's Reading the Wrong Code (code-review-graph, Part 2)
This is Part 2 of a two-part series on making coding agents actually reliable. Part 1 covered reasoning quality: Superpowers, Chain-of-Thought, and externalized planning. This one covers input quality.
There is a simpler version of the coding agent problem that gets less attention than the reasoning problem.
Claude Code, on every task, re-reads your entire codebase. Not the relevant files. Not the changed files. The whole thing. On a 500-file project, that's tens of thousands of tokens burned before the model has written a single line.
The result is predictable: the model drowns in irrelevant context. It hallucinates dependencies that don't exist. It misses the actual dependency that does. Review quality degrades not because the model is thinking poorly, but because it's reading poorly.
This is the input quality problem. And it has a clean solution.
What code-review-graph Does
code-review-graph, built by Tirth Patel, parses your repository into an Abstract Syntax Tree (AST) using Tree-sitter and stores it as a graph:
- Nodes: functions, classes, imports
- Edges: call sites, inheritance chains, test coverage relationships
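The node/edge extraction can be sketched with Python's stdlib `ast` module standing in for Tree-sitter. This is an illustrative toy, not the tool's actual code: it pulls function definitions out as nodes and call sites out as caller-to-callee edges.

```python
import ast

# Toy source file. In the real tool, Tree-sitter parses 12 languages;
# here Python's stdlib `ast` parses Python only, as a stand-in.
SOURCE = """
def validate(token):
    return token == "ok"

def login(token):
    return validate(token)
"""

tree = ast.parse(SOURCE)
nodes, edges = [], []
for fn in ast.walk(tree):
    if isinstance(fn, ast.FunctionDef):
        nodes.append(fn.name)  # each function becomes a graph node
        for call in ast.walk(fn):
            # each direct call becomes a caller -> callee edge
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                edges.append((fn.name, call.func.id))

print(nodes)  # ['validate', 'login']
print(edges)  # [('login', 'validate')]
```

Imports, inheritance, and test-coverage edges work the same way: walk the syntax tree, emit a node per definition and an edge per relationship.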
When a file changes, the graph performs blast-radius analysis: trace every caller, every dependent, every test that could be affected by this change. The result is the minimal set of files Claude actually needs to read.
Instead of dumping 21,000 tokens of Next.js source code into the context window, Claude gets a 4,500-token structural summary of exactly what changed and what it touches.
The Numbers Are Striking
Benchmarked on three real production codebases across six real git commits:
| Repo | Files | Standard (tokens) | With Graph (tokens) | Reduction | Review Quality |
|---|---|---|---|---|---|
| httpx | 125 | 12,507 | 458 | 26.2x | 9.0 vs 7.0 |
| FastAPI | 2,915 | 5,495 | 871 | 8.1x | 8.5 vs 7.5 |
| Next.js | 27,732 | 21,614 | 4,457 | 6.0x | 9.0 vs 7.0 |
| Average | | 13,205 | 1,928 | 6.8x | 8.8 vs 7.2 |
The token reduction is expected. The quality improvement is not, but it makes sense. A model given 458 precisely relevant tokens produces a better review than a model given 12,507 tokens of mixed signal. Less noise, higher signal. Precision beats volume.
Incremental updates run in under 2 seconds on a 2,900-file project. SHA-256 hash checks on every file mean only changed files get re-parsed. The graph stays current automatically on every save and commit.
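The change-detection step is simple to sketch: hash every file, compare against a cached hash, and re-parse only the files whose hash moved. A minimal version, assuming the real tool does something similar (the function names here are illustrative, not its API):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths, cache):
    """Return the files whose content hash differs from the cached hash,
    updating the cache in place. Only these need re-parsing."""
    changed = []
    for p in paths:
        h = file_hash(p)
        if cache.get(str(p)) != h:
            cache[str(p)] = h
            changed.append(p)
    return changed
```

On a typical commit only a handful of files change, so the graph update touches a handful of files instead of thousands, which is why it stays under two seconds.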
The Mechanism: Why Blast Radius Works
The key insight is that code changes don't affect code randomly. They propagate along dependency edges. If you change auth.py, anything that imports from auth.py might break. Any test that calls those functions needs to run. Nothing else does.
A flat file scan misses this structure entirely. It reads everything or guesses. The graph knows the structure and traces it precisely.
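Tracing the structure is just a graph traversal: follow reverse dependency edges outward from the changed node until nothing new is reachable. A minimal sketch with hard-coded edges (the names and helper functions are illustrative, not the tool's API):

```python
from collections import defaultdict, deque

# callee -> set of callers/importers (reverse dependency edges).
# In the real tool these edges come from the AST graph; here they
# are hard-coded for illustration.
dependents = defaultdict(set)

def add_edge(callee, caller):
    dependents[callee].add(caller)

add_edge("auth.login", "views.login_page")
add_edge("auth.login", "tests.test_auth")
add_edge("views.login_page", "tests.test_views")

def blast_radius(changed):
    """Everything transitively affected by a change to `changed` (BFS)."""
    seen, queue = set(), deque([changed])
    while queue:
        for dep in dependents[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(blast_radius("auth.login")))
# ['tests.test_auth', 'tests.test_views', 'views.login_page']
```

Changing auth.login pulls in its direct caller, the test that exercises it, and the test of the caller, and nothing else. That set is what lands in the context window.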
This is not a new idea in software engineering: it's how incremental compilers work, how test impact analysis works, how monorepo build systems (Bazel, Buck, Nx) decide what to rebuild. code-review-graph applies the same principle to LLM context windows.
How It Connects to Superpowers
In Part 1, the three-layer stack for reliable coding agents looked like this:
soul.py → What do I know? (cross-session memory)
Superpowers → How should I plan and execute?
Reasoning model → How should I think?
code-review-graph adds the missing fourth layer:
┌─────────────────────────────────────────────┐
│ soul.py / MEMORY.md                         │ ← cross-session memory
├─────────────────────────────────────────────┤
│ Superpowers                                 │ ← planning and execution
├─────────────────────────────────────────────┤
│ code-review-graph                           │ ← precise context
├─────────────────────────────────────────────┤
│ Reasoning model (o3, R1, Claude 3.7)        │ ← per-token thinking
└─────────────────────────────────────────────┘
The layers address different failure modes:
- Reasoning model: shallow or impulsive thinking
- code-review-graph: reading the wrong code
- Superpowers: jumping to implementation without a plan
- soul.py: forgetting what it did last session
A coding agent with all four layers thinks carefully, reads precisely, plans before acting, and remembers what it learned.
Thatβs the complete picture.
Installation
Claude Code (recommended):
claude plugin marketplace add tirth8205/code-review-graph
claude plugin install code-review-graph@code-review-graph
pip:
pip install code-review-graph
code-review-graph install
Then restart Claude Code and run:
Build the code review graph for this project
Initial build: ~10 seconds for a 500-file project. After that, fully automatic. 12 languages supported (Python, TypeScript, JavaScript, Go, Rust, Java, C#, Ruby, Kotlin, Swift, PHP, C/C++). MCP compatible.
code-review-graph, by Tirth Patel
Part 1: Superpowers + CoT + Reasoning Models
soul.py: persistent memory for any LLM agent