Graphify: One Command Turns Any Folder Into a Knowledge Graph

By Prahlad Menon

Andrej Karpathy keeps a folder called /raw. Papers, tweets, screenshots, handwritten notes, whiteboard photos — everything goes in. His ask, posted earlier this week: what’s the tool that lets me query across all of it without reading every file?

Forty-eight hours later, Graphify appeared. The README opens with: “graphify is the answer to that problem.”

It currently sits at 7,311 GitHub stars and 762 forks, four days after its first commit. That’s not a slow burn — that’s a direct hit.

What Graphify Actually Does

Type /graphify . inside Claude Code, Codex, or OpenClaw. That’s it. Graphify reads your folder — code, PDFs, markdown, screenshots — builds a knowledge graph, and writes three outputs:

  • graphify-out/graph.html — interactive, explorable graph
  • GRAPH_REPORT.md — plain English audit: god nodes, communities, surprising connections
  • graph.json — persistent, queryable by your AI assistant

The headline claim: 71.5x fewer tokens per query than feeding in raw files. The mechanism is structural rather than a benchmark trick: the graph stores relationships, not content, so your LLM traverses structure instead of reading thousands of lines. GRAPH_REPORT.md is one page; the raw folder might be gigabytes.

The Architecture (The Part Worth Understanding)

Graphify runs two passes. The first is deterministic — a tree-sitter AST parse across 19 programming languages (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Objective-C). No LLM involved, just structure. The second pass runs Claude subagents in parallel over docs, papers, and images — fully multimodal via Claude vision.
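
To make the deterministic first pass concrete, here is a minimal sketch of structure extraction with no LLM in the loop. It uses Python's built-in ast module as a stand-in for tree-sitter, so it only handles Python source; the extract_edges helper, the "defines"/"calls" relation names, and the sample code are all hypothetical and not taken from Graphify.

```python
import ast

# Toy source file to analyze (made up for this example).
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

class Index:
    def build(self, path):
        return parse(path)
'''


def extract_edges(source: str) -> set[tuple[str, str, str]]:
    """Return (owner, relation, target) triples found purely by parsing."""
    tree = ast.parse(source)
    edges: set[tuple[str, str, str]] = set()

    def visit(node: ast.AST, owner: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.ClassDef)):
                # A definition becomes a node owned by its enclosing scope.
                edges.add((owner, "defines", child.name))
                visit(child, child.name)
            else:
                # Only direct calls by bare name are recorded; method calls
                # such as text.split() are skipped in this sketch.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.add((owner, "calls", child.func.id))
                visit(child, owner)

    visit(tree, "<module>")
    return edges


edges = extract_edges(SOURCE)
for edge in sorted(edges):
    print(edge)
```

The same input always yields the same edges, which is the property that makes this pass cheap to cache and safe to trust.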

The clustering step is where it gets interesting: Leiden community detection, not embeddings. No vector database. The graph topology itself is the similarity signal. Nodes that are more densely linked to each other than to the rest of the graph end up in the same community. This means no embedding drift, no cosine similarity quirks, no infrastructure to run. The graph structure is the index.
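
To see why topology alone can act as a similarity signal, here is a small sketch of modularity, one quality function that community detection algorithms such as Leiden can optimize. This is not the Leiden algorithm and not Graphify's code: it only scores two hand-written partitions of a made-up toy graph, and the modularity function and node names are illustrative.

```python
from collections import defaultdict

# Toy graph: two triangles (a-b-c and d-e-f) joined by one bridge edge c-d.
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]


def modularity(edges, community_of):
    """Score a partition: sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

print(round(modularity(EDGES, split), 3))   # 0.357
print(round(modularity(EDGES, lumped), 3))  # 0.0
```

Splitting at the bridge scores higher than lumping everything together, with no vectors involved: the link structure is enough to tell the two clusters apart.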

Every relationship gets a provenance tag: EXTRACTED (found directly in the source), INFERRED (reasonable inference with a confidence score), or AMBIGUOUS (flagged for human review). You know what you’re working with.
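
One way to picture the provenance tags is as a small typed record. The sketch below is an assumption about shape, not Graphify's actual schema: the Edge class, its field names, and the 0-to-1 confidence range are hypothetical. Only the three tag names come from the project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    EXTRACTED = "EXTRACTED"  # found directly in the source
    INFERRED = "INFERRED"    # reasonable inference, carries a confidence score
    AMBIGUOUS = "AMBIGUOUS"  # flagged for human review


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    provenance: Provenance
    confidence: Optional[float] = None  # assumed range: 0.0 to 1.0

    def __post_init__(self) -> None:
        # In this sketch, an INFERRED edge must say how sure it is.
        if self.provenance is Provenance.INFERRED:
            if self.confidence is None or not 0.0 <= self.confidence <= 1.0:
                raise ValueError("INFERRED edges need a confidence in [0, 1]")


def needs_review(edges: list[Edge]) -> list[Edge]:
    """Return the edges a human should look at."""
    return [e for e in edges if e.provenance is Provenance.AMBIGUOUS]


# Made-up sample edges for illustration.
sample = [
    Edge("parse", "calls", "load", Provenance.EXTRACTED),
    Edge("Index", "relates_to", "Cache", Provenance.INFERRED, confidence=0.7),
    Edge("notes.png", "mentions", "Index", Provenance.AMBIGUOUS),
]
print([e.source for e in needs_review(sample)])  # ['notes.png']
```

Keeping the tag on every edge means a query can filter to EXTRACTED facts only, or surface the AMBIGUOUS ones for review.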

SHA256 caching means re-runs only touch changed files. Big folders stay fast.
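
The caching idea is simple enough to sketch with the standard library. This is a minimal illustration of content-hash caching in general, not Graphify's implementation: the changed_files helper and the JSON cache layout are assumptions, and the cache file is expected to live outside the scanned folder.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA256 of a file's bytes, read in chunks so large files stay cheap."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root: Path, cache_path: Path) -> list[Path]:
    """Return files under root whose content differs from the cached digest.

    The cache (a hypothetical JSON map of relative path -> digest) is
    updated in place, so a second call with no edits returns an empty list.
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        digest = file_digest(path)
        if cache.get(key) != digest:
            changed.append(path)
            cache[key] = digest
    cache_path.write_text(json.dumps(cache, indent=2))
    return changed
```

Hashing content rather than checking modification times means a file that was touched but not edited is still skipped.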

Install and Run

# Install (PyPI name is graphifyy while graphify name is being reclaimed)
pip install graphifyy && graphify install

# Basic usage (type these inside your assistant, not the shell)
/graphify .                        # current directory
/graphify ./raw                    # Karpathy's folder, or yours
/graphify ./raw --mode deep        # more aggressive inference
/graphify ./raw --update           # only re-process changed files
/graphify ./raw --obsidian         # also generate Obsidian vault with backlinks

Platform installs:

pip install graphifyy && graphify install                        # Claude Code (Linux/Mac)
pip install graphifyy && graphify install --platform codex       # Codex
pip install graphifyy && graphify install --platform claw        # OpenClaw
pip install graphifyy && graphify install --platform opencode    # OpenCode
pip install graphifyy && graphify install --platform droid       # Factory Droid

Codex users: use $graphify . instead of /graphify .

The Always-On Hook

This is the UX detail that separates Graphify from a demo. Run:

graphify claude install    # Claude Code
graphify claw install      # OpenClaw

For Claude Code, this writes a PreToolUse hook to settings.json. Before every Glob and Grep, Claude sees: “Knowledge graph exists. Read GRAPH_REPORT.md before searching raw files.” Claude navigates by graph structure instead of keyword grepping. For Codex and OpenClaw, it writes the equivalent instruction into AGENTS.md.

The result is an assistant that actually knows your codebase topology rather than one that searches for strings and hopes.

Add URLs, Query the Graph

# Ingest external sources
/graphify add https://arxiv.org/abs/1706.03762     # fetch a paper into your graph
/graphify add https://x.com/karpathy/status/...    # fetch a tweet

# Query
/graphify query "what connects attention to the optimizer?"
/graphify path "DigestAuth" "Response"
/graphify explain "SwinTransformer"

# Export formats
/graphify ./raw --wiki             # agent-crawlable wiki
/graphify ./raw --neo4j            # Cypher for Neo4j
/graphify ./raw --graphml          # GraphML for Gephi/yEd
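
For a feel of what a path query does once the graph is persisted, here is a rough sketch of a shortest-path lookup. The real graph.json schema is not documented here, so the node-link layout below ("nodes" with "id", "links" with "source"/"target"/"relation") is an assumption, and the toy relationships between DigestAuth and Response are invented for illustration.

```python
import json
from collections import deque

# Hypothetical graph.json contents; the layout and edges are made up.
GRAPH_JSON = """
{"nodes": [{"id": "DigestAuth"}, {"id": "AuthBase"},
           {"id": "Session"}, {"id": "Response"}],
 "links": [
   {"source": "DigestAuth", "target": "AuthBase", "relation": "inherits"},
   {"source": "Session", "target": "AuthBase", "relation": "uses"},
   {"source": "Session", "target": "Response", "relation": "returns"}
 ]}
"""


def shortest_path(graph: dict, start: str, goal: str):
    """Breadth-first search over the links, treated as undirected."""
    neighbors = {node["id"]: set() for node in graph["nodes"]}
    for link in graph["links"]:
        neighbors[link["source"]].add(link["target"])
        neighbors[link["target"]].add(link["source"])
    if start not in neighbors or goal not in neighbors:
        return None

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no connection between the two nodes


graph = json.loads(GRAPH_JSON)
print(shortest_path(graph, "DigestAuth", "Response"))
# ['DigestAuth', 'AuthBase', 'Session', 'Response']
```

The traversal touches a handful of node names and never opens a source file, which is the intuition behind the token savings.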

GitHub: github.com/safishamsi/graphify (MIT)
PyPI: pip install graphifyy


FAQ

What is Graphify? An AI coding assistant skill — one command turns any folder into a queryable knowledge graph that works inside Claude Code, Codex, OpenClaw, OpenCode, and Factory Droid.

How do I install Graphify? pip install graphifyy && graphify install, then type /graphify . in Claude Code or your preferred assistant.

What does 71.5x fewer tokens mean? graph.json is a structural compression of your folder. The LLM traverses relationships rather than reading raw file content — so a multi-gigabyte research folder becomes a one-page audit plus a compact graph.

Does Graphify work with OpenClaw? Yes. Run pip install graphifyy && graphify install --platform claw, then use /graphify in OpenClaw as normal.

What file types does Graphify support? 19 programming languages via tree-sitter AST, plus PDFs and markdown via Claude subagents, and images, screenshots, and whiteboard photos via Claude vision.

What’s the Karpathy connection? Karpathy posted asking for a tool to query his /raw folder of papers, tweets, and notes without reading every file. Graphify shipped 48 hours later, built to solve exactly that problem.

Does Graphify use embeddings or a vector database? No. It uses Leiden community detection on graph topology. The link structure between nodes is the similarity signal — no vectors, no drift, no infrastructure dependency.