i’ve been using Claude Code heavily for the last few months — for chonkie, for side projects, for basically everything. and the single biggest source of bad code is stale documentation.

the agent writes something that looks right. the imports are plausible. the function signatures almost match. but it doesn’t compile, because the API changed two versions ago and the agent is working from training data. or it does a web fetch, gets blocked by Cloudflare, retries three times, finally scrapes a page that’s 95% navigation chrome, and extracts 300 tokens of actual content from 14,000 tokens of HTML. or it finds the docs but for the wrong version — Next.js 16 patterns when the project is pinned to 14.

i started looking at how agents actually get documentation, and every approach has the same shape: it kind of works, until it doesn’t.

how agents get documentation today

direct web fetching

the simplest approach. the agent searches the web, gets URLs, fetches pages.

Agent: I need to check the API for FastAPI's Depends function.
> web_search("FastAPI Depends injection")
> fetch_url("https://fastapi.tiangolo.com/tutorial/dependencies/")
  ⚠ Blocked by Cloudflare bot detection
> fetch_url("https://stackoverflow.com/questions/...")
  ✓ 200 OK — but this is a 2022 answer about an older API
> fetch_url("https://fastapi.tiangolo.com/reference/dependencies/")
  ✓ 200 OK — 14,200 tokens of HTML, ~300 tokens of relevant content

a single lookup can burn 5-10 HTTP requests and thousands of tokens of HTML to extract a few lines of useful content. the page with the function signature you need is roughly 95% noise — nav bars, sidebars, footers, cookie banners. all of it authored for humans, not agents.

and it gets worse at scale. millions of agents hitting documentation sites with rapid-fire requests looks a lot like a DDoS. site operators tighten bot detection, agents fail more, agents retry more. a feedback loop that makes things worse for everyone.

llms.txt

a newer convention. site owners place an /llms.txt or /llms-full.txt at their domain root with clean, machine-readable content. no HTML chrome. just the docs.

Agent: I need docs for Tailwind's grid utilities.
> fetch_url("https://tailwindcss.com/llms-full.txt")
  ✓ 200 OK — 428,000 tokens
  [Loading entire Tailwind documentation into context...]

solves the noise problem. but it’s a full dump — no search index, no relevance ranking, no way to get just the section you need. Tailwind’s llms-full.txt is 428k tokens. PyTorch would be worse. an agent looking up one function signature has to ingest the entire file.

no versioning either. /llms.txt serves whatever’s currently deployed. a developer on Next.js 14 hits nextjs.org/llms.txt and gets the Next.js 16 docs. there’s no convention for versioned paths and most sites don’t bother.

llms.txt is the right idea — give agents content in a format they can use — but it stops short of solving retrieval, versioning, and offline access.

cloud MCP documentation servers

services like Context7 and Docfork pre-index library documentation and serve it via MCP. agents query a centralized server, get back structured snippets. Context7 has over 42k GitHub stars — clearly this resonated.

Agent: I need to look up Next.js middleware API.
> mcp__context7__search_docs("nextjs", "middleware")
  ✓ Returns 3 relevant documentation snippets (~2,000 tokens)

when it works, the experience is great. but the model breaks in two ways.

economics. these services index open source documentation — content that authors wrote and published freely — and serve it behind rate limits. Context7’s free tier went from ~6,000 requests/month to 500 in January 2026. at 50-100 lookups per coding session, that’s maybe 10 sessions a month.

the rate limits aren’t arbitrary — they’re a consequence of the architecture. every query hits a server, searches an index, returns a response. the infrastructure has to handle every agent session’s burst of requests. but documentation is static content. it changes when a library ships a new version. between releases, every query returns the same thing. serving it through a real-time API is like running a database query for a paragraph that hasn’t changed in three months.

versioning. Next.js’s API surface changed meaningfully between versions 13, 14, 15, and 16. a centralized server has two options, both bad. index every version, and the search results are polluted with near-identical content from different versions that BM25 can’t disambiguate. or index only the latest version, and agents get v16 docs when the project is pinned to v14. the failure mode — code that almost compiles — is worse than an obviously wrong answer.

mandex: documentation as packages

all three approaches treat documentation as something to be fetched on demand. the architectures differ, but the access pattern is the same: agent needs info, makes a network request, waits for a response. the problems — rate limits, version mismatches, context bloat — are consequences of this model.

but documentation isn’t dynamic content. a library’s docs are written once per release and read by thousands of developers between releases. authored once per version, distributed widely, read many times, never modified in place.

that’s the same access pattern as software packages. you don’t query npm on every import statement. you install packages locally and they’re available immediately. documentation should work the same way.

mandex is a package registry for documentation. library authors build searchable documentation packages from their existing docs. the packages are compressed and distributed through a CDN. developers download them once and query them locally.

mx pull pytorch @2.3.0
mx pull nextjs @14.0.0
mx search pytorch "attention mechanism"

after the initial download, all queries are local. no network call, no server process, no rate limit, no API key. the same query can run a thousand times in a session at the same cost: zero.

versioning is solved by the package model itself. mx pull nextjs @14.0.0 downloads a package containing only the Next.js 14 documentation. the search index has no v16 content to confuse results. the right version was selected at download time.

the CLI outputs to stdout — pipe it, redirect it, or let an agent invoke it as a tool. works with Claude Code, Cursor, Copilot, or anything that can run a shell command. mx serve starts an MCP server if you prefer that protocol, but it’s a transport layer on top, not a requirement.

for library authors

the build step works on documentation as it already exists. no custom format, no migration.

mx build ./docs --name pytorch --version 2.3.0

walks the directory, finds every markdown and MDX file, creates one entry per file. first # heading becomes the entry name, full content becomes the body. FTS5 index built over both columns. compatible with Docusaurus, MkDocs, Mintlify, plain README collections — mx build operates on the common denominator.
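the real mx is Rust, but the walk-and-index pass is simple enough to sketch in Python. a rough equivalent, assuming the two-column schema described in the architecture section (function name is illustrative, and only .md files are handled; MDX works the same way):

```python
import sqlite3
from pathlib import Path

def build_package(docs_dir, db_path):
    """Sketch of an mx-build-style indexer: one entry per markdown file,
    first '#' heading as the entry name, full text as the body."""
    db = sqlite3.connect(db_path)
    db.executescript("""
        CREATE TABLE entries (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            name    TEXT NOT NULL,
            content TEXT NOT NULL
        );
        CREATE VIRTUAL TABLE entries_fts USING fts5(
            name, content,
            content=entries, content_rowid=id,
            tokenize='porter unicode61'
        );
    """)
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # first "# heading" becomes the entry name; fall back to the filename
        name = next(
            (line.lstrip("#").strip() for line in text.splitlines()
             if line.startswith("# ")),
            path.stem,
        )
        cur = db.execute(
            "INSERT INTO entries (name, content) VALUES (?, ?)", (name, text))
        # external-content FTS5 tables aren't populated automatically:
        # each row has to be mirrored into the index explicitly
        db.execute(
            "INSERT INTO entries_fts (rowid, name, content) VALUES (?, ?, ?)",
            (cur.lastrowid, name, text))
    db.commit()
    db.close()
```

one detail worth noting: because the FTS5 table uses external content (content=entries), the index rows have to be written alongside the source rows, at build time or via triggers.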

publishing can be added to release CI alongside npm publish or twine upload. the CDN handles distribution. the CLI handles search. the author’s only job is the one they already have: writing good docs.

with scraping-based tools, the author has no say in how their documentation is indexed or chunked. with mandex, the author controls the source material — or just ships their existing docs and lets the format do its job.

architecture

package format

a mandex package is a zstd-compressed SQLite database with FTS5 full-text search.

the schema is deliberately minimal:

CREATE TABLE entries (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL,
    content TEXT NOT NULL
);
 
CREATE VIRTUAL TABLE entries_fts USING fts5(
    name,
    content,
    content=entries,
    content_rowid=id,
    tokenize='porter unicode61'
);

two columns. name from the first heading in the source file. content is the full markdown.

no params, no signature, no returns or tags or kind. documentation already contains all of that — function signatures, type annotations, examples — in markdown that LLMs parse naturally. a structured schema would need to work across PyTorch API references, Next.js conceptual guides, Tailwind utility listings, and Django tutorials. no field set fits all of them. markdown does.

why SQLite

the package format needs to be three things at once: storage container, search index, and query engine. SQLite does all three in a single file with no server process.

FTS5 gives you BM25 relevance ranking, porter stemming, prefix queries, phrase matching, and boolean operators — built into SQLite, not a separate library. a mandex .db file is queryable with any SQLite client in any language. Python’s sqlite3, Rust’s rusqlite, the sqlite3 CLI.
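concretely, a search over a mandex-style package is one SQL statement. a sketch using Python’s sqlite3, building a toy in-memory package first (the entries are made up; bm25() scores are lower-is-better, hence the ascending sort):

```python
import sqlite3

# tiny in-memory stand-in for a mandex package (illustrative entries)
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY AUTOINCREMENT,
                          name TEXT NOT NULL, content TEXT NOT NULL);
    CREATE VIRTUAL TABLE entries_fts USING fts5(
        name, content, content=entries, content_rowid=id,
        tokenize='porter unicode61');
""")
docs = [
    ("torch.nn.MultiheadAttention", "Multi-head attention mechanism docs."),
    ("torch.nn.Conv2d", "Applies a 2D convolution over an input signal."),
]
for name, content in docs:
    cur = db.execute(
        "INSERT INTO entries (name, content) VALUES (?, ?)", (name, content))
    db.execute(
        "INSERT INTO entries_fts (rowid, name, content) VALUES (?, ?, ?)",
        (cur.lastrowid, name, content))

def search(query, limit=5):
    # bm25() returns lower-is-better scores, so ascending order = best first
    return db.execute(
        "SELECT name, bm25(entries_fts) FROM entries_fts "
        "WHERE entries_fts MATCH ? ORDER BY bm25(entries_fts) LIMIT ?",
        (query, limit)).fetchall()

print(search("attention mechanism"))
```

no client library, no server, no custom protocol. any language that can open a SQLite file can run this query.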

there’s a nice side effect: if mandex as a project ceased to exist, every published package would still work. the files are standard SQLite databases. no proprietary encoding, no format-specific decoder. SQLite has backwards compatibility commitments through 2050.

download and sync

packages are compressed with zstd — documentation is highly compressible, so a 20MB package ships as 2-4MB. each download is an HTTP GET to a CDN edge node. no auth, no API server, no query processing.

packages live in a global cache (~/.mandex/cache/), shared across projects. two projects using react @19.1.0 download it once. same model as pnpm’s content-addressable store.
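the cache logic is deliberately boring. a Python sketch of the download-once semantics (the per-package directory layout here is an assumption for illustration; only the ~/.mandex/cache/ root comes from the actual design):

```python
from pathlib import Path

CACHE = Path.home() / ".mandex" / "cache"

def cache_path(name: str, version: str) -> Path:
    # one directory per (package, version); exact layout is illustrative
    return CACHE / name / version / "docs.db"

def ensure_package(name, version, fetch):
    """Download-once semantics: if the package is already cached, `fetch`
    (a callable returning the decompressed package bytes) is never called."""
    path = cache_path(name, version)
    if path.exists():
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return path
```

keying the cache on (name, version) is what makes the sharing work: a second project asking for react @19.1.0 hits the exists() branch and never touches the network.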

the per-project behavior comes from mx sync. it reads your dependency files — package.json, requirements.txt, Cargo.toml, pyproject.toml, go.mod — resolves each to a mandex package, downloads what’s missing, and writes a project-local manifest.

$ mx sync
  Reading package.json...
  Resolved 14 dependencies to mandex packages
 react @19.1.0          2.1 MB  [===========] done
 next @14.2.0           4.7 MB  [===========] done
 tailwindcss @4.1.0     1.8 MB  [===========] done
 tanstack-query @5.0    1.2 MB  [===========] done
  ... (10 more)
  Synced 14 packages in 1.4s

when mx search runs inside a project, it queries only the packages in that project’s manifest. 50 packages in your global cache, but a search in your Next.js project only hits the 14 that matter.
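the resolution half of mx sync is mechanical. a sketch for package.json alone (the real tool reads several manifest formats; the naive range-stripping here is illustrative, where a real resolver would consult the lockfile for exact installed versions):

```python
import json
import re

def resolve_deps(package_json: str) -> list[tuple[str, str]]:
    """Map package.json dependencies to pinnable (name, version) pairs.
    Range prefixes like ^ and ~ are stripped naively for illustration."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}),
            **manifest.get("devDependencies", {})}
    return sorted(
        (name, re.sub(r"^[\^~>=<\s]+", "", spec))
        for name, spec in deps.items()
    )

example = """{
  "dependencies": {"next": "^14.2.0", "react": "~19.1.0"},
  "devDependencies": {"tailwindcss": "4.1.0"}
}"""
print(resolve_deps(example))
# [('next', '14.2.0'), ('react', '19.1.0'), ('tailwindcss', '4.1.0')]
```

each resolved pair maps to a package download, and the result set becomes the project-local manifest that scopes later searches.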

a note on sub-agents

search has a ceiling. FTS5 can surface the right sections, but it can’t synthesize across them — and for complex questions, the answer often lives across three or four entries.

why not use better search? embedding models, rerankers, LLM-based query expansion would all beat BM25. but they need significant memory, compute, or an API key. a good embedding model takes 500MB of RAM. that’s not appropriate for a CLI tool that should start in 5ms.

sub-agents change this. Claude Code can run four or five mx search calls in parallel, read the results, and reason across them. the sub-agent does the synthesis; mandex provides the raw material. the LLM doing the synthesis is already running, already paid for, and already has the context to know which results matter.

this is why mandex doesn’t need perfect search ranking. even imperfect BM25 results over local, version-pinned documentation beat asking an agent to hallucinate from stale training data.

getting started

mandex is written in Rust. single static binary called mx.

curl -fsSL https://mandex.dev/install.sh | sh

the installer detects your platform, drops the binary in ~/.local/bin, configures your PATH, then runs mx init — which auto-detects your AI coding tools (Claude Code, Cursor, Windsurf, Codex) and installs skills for each one. one command, everything wired up.

cd your-project
mx sync               # installs docs for all detected dependencies
mx search nextjs "middleware"

the format is SQLite. the packages are portable. the source is open. everything works offline after the initial download.


agents need documentation to write correct code. the documentation exists. the missing piece was never the content — it was the distribution model. package it, version it, distribute it through infrastructure that scales to zero marginal cost. then let agents query it locally, as many times as they need, without asking anyone’s permission.