- opencode’s edit tool doesn’t trust LLMs to give exact string matches. it runs a cascade of “replacers” - starting from exact match, then line-trimmed, then block-anchor, whitespace-normalized, indentation-flexible, etc. the `BlockAnchorReplacer` is clever - it matches the first/last lines exactly, then uses levenshtein distance to fuzzy-match the middle. handles common LLM failure modes: wrong indentation, whitespace diffs, escape sequence issues. sourced from cline and gemini-cli.
- opencode uses the vercel ai sdk to abstract provider-specific tool call formats. internally it models messages as “parts” - `TextPart`, `ReasoningPart`, `ToolPart`, `StepStartPart`, `StepFinishPart`. each `ToolPart` has a state machine (pending → running → completed/error) so the UI can show progress. tools are defined with zod schemas that get converted to provider-native formats. nice separation - the edit tool logic (cascade of replacers) is completely decoupled from the LLM plumbing. the `EditTool` schema:

  ```ts
  z.object({
    filePath: z.string().describe("The absolute path to the file to modify"),
    oldString: z.string().describe("The text to replace"),
    newString: z.string().describe("The text to replace it with (must be different from oldString)"),
    replaceAll: z.boolean().optional().describe("Replace all occurrences of oldString (default false)"),
  })
  ```
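the block-anchor idea is easy to sketch. a minimal illustration (not opencode’s actual implementation - the function names and the 0.8 similarity threshold here are made up):

```typescript
// sketch of a block-anchor replacer: anchor on the first and last lines of
// oldString (trimmed, so indentation diffs don't matter), then fuzzy-match
// the middle lines with levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// returns the line index where the matched block starts, or -1 if no match
function blockAnchorFind(content: string, oldString: string, threshold = 0.8): number {
  const src = content.split("\n");
  const pat = oldString.split("\n");
  if (pat.length < 3) return -1; // need a middle to fuzzy-match
  const first = pat[0].trim();
  const last = pat[pat.length - 1].trim();
  for (let i = 0; i + pat.length <= src.length; i++) {
    const j = i + pat.length - 1;
    if (src[i].trim() !== first || src[j].trim() !== last) continue;
    const mid = src.slice(i + 1, j).join("\n");
    const want = pat.slice(1, -1).join("\n");
    const sim = 1 - levenshtein(mid, want) / Math.max(mid.length, want.length, 1);
    if (sim >= threshold) return i;
  }
  return -1;
}
```

exact and trimmed matching run earlier in the cascade; this fallback only fires when those fail, which is why a fairly strict similarity threshold is safe.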
- dataset: synthetic (original, lazy-diff, merged) triplets from real codebases - train the model to handle LLM “laziness” patterns like `// ... rest unchanged`
- training: fine-tune small models (7B-14B) on the merge task; the model learns to identify edit locations and preserve unchanged code exactly
- inference: speculative decoding with high acceptance rate since edits are sparse (98% of tokens copied verbatim)
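to make the merge task concrete, here’s a toy deterministic version - it expands a single `// ... rest unchanged` marker by copying the tail of the original file. (the real apply model is a trained model that handles arbitrary laziness patterns robustly; the `naiveMerge` name and the anchoring heuristic are invented for illustration.)

```typescript
// toy version of the merge task: expand a "// ... rest unchanged" marker
// by copying the rest of the original file. a real fast-apply model is a
// trained 7B-14B model, not string matching.
function naiveMerge(original: string, lazy: string): string {
  const marker = "// ... rest unchanged";
  const lazyLines = lazy.split("\n");
  const cut = lazyLines.findIndex((l) => l.trim() === marker);
  if (cut === -1) return lazy; // no marker: the lazy edit is already complete
  if (cut === 0) return original; // marker with no edited head: nothing changed
  const head = lazyLines.slice(0, cut);
  // anchor on the last concrete line before the marker, then copy the
  // original file from just after that line's first occurrence
  const anchor = head[head.length - 1].trim();
  const origLines = original.split("\n");
  const at = origLines.findIndex((l) => l.trim() === anchor);
  if (at === -1) return lazy; // can't anchor: bail out
  return [...head, ...origLines.slice(at + 1)].join("\n");
}
```

note that the output is mostly a verbatim copy of the original - exactly the property speculative decoding exploits for the ~1000-10k tok/s numbers above.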
- aider has the most extensive benchmarking on edit formats. key finding: unified diffs reduce GPT-4 Turbo laziness by 3x (20% → 61% on their laziness benchmark). the insight is that unified diffs make the model act like it’s “writing data for a program” rather than “explaining to a human” - it avoids informal placeholders like `// ... rest unchanged`. four principles: (1) familiarity - `git diff` format is everywhere in training data, (2) simplicity - plain text avoids JSON escaping bugs, (3) high-level edits - encourage substantial blocks not surgical single-lines (30-50% error reduction), (4) flexible application - fuzzy matching is critical, disabling it increased errors 9x. no single best format though - GPT-4 Turbo prefers `udiff`, Claude prefers `diff` (search/replace), Gemini prefers `whole` or `diff-fenced`.
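to make “high-level edits” concrete, here’s what a hunk in that spirit looks like - replacing a whole function rather than surgically editing single lines (hypothetical file; details of aider’s exact udiff conventions simplified):

```diff
--- src/isPrime.ts
+++ src/isPrime.ts
@@ ... @@
-function isPrime(n: number): boolean {
-  for (let i = 2; i < n; i++) {
-    if (n % i === 0) return false;
-  }
-  return n > 1;
-}
+function isPrime(n: number): boolean {
+  if (n < 2) return false;
+  for (let i = 2; i * i <= n; i++) {
+    if (n % i === 0) return false;
+  }
+  return true;
+}
```

the `-` lines double as the search target, which is where flexible application matters: if the model copies them slightly wrong, a strict patch tool rejects the whole hunk.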