• opencode’s edit tool doesn’t trust LLMs to give exact string matches. they run a cascade of “replacers” - starting from exact match, then line-trimmed, then block-anchor, whitespace-normalized, indentation-flexible, etc. the BlockAnchorReplacer is clever - matches first/last lines exactly, then uses levenshtein distance to fuzzy-match the middle. handles common LLM failure modes: wrong indentation, whitespace diffs, escape sequence issues. sourced from cline and gemini-cli.
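  the cascade idea can be sketched roughly like this - replacer names mirror the ones above but the implementations here are illustrative, not opencode's actual code (their real versions handle more cases, e.g. escape normalization and multiple-match ambiguity):

```typescript
// standard levenshtein edit distance, used to fuzzy-match the middle of a block
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return dp[a.length][b.length];
}

// a replacer returns the actual span of `content` to replace, or null if no match
type Replacer = (content: string, oldString: string) => string | null;

// 1. exact match
const exactReplacer: Replacer = (content, oldString) =>
  content.includes(oldString) ? oldString : null;

// 2. line-trimmed: compare line by line with whitespace trimmed
const lineTrimmedReplacer: Replacer = (content, oldString) => {
  const want = oldString.split("\n").map((l) => l.trim());
  const lines = content.split("\n");
  for (let i = 0; i + want.length <= lines.length; i++) {
    if (want.every((w, k) => lines[i + k].trim() === w))
      return lines.slice(i, i + want.length).join("\n");
  }
  return null;
};

// 3. block anchor: first/last lines must match (trimmed), middle is
//    fuzzy-matched by edit distance with a tolerance threshold
const blockAnchorReplacer: Replacer = (content, oldString) => {
  const want = oldString.split("\n");
  if (want.length < 3) return null;
  const lines = content.split("\n");
  for (let i = 0; i + want.length <= lines.length; i++) {
    const block = lines.slice(i, i + want.length);
    if (block[0].trim() !== want[0].trim()) continue;
    if (block[block.length - 1].trim() !== want[want.length - 1].trim()) continue;
    const midA = block.slice(1, -1).map((l) => l.trim()).join("\n");
    const midB = want.slice(1, -1).map((l) => l.trim()).join("\n");
    // accept if the middle is "close enough" (threshold is an assumption)
    if (levenshtein(midA, midB) <= Math.max(midA.length, midB.length) * 0.3)
      return block.join("\n");
  }
  return null;
};

// try each replacer in order, strictest first
function replace(content: string, oldString: string, newString: string): string {
  for (const r of [exactReplacer, lineTrimmedReplacer, blockAnchorReplacer]) {
    const found = r(content, oldString);
    if (found !== null) return content.replace(found, newString);
  }
  throw new Error("oldString not found in content");
}
```

  note each replacer returns the span as it actually appears in the file, so the final `content.replace` works even when the model's `oldString` had wrong indentation.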
  • opencode uses vercel ai sdk to abstract provider-specific tool call formats. internally they model messages as “parts” - TextPart, ReasoningPart, ToolPart, StepStartPart, StepFinishPart. each ToolPart has a state machine (pending → running → completed/error) so the UI can show progress. tools are defined with zod schemas that get converted to provider-native formats. nice separation - the edit tool logic (cascade of replacers) is completely decoupled from the LLM plumbing. EditTool schema:
    z.object({
      filePath: z.string().describe("The absolute path to the file to modify"),
      oldString: z.string().describe("The text to replace"),
      newString: z.string().describe("The text to replace it with (must be different from oldString)"),
      replaceAll: z.boolean().optional().describe("Replace all occurrences of oldString (default false)"),
    })
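  the ToolPart state machine could look something like this - field names are my guesses, not opencode's actual message model, but a discriminated union makes illegal states unrepresentable and gives the UI one field to switch on:

```typescript
// each state carries only the data that exists at that point in the lifecycle
type ToolState =
  | { status: "pending" }
  | { status: "running"; startedAt: number }
  | { status: "completed"; output: string }
  | { status: "error"; error: string };

interface ToolPart {
  type: "tool";
  tool: string;   // e.g. "edit"
  callId: string; // provider tool-call id
  state: ToolState;
}

// enforce the only legal path: pending -> running -> completed | error
function advance(part: ToolPart, next: ToolState): ToolPart {
  const order = { pending: 0, running: 1, completed: 2, error: 2 } as const;
  if (order[next.status] !== order[part.state.status] + 1)
    throw new Error(`illegal transition ${part.state.status} -> ${next.status}`);
  return { ...part, state: next };
}
```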
  • fast apply models are a different approach: instead of fuzzy-matching diffs, use a small specialized model to merge LLM outputs into original files. cursor pioneered this, achieving ~1000 tok/s via speculative decoding (most tokens are just copied from original, only edit boundaries need generation). relace open-sourced their training methodology for Apply 3 (~10k tok/s):
    • dataset: synthetic (original, lazy-diff, merged) triplets from real codebases - train the model to handle LLM “laziness” patterns like // ... rest unchanged
    • training: fine-tune small models (7B-14B) on the merge task; model learns to identify edit locations and preserve unchanged code exactly
    • inference: speculative decoding with high acceptance rate since edits are sparse (98% of tokens copied verbatim)
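  a toy way to see why the speculation wins: draft the next tokens by copying from the original file and count how many drafts the verifier model would accept. this is a rough simulation (the resync heuristic is naive), not how cursor or relace actually implement it, but it shows that sparse edits mean nearly everything is accepted for free:

```typescript
// simulate draft-from-original speculative decoding: `merged` plays the role
// of what the verifier model would emit, `original` is the draft source
function simulateSpeculation(original: string[], merged: string[]): { accepted: number; generated: number } {
  let accepted = 0;
  let generated = 0;
  let src = 0; // draft cursor into the original file
  for (const tok of merged) {
    if (src < original.length && original[src] === tok) {
      accepted++; // draft token matched the model's output: verified for free
      src++;
    } else {
      generated++; // edit boundary: the model must actually generate here
      const resync = original.indexOf(tok, src); // naive: scan forward past the edit
      if (resync !== -1) src = resync + 1;
    }
  }
  return { accepted, generated };
}
```

  on a real file the accepted fraction is what drives throughput: if ~98% of tokens are copied verbatim, the expensive forward passes only pay for the edit boundaries.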
  • aider has the most extensive benchmarking on edit formats. key finding: unified diffs reduce GPT-4 Turbo laziness by 3x (20% → 61% on their laziness benchmark). the insight: unified diffs make the model act like it’s “writing data for a program” rather than “explaining to a human”, so it avoids informal placeholders like // ... rest unchanged. four principles:
    • familiarity - git diff format is everywhere in training data
    • simplicity - plain text avoids JSON escaping bugs
    • high-level edits - encourage substantial blocks, not surgical single-line edits (30-50% error reduction)
    • flexible application - fuzzy matching is critical; disabling it increased errors 9x
    no single best format though - GPT-4 Turbo prefers udiff, Claude prefers diff (search/replace), Gemini prefers whole or diff-fenced.
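  the "flexible application" point can be sketched as a forgiving hunk applier: locate the hunk's before-image by trimmed-line comparison instead of exact text or line numbers, since models routinely get indentation and offsets wrong. this is illustrative, not aider's actual code (which also handles partial context, multiple hunks, and indentation repair):

```typescript
// apply a single unified-diff hunk to `content`, matching context fuzzily
function applyHunk(content: string, hunkText: string): string {
  const lines = hunkText
    .split("\n")
    .filter((l) => !l.startsWith("@@") && !l.startsWith("---") && !l.startsWith("+++") && l !== "");
  // before-image = context + removed lines; after-image = context + added lines
  const before = lines.filter((l) => !l.startsWith("+")).map((l) => l.slice(1));
  const after = lines.filter((l) => !l.startsWith("-")).map((l) => l.slice(1));
  const src = content.split("\n");
  // flexible application: find the before-image by trimmed comparison
  for (let i = 0; i + before.length <= src.length; i++) {
    if (before.every((b, k) => src[i + k].trim() === b.trim())) {
      return [...src.slice(0, i), ...after, ...src.slice(i + before.length)].join("\n");
    }
  }
  throw new Error("hunk did not apply");
}
```

  the 9x number above is exactly what you lose if the `.trim()` comparison becomes strict equality: any indentation drift in the model's context lines makes the hunk fail.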