Twelve cross-cutting ideas — agent loops, memory compression, prompt caching, sandboxing — each followed across the 18 codebases. Patterns you'll see again and again, with a project-by-project breakdown of the variations.
Come here when you're trying to understand a technique and want to see how five different teams attacked it.
Start with Agent loop — the foundation everything else attaches to.
Extended thinking: models that reason before answering. Handle the thinking-block signatures correctly or you break your cache.
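The failure shows up at the history-append step. A minimal sketch, assuming Anthropic-style thinking blocks that carry a `signature` field (other reasoning APIs name these differently):

```python
# Sketch only: append an assistant turn to history with its thinking blocks
# (and their signature fields) kept verbatim. Block shape loosely follows the
# Anthropic Messages API; adjust field names for other providers.

def append_assistant_turn(history: list[dict], response_content: list[dict]) -> None:
    """Keep reasoning blocks byte-for-byte identical when echoing them back.

    Stripping or re-serializing them changes the prefix the provider sees, so
    later requests stop hitting the cache, and on tool-use turns an altered
    signature can be rejected outright.
    """
    for block in response_content:
        if block.get("type") == "thinking" and "signature" not in block:
            raise ValueError("thinking block lost its signature; never rebuild these blocks")
    history.append({"role": "assistant", "content": list(response_content)})
```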
The skeleton every agent shares — read state, ask the model, parse, act, repeat — and how the wiring choices shape every other system around it.
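A minimal sketch of that skeleton, with `call_model` standing in for whatever client a project uses, `TOOLS` as a toy registry, and the block shapes illustrative only:

```python
# Minimal agent loop sketch: read state, ask the model, parse, act, repeat.
TOOLS = {"read_file": lambda path: open(path, encoding="utf-8").read()}  # toy registry

def run_agent(call_model, system_prompt: str, task: str, max_turns: int = 20) -> str:
    messages = [{"role": "user", "content": task}]                       # read state
    for _ in range(max_turns):
        reply = call_model(system=system_prompt, messages=messages)      # ask the model
        messages.append({"role": "assistant", "content": reply["content"]})
        calls = [b for b in reply["content"] if b["type"] == "tool_use"]  # parse
        if not calls:                                                    # no tool call: done
            return "".join(b["text"] for b in reply["content"] if b["type"] == "text")
        results = []
        for call in calls:                                               # act
            output = TOOLS[call["name"]](**call["input"])
            results.append({"type": "tool_result",
                            "tool_use_id": call["id"],
                            "content": str(output)})
        messages.append({"role": "user", "content": results})            # repeat with new state
    return "stopped: max turns exceeded"
```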
Pay 10–50% of input cost on cached tokens. The art is choosing where the static-vs-dynamic boundary lives.
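A sketch of one common boundary choice, written against Anthropic-style `cache_control` breakpoints (the field names are that API's; other providers cache prefixes implicitly): tool schemas and the system prompt sit before the breakpoint, the growing message history after it.

```python
# Sketch: everything identical on every call goes before the cache breakpoint;
# anything that changes per request goes after it.
def build_request(system_text: str, tool_schemas: list[dict],
                  history: list[dict], user_msg: str) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",          # example model id
        "max_tokens": 1024,
        "tools": tool_schemas,                        # static: identical every call
        "system": [{
            "type": "text",
            "text": system_text,                      # static: identical every call
            "cache_control": {"type": "ephemeral"},   # cache everything up to here
        }],
        # dynamic: grows every turn, so it lives after the breakpoint
        "messages": history + [{"role": "user", "content": user_msg}],
    }
```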
Long sessions overflow the context window. The good implementations don't summarize — they enumerate what to keep.
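A sketch of what enumeration-style compaction can look like; the checklist in the prompt is illustrative, not lifted from any of the projects, and `call_model` is assumed to return plain text.

```python
# Sketch: instead of "summarize the conversation", ask for the specific lists
# the agent will actually need after truncation.
COMPACTION_PROMPT = """Context is about to be truncated. Do not write a summary.
Enumerate, as bullet lists:
1. Files read or edited, with paths.
2. Commands run and their outcomes.
3. Decisions made and why.
4. Open tasks, in priority order.
5. Constraints the user stated that must not be violated."""

def compact(call_model, history: list[dict], keep_recent: int = 10) -> list[dict]:
    old, recent = history[:-keep_recent], history[-keep_recent:]
    digest = call_model(messages=old + [{"role": "user", "content": COMPACTION_PROMPT}])
    # Replace the old turns with the enumerated digest; keep recent turns verbatim.
    return [{"role": "user", "content": f"[compacted context]\n{digest}"}] + recent
```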
When one agent isn't enough — three questions to answer for any review pipeline, planner-executor flow, or critic loop.
How agents talk to each other when they share a process — and why most teams reach for Redis when a Python dict would do.
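The in-process alternative is small enough to sketch: when every agent lives in one Python process, a dict of queues covers most of what a broker would. Names here are illustrative.

```python
# Sketch of an in-process "message bus": one queue per agent, all in one dict.
# Only worth replacing with Redis/NATS when agents move to separate processes.
import queue
from dataclasses import dataclass, field

@dataclass
class Bus:
    mailboxes: dict[str, queue.Queue] = field(default_factory=dict)

    def register(self, agent_id: str) -> None:
        self.mailboxes[agent_id] = queue.Queue()

    def send(self, to: str, message: dict) -> None:
        self.mailboxes[to].put(message)

    def receive(self, agent_id: str, timeout: float | None = None) -> dict:
        return self.mailboxes[agent_id].get(timeout=timeout)

bus = Bus()
bus.register("planner")
bus.register("executor")
bus.send("executor", {"from": "planner", "task": "run the test suite"})
print(bus.receive("executor"))
```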
Move agent expertise out of code and into versionable markdown. Domain experts contribute via PR; the loop almost never changes.
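A sketch of the loading side, assuming skills live as markdown files in a `skills/` directory; the layout and naming conventions differ between projects.

```python
# Sketch: skills are markdown files on disk; the loop just concatenates the
# relevant ones into the system prompt. Directory layout is an assumption.
from pathlib import Path

def load_skills(skills_dir: str = "skills") -> dict[str, str]:
    """Map skill name -> markdown body, e.g. skills/code-review.md -> 'code-review'."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in Path(skills_dir).glob("*.md")}

def build_system_prompt(base: str, skills: dict[str, str], wanted: list[str]) -> str:
    chosen = [f"## Skill: {name}\n{skills[name]}" for name in wanted if name in skills]
    return "\n\n".join([base, *chosen])
```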
Limit what the tool layer can do regardless of what the agent intends. Docker, firewall, process limits, or all three.
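A sketch of the cheapest layer, process limits plus a wall-clock timeout on the tool subprocess; the Docker and firewall layers sit outside this code.

```python
# Sketch: run a tool command with CPU/memory caps and a timeout (POSIX only).
# Projects layer containers and network rules on top of this.
import resource
import subprocess

def limit_resources() -> None:
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))                    # 30 s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))   # 512 MiB address space

def run_tool(cmd: list[str], workdir: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        cmd,
        cwd=workdir,                 # confine the tool to the workspace
        preexec_fn=limit_resources,  # applied in the child before exec
        capture_output=True,
        timeout=60,                  # wall-clock cap regardless of CPU limit
        text=True,
    )
```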
Layered defenses — prompt, schema, controller, sandbox — each catching a different class of failure. The story you tell auditors.
Decouple agent code from a single LLM vendor. Three patterns observed; each pays a different tax.
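One of the shapes, the thin adapter interface, sketched as a Protocol; the normalized reply and tool-call types are assumptions, not any particular project's.

```python
# Sketch of the adapter pattern: the agent codes against one narrow interface,
# and each provider gets a small translation layer. Types are illustrative.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ToolCall:
    name: str
    arguments: dict

@dataclass
class ModelReply:
    text: str
    tool_calls: list[ToolCall]

class Provider(Protocol):
    def complete(self, system: str, messages: list[dict],
                 tools: list[dict]) -> ModelReply: ...

class AnthropicProvider:
    def complete(self, system, messages, tools) -> ModelReply:
        # translate to the Anthropic request shape, call the SDK, translate back
        raise NotImplementedError

class OpenAIProvider:
    def complete(self, system, messages, tools) -> ModelReply:
        # same agent-facing contract, different wire format underneath
        raise NotImplementedError
```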
Don't wait for the full response. Parse tool calls as they stream and dispatch the moment you have enough — sometimes earlier.
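A sketch of the accumulate-then-fire idea, using a simplified delta shape (index, name fragment, argument fragment); real provider streaming events nest these differently, and `dispatch` is whatever executes the tool.

```python
# Sketch: accumulate streamed tool-call fragments and dispatch as soon as the
# argument JSON parses, instead of waiting for the whole response to finish.
import json

def dispatch_streaming_calls(deltas, dispatch) -> None:
    pending: dict[int, dict] = {}                     # call index -> {"name", "args"}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            slot = pending.setdefault(tc["index"], {"name": "", "args": ""})
            slot["name"] += tc.get("name", "")
            slot["args"] += tc.get("arguments", "")
            try:
                args = json.loads(slot["args"])       # complete JSON yet?
            except json.JSONDecodeError:
                continue                              # not yet: keep accumulating
            dispatch(slot["name"], args)              # fire before the stream ends
            pending.pop(tc["index"])
```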
How the agent and the model agree on 'I want to run this function.' Get this wrong and you're locked into one provider, can't stream cleanly, or pay for trailing hallucination.
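The agreement itself is a schema plus a stop condition. A sketch of the declaration side, in the JSON-Schema style most providers converge on; the exact nesting varies by vendor, which is exactly the lock-in risk.

```python
# Sketch: a tool declared as name + description + JSON Schema for its input.
# Most providers accept some variation of this shape, but nest it differently
# (e.g. "input_schema" vs "parameters").
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file from the workspace and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path relative to the workspace root."},
        },
        "required": ["path"],
        "additionalProperties": False,
    },
}
```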