Skip to content

Performance

trimwire optimizes for clean context — window headroom, not your bill. These numbers come from offline replay: deterministic /v1/messages bodies fed through the real strategy code under the shipped default config. Byte figures are exact and reproducible; cost/token figures are estimates; timing is host-dependent.

0–99% lighter

Request size shrinks by session shape — nothing when there’s no redundancy, a lot on long tool-heavy sessions. trimwire never invents savings.

Headroom, not money

The reliable win is context-window headroom. Net cost is non-monotonic under prompt caching: a wash-to-slight-loss on short sessions, ≈ −52% at 256 turns. We report headroom and refuse to quote a dollar figure.

Sub-2 ms overhead

Pruning runs off the network critical path; the streamed response is forwarded byte-for-byte, unbuffered.

Safe on every corpus

Orphan-free (never splits a tool call↔result pair), system and tools[] untouched, validated across the corpus set plus a 3,000-body fuzz.

Prompt caching makes net cost non-monotonic: pruning can bust a cached prefix and cost more on a short session, while saving substantially on a long one. Quoting a single ”% saved on your bill” would be dishonest. What trimwire reliably does is keep the request focused — the share that is the recent window you’re actually working on — and low-redundancy — less repeated tool output crowding the model. On a redundancy-heavy corpus, focus climbs (e.g. 41.6% → 67.2%) purely by shedding dead weight.

  • trimwire preview <session.jsonl> — replay the strategies over one of your own recorded transcripts, read-only, no network, and see exactly what would trim.
  • The full methodology, every corpus, and the cost model live in the repo: benchmark/results/RESULTS.md.