vs. Anthropic native
Short version: they solve different parts of the same problem, and they compose. Anthropic’s native features decide what the model remembers and when the server trims; trimwire decides what bytes leave your machine, deterministically, with the cost made visible — before the request is ever sent. You can run both at once, and the defaults are designed not to fight each other.
This page is factual and deliberately conservative: native behaviour evolves and is partly version-/beta-gated, so where exact behaviour depends on your client or API version it’s described as “when enabled” rather than pinned to a number. Check the current Anthropic / Claude Code docs for specifics.
The native features (what they are)
Section titled “The native features (what they are)”- Prompt caching — Anthropic caches a byte-exact request prefix (
tools→system→messages) so repeated prefixes are billed at a large discount. Any byte change in the prefix invalidates everything after it. - Context editing /
clear_tool_uses(thecontext-management-*beta) — server-side: as a conversation approaches the context limit, Anthropic can clear older tool-use/tool-result content automatically, replacing it with a marker. - Claude Code
/compact(and auto-compact) — client-side: Claude Code summarizes the conversation so far into a shorter form and continues from the summary. User-triggered, or automatic near a threshold. - The memory tool — a model-driven, file-backed scratch memory the model reads and writes across turns. A different concern (what the model keeps), not wire pruning.
What trimwire is
Section titled “What trimwire is”A single static binary that sits on ANTHROPIC_BASE_URL as a transparent HTTP
gateway and prunes messages[] in flight, per request, with deterministic
model-free strategies as the floor. It records what it changed (trimwire stats,
cache-hit accounting) and never makes a model call on the default path. An opt-in
summarizer can do heavier reduction but is never load-bearing — every strategy
fails open to the original body.
Side by side
Section titled “Side by side”| Dimension | Anthropic-native | trimwire |
|---|---|---|
| Where it runs | Server-side (context editing) / inside Claude Code (/compact) | A local proxy on your machine, before the request leaves |
| Determinism | Heuristic / model-driven; output can vary | Deterministic, byte-for-byte; same input → same output |
| Transparency | Trims happen upstream; limited local visibility | You see exactly what changed + the spend impact (trimwire stats) |
| Trigger | At/near the context limit, or on /compact | Every request, proactively — bytes are trimmed before they accumulate |
| Loss profile | /compact is lossy summarization; context-editing drops old tool output | Default path keeps recent turns verbatim; trims stale/duplicated/oversized tool output with markers; summarizer is opt-in |
| Control | A beta flag / a slash command; little per-strategy tuning | Per-strategy config, protected file globs, thresholds, off-by-default levers |
| Cache awareness | Caching is the native mechanism | Pruning is designed to preserve the cache prefix; its stateful re-pruning is cache-stable on purpose |
| Portability | Anthropic API (and Claude Code) | Any Anthropic-compatible endpoint; no beta header required |
| Dependencies | Built in | One binary; no CA cert, no model required on the default path |
How they compose
Section titled “How they compose”- Caching: trimwire’s whole design is built around not breaking the cache — it preserves the byte-exact prefix and its stateful re-pruning exists precisely to keep the cache stable across turns. It complements caching rather than competing.
- Context editing /
/compact: these fire late (near the limit). trimwire works early and every turn, so there’s less bloat for the native path to clear in the first place — and when the native path does fire, trimwire forwards it untouched (it already accounts for the “tool result cleared” markers on the wire). - Memory tool: orthogonal. trimwire prunes wire bytes; the memory tool manages what the model deliberately retains. Running both is fine.
When native alone is enough
Section titled “When native alone is enough”- You don’t care about seeing per-request cost/byte impact and the built-in trims keep you under the limit comfortably.
- You’re happy with lossy summarization at the threshold and don’t need recent turns kept verbatim.
- You don’t want to run any local process.
Where trimwire adds something native doesn’t
Section titled “Where trimwire adds something native doesn’t”- Spend transparency — concrete, per-session numbers for what was trimmed and how it moved cache-hit rate. The native path doesn’t surface this locally.
- Determinism — reproducible pruning you can reason about and test, not a heuristic that varies run to run.
- Proactive, every-turn control — trim before bloat accumulates, rather than only when the limit is hit.
- Tool-output control — cap/stub stale or oversized tool results, protect specific files from pruning, all without a model call.
- Portability — the same behaviour against any Anthropic-compatible endpoint, no beta header.
One-line pitch
Section titled “One-line pitch”Anthropic-native context management decides what the model keeps and when the server trims. trimwire is the transparent, deterministic, spend-visible layer that controls what leaves your machine — and it’s built to keep the cache intact while it does. Use both.