The benchmark study presented here suggests that the real savings in AI usage aren't found in the compression margin. The idea may already exist in pieces, but not at the scope this study brings to it.

One thing has become clear in this new AI era: a new economic model has formed, and its direction is unmistakable. Tokens are here to stay, and they're already part of daily life. Just as we are learning to measure a new pace of work delivery under AI acceleration, we are learning to measure budgets by token burn.

Late last year I lived this firsthand: massive token burn across my team. And a paradox you might recognize: the cheaper the models got per token, the higher the total bill climbed. Better, faster models invite more use, and consumption grows faster than prices fall. Budgets kept rising instead of stabilizing.

I tried everything to cut costs. Last of all, I went after data compression, and here I had home-field advantage: I come from building bots for prediction markets, and squeezing data has been my job for years. To my surprise, the experience was bad. The tools I found compressed text as text, in the style of the academic state of the art (LLMLingua and successors): drop the tokens with the lowest statistical weight. That works for prose. It fails exactly on what an agent eats all day: logs, SQL schemas, diffs, stack traces, test output, API responses. Generic compression fails in the same places. And nothing I saw measured coverage, which is the real driver of savings. For a team writing code 24 hours a day, it was useless.

I started working alone on an architecture project, and I discovered that compression wasn't the trigger of real savings. The trigger was properly mapping and tracking the data: careful logic work, with savings ceilings that depend on usage style. In other words, real savings live in data coverage, not in a pretty compression ratio. My compressions reach a 99.9% savings margin. But how much of that actually comes off my bill?
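To make the margin-versus-coverage distinction concrete, here is a minimal sketch. It assumes "coverage" means the share of all billed tokens that actually pass through a compressor, and "margin" means the share of those tokens the compressor removes. Under those definitions the fraction that comes off the bill is simply their product. The function name and the coverage figures are hypothetical and chosen for illustration only. The 99.9% margin is the one quoted above.

```python
def effective_savings(coverage: float, margin: float) -> float:
    """Fraction of the total token bill removed.

    coverage: share of all billed tokens that reach a compressor (0..1)
    margin:   share of those tokens the compressor removes (0..1)
    Both definitions are assumptions made for this sketch.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= margin <= 1.0):
        raise ValueError("coverage and margin must be between 0 and 1")
    return coverage * margin


# Hypothetical scenario A: a 99.9% margin, but only 10% of traffic is covered.
narrow = effective_savings(coverage=0.10, margin=0.999)

# Hypothetical scenario B: a more modest 80% margin over 70% of traffic.
broad = effective_savings(coverage=0.70, margin=0.80)

print(f"narrow coverage, high margin: {narrow:.1%} off the bill")
print(f"broad coverage, lower margin: {broad:.1%} off the bill")
```

In this sketch the less impressive margin wins by a wide distance, because it is applied to far more of the traffic.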

The intuition is simple. Good mapping hands the AI the data it needs to keep working with the minimum of tokens. But that AI is fed by hundreds of different sources, and the more you fence it in and feed it a disciplined diet, the more agile and economical it becomes. It doesn't just save money: it gets faster and more precise, because model attention is finite, and a clean context is a higher-quality context.

There's also solid mathematical logic here. The AI spends tokens per API call, and an isolated call is tokens burned at full price. Calls in a large group can generate savings, but only if some of the initial calls create familiarity with the rest. That means compressing, early on, a piece of data that will be re-read obsessively across all the remaining calls. That's how you save tokens.
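The arithmetic behind families of calls can be sketched as follows. This is an illustration under stated assumptions, not a description of any real billing system: the same payload is re-sent in full on every call, there is no provider-side prompt caching, and compressing costs a one-time overhead (`compress_cost_tokens`, a hypothetical parameter that could be zero for a purely algorithmic compressor). All figures are made up.

```python
def session_input_tokens(payload_tokens: int, calls: int,
                         margin: float = 0.0,
                         compress_cost_tokens: int = 0) -> int:
    """Total input tokens billed when one payload is re-read on every call.

    margin: share of the payload removed by compressing it once, up front.
    compress_cost_tokens: hypothetical one-time cost of doing the compression.
    """
    compressed_payload = round(payload_tokens * (1.0 - margin))
    return compress_cost_tokens + compressed_payload * calls


# Hypothetical 20,000-token log, re-read across a family of 50 calls.
raw_family = session_input_tokens(20_000, calls=50)
compressed_family = session_input_tokens(
    20_000, calls=50, margin=0.9, compress_cost_tokens=20_000)

# The same log used by a single isolated call.
raw_isolated = session_input_tokens(20_000, calls=1)
compressed_isolated = session_input_tokens(
    20_000, calls=1, margin=0.9, compress_cost_tokens=20_000)

print(f"family of 50: {raw_family:,} raw vs {compressed_family:,} compressed")
print(f"isolated call: {raw_isolated:,} raw vs {compressed_isolated:,} compressed")
```

With these made-up numbers the one-time overhead is repaid many times over in the family of calls, while the isolated call ends up costing more than sending the raw payload.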

Savings don't come only from avoiding isolated calls and building families of calls, though. We'll see this further on.

Run	Tokens processed	Margin
Official published benchmark	180,322,482	87.45%
F4 battery, usage-weighted	127,586,488	88.44%
Wild via production hook	20,227,044	91.97%
100M · v0.5.32	96,626,712	95.56%
200M · v0.5.33 (current)	202,021,713	95.42%

01	1M context only on dense tasks. For simpler tasks, use smaller contexts.
02	Sanitize completed tasks. Accumulating junk distorts your numbers.
03	Central planning session. Split into epics and tasks per epic, and open new sessions for each task.
04	Use the capsules. The most direct route to optimizing token spend.
05	Intelligent router on simple tasks. Reserve high-intelligence models for what actually demands them.
06	Don't delegate delicate tasks to weaker models. Avoids heavy rework. Cheap can come out expensive.
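Protocols 01, 05 and 06 above amount to a routing rule, which can be sketched in a few lines. Everything here is hypothetical: the `Task` fields, the tier labels and the `route` function are illustrative names, not part of any real tool, and deciding whether a task is simple, delicate or dense is left to whoever fills in the flags.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    simple: bool    # routine work a smaller model can handle (protocol 05)
    delicate: bool  # a mistake would mean heavy rework (protocol 06)
    dense: bool     # genuinely needs a very large context (protocol 01)


def route(task: Task) -> tuple[str, str]:
    """Return (model tier, context size) as hypothetical labels."""
    # Delicate work always goes to the stronger tier, even if it looks simple.
    if task.simple and not task.delicate:
        model = "small-model"
    else:
        model = "high-intelligence-model"
    # The 1M context is reserved for dense tasks only.
    context = "1M" if task.dense else "standard"
    return model, context


rename = Task("rename a variable across one file",
              simple=True, delicate=False, dense=False)
migration = Task("plan a schema migration on production data",
                 simple=False, delicate=True, dense=True)

print(route(rename))
print(route(migration))
```

The point of the design is that the expensive combination (strong model plus large context) has to be earned by the task rather than used by default.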

Capsule	Class	Tokens processed	Tokens saved	Margin
log	algorithmic	93,336,248	92,478,692	99.1%
codebase	algorithmic	68,100,699	57,771,407	84.8%
diff	algorithmic	63,931,733	51,418,235	80.4%
api	algorithmic	55,406,079	55,329,455	99.9%
prompt	algorithmic	49,918,938	49,571,251	99.3%
rag	llm	42,674,288	39,067,940	91.5%
build	algorithmic	39,381,798	38,577,940	98.0%
test	algorithmic	36,577,472	33,486,726	91.6%
schema	algorithmic	28,994,463	16,818,449	58.0%
network	algorithmic	28,443,603	28,257,309	99.3%
apispec	algorithmic	22,007,114	18,966,567	86.2%
stack	llm	21,944,586	20,138,460	91.8%
threads	llm	20,308,235	18,773,742	92.4%
events	llm	19,003,173	18,769,332	98.8%
sql	llm	15,555,985	15,260,167	98.1%
pdf	llm	15,276,099	13,734,220	89.9%
session†	retired	3,901,389	3,825,574	98.1%
image	algorithmic	2,017,139	2,002,654	99.3%
TOTAL		626,779,041	574,248,120	91.62%
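The TOTAL row can be rechecked from the per-capsule rows. The sketch below copies the "Tokens processed" and "Tokens saved" columns from the table above and assumes, as the figures suggest, that margin is tokens saved divided by tokens processed. The variable names are illustrative.

```python
# (capsule, tokens processed, tokens saved), copied from the table above
rows = [
    ("log", 93_336_248, 92_478_692),
    ("codebase", 68_100_699, 57_771_407),
    ("diff", 63_931_733, 51_418_235),
    ("api", 55_406_079, 55_329_455),
    ("prompt", 49_918_938, 49_571_251),
    ("rag", 42_674_288, 39_067_940),
    ("build", 39_381_798, 38_577_940),
    ("test", 36_577_472, 33_486_726),
    ("schema", 28_994_463, 16_818_449),
    ("network", 28_443_603, 28_257_309),
    ("apispec", 22_007_114, 18_966_567),
    ("stack", 21_944_586, 20_138_460),
    ("threads", 20_308_235, 18_773_742),
    ("events", 19_003_173, 18_769_332),
    ("sql", 15_555_985, 15_260_167),
    ("pdf", 15_276_099, 13_734_220),
    ("session", 3_901_389, 3_825_574),
    ("image", 2_017_139, 2_002_654),
]

total_processed = sum(processed for _, processed, _ in rows)
total_saved = sum(saved for _, _, saved in rows)

# Aggregate margin: total saved over total processed, so it is weighted
# by how many tokens each capsule actually handled.
aggregate_margin = total_saved / total_processed

# Capsule with the lowest individual margin.
lowest = min(rows, key=lambda r: r[2] / r[1])

print(f"{total_processed:,} processed, {total_saved:,} saved")
print(f"aggregate margin: {aggregate_margin:.2%}")
print(f"lowest-margin capsule: {lowest[0]} ({lowest[2] / lowest[1]:.1%})")
```

Summing the columns reproduces the TOTAL row (626,779,041 processed, 574,248,120 saved, 91.62%). Because the aggregate is volume-weighted, high-volume capsules such as log move it far more than small ones such as image.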

626 million audited tokens: AI savings aren't where the market is measuring

§ 01 How this became engineering

The continuous architect

§ 02 The benchmarks, open for audit

§ 03 The 7 discoveries the benchmarks forced on me

Margin is a vanity metric; coverage is the bill.

Compression compounds.

The savings number doesn't come from compressing, but from mapping.

Green numbers lie; only the task tells the truth.

The first compressed context and the data chase are the key to savings.

The user's profile and organization are the clearest key to savings.

Not everything needs a high-intelligence model for execution.

§ 04 The results for my team

Jevons' Paradox applied to tokens

§ 05 Summary of what I found in the studies

§ 06 Simple organization protocols

§ 07 Test and measure it yourself

§ 08 Aggregate per-capsule distribution

Don't believe — measure.