Late last year I lived this firsthand: my team was burning through tokens at a massive rate. And I ran into a paradox you might recognize: the cheaper tokens got, the higher the bill came in. Better, faster models invite heavier use, and consumption grows faster than prices fall. Budgets kept rising instead of leveling off.

I tried everything to cut costs. Eventually I looked into data-compression companies and GitHub repos, and here I had a home-field advantage: I come from building bots for prediction markets, and squeezing data is what I've done for years. To my surprise, what I found was disappointing. Everything compressed text as text, in the academic state-of-the-art style (LLMLingua and its successors): score the tokens and drop the ones with the lowest statistical weight. That works for prose. It fails on exactly what an agent eats all day: logs, SQL schemas, diffs, stack traces, test output, and API responses. Generic compression fails the same way.

To be clear about where the cost comes from: an LLM bill has four sides, which are input, cache write, cache read, and output. Data compression only shrinks what goes in. It never touches the output, one of the biggest costs. And in every compressor I looked at, I ran into the same thing: they sold a high compression ratio as if it were savings, with no clear study behind it. A ratio and a saving are not the same thing, and that gap was exactly what I wanted to measure.
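A minimal sketch of that gap between ratio and savings, using the four sides of the bill (input, cache write, cache read, output). The per-million-token prices and the monthly usage profile are made up for illustration; they are assumptions, not figures from the audit or from any provider's price list, and `Usage`, `bill`, and `bill_savings` are hypothetical names.

```python
from dataclasses import dataclass, replace

# Hypothetical prices in USD per million tokens. Placeholders for illustration,
# not any provider's real price list.
PRICE_PER_MTOK = {
    "input": 3.00,
    "cache_write": 3.75,
    "cache_read": 0.30,
    "output": 15.00,
}


@dataclass(frozen=True)
class Usage:
    """Token counts for the four sides of the bill."""
    input: int
    cache_write: int
    cache_read: int
    output: int


def bill(usage: Usage) -> float:
    """Total cost in USD for one usage profile."""
    return sum(
        getattr(usage, side) * price for side, price in PRICE_PER_MTOK.items()
    ) / 1_000_000


def bill_savings(before: Usage, compressible_input: int, ratio: float) -> float:
    """Fraction of the bill saved when `compressible_input` fresh input tokens
    are compressed by `ratio`; cache and output tokens are left untouched."""
    removed = round(compressible_input * ratio)
    after = replace(before, input=before.input - removed)
    return 1 - bill(after) / bill(before)


# Made-up monthly profile: cache reads and output dominate the bill.
month = Usage(input=2_000_000, cache_write=5_000_000,
              cache_read=80_000_000, output=3_000_000)

saved = bill_savings(month, compressible_input=300_000, ratio=0.999)
print(f"bill: ${bill(month):.2f}, saved: {saved:.1%}")
```

With these invented numbers, a 99.9% ratio on the compressible slice of input moves the total bill by roughly 1%, because the compressed tokens were a small share of what was being paid for. Different prices or a different usage mix would give a different result, which is the point: the ratio alone says nothing about the bill.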

Tired of all this, I started working on an architecture of my own. The idea was that the real savings trigger wasn't compressing more, but correctly mapping and tracking what passed through, with savings ceilings that depend on each person's usage. My compression reaches margins of up to 99.9%. But how much of that actually comes off my bill? That was the question that pulled the rest along.

The intuition behind it is simple. Good mapping gives the AI only what it needs to keep working, meaning the fewest tokens in the end. And it isn't only money. It gets faster and more accurate, because the model's attention is finite and a clean context goes further.

But compressing input wasn't enough. I had to look at the output too.

Run	Tokens processed	Margin
Official capsule benchmark (v0.1.57)	180,322,482	87.45%
F4 battery, usage-weighted	127,586,488	88.44%
Wild via production hook	20,227,044	91.97%
100M · v0.5.32	96,626,712	95.56%
200M · v0.5.33	202,021,713	95.42%
Squeeze 400M (v0.5.73)	400,020,422	80.8% effective
1B audited (cumulative)	1,026,804,861	—
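One way to read the table above: a cumulative margin is not the plain average of the per-run margins, because the runs differ a lot in size. The sketch below recomputes both from the five capsule rows. The Squeeze row is left out because it is reported as an effective rate rather than a margin, and the 1B row has no margin at all. The function names are illustrative.

```python
# (run label, tokens processed, reported margin in %), copied from the table above.
RUNS = [
    ("Official capsule benchmark (v0.1.57)", 180_322_482, 87.45),
    ("F4 battery, usage-weighted", 127_586_488, 88.44),
    ("Wild via production hook", 20_227_044, 91.97),
    ("100M · v0.5.32", 96_626_712, 95.56),
    ("200M · v0.5.33", 202_021_713, 95.42),
]


def simple_mean(runs) -> float:
    """Unweighted average: every run counts the same, whatever its size."""
    return sum(margin for _, _, margin in runs) / len(runs)


def token_weighted_mean(runs) -> float:
    """Each run's margin weighted by the tokens it processed."""
    total_tokens = sum(tokens for _, tokens, _ in runs)
    return sum(tokens * margin for _, tokens, margin in runs) / total_tokens


print(f"simple mean:         {simple_mean(RUNS):.2f}%")
print(f"token-weighted mean: {token_weighted_mean(RUNS):.2f}%")
```

The token-weighted figure rounds to 91.62%, the same margin reported for the five cumulative capsule runs in the per-capsule table, while the simple mean comes out slightly higher because it gives the small 20M run the same weight as the 200M one.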

01	1M context only on dense tasks. For simpler tasks, use smaller contexts.
02	Sanitize completed tasks. Accumulating junk distorts your numbers.
03	Central planning session. Split into epics and tasks per epic, and open new sessions for each task.
04	Use the capsules. The most direct route to optimizing token spend.
05	Intelligent router on simple tasks. Reserve high-intelligence models for what actually demands them.
06	Don't delegate delicate tasks to weaker models. Avoids heavy rework. Cheap can come out expensive.
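Protocols 01, 05, and 06 can be read as one routing rule. The sketch below is a deliberately naive version of it, not the router used in the benchmarks: the `Task` fields, the tier names, and the window labels are all hypothetical, and a real router would need a better signal than two hand-set flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    delicate: bool  # a wrong answer would mean heavy rework (protocol 06)
    dense: bool     # genuinely needs a very large context (protocol 01)


@dataclass(frozen=True)
class Route:
    model_tier: str      # "frontier" or "small" (hypothetical tier names)
    context_window: str  # "1M" or "standard" (hypothetical labels)


def route(task: Task) -> Route:
    """Pick the cheapest tier and window that the task can tolerate."""
    # Protocols 05 and 06: simple tasks go to the small tier, delicate ones never do.
    tier = "frontier" if task.delicate else "small"
    # Protocol 01: pay for the 1M window only when the task is dense.
    window = "1M" if task.dense else "standard"
    return Route(tier, window)


for task in (
    Task("rename a variable", delicate=False, dense=False),
    Task("migrate the billing schema", delicate=True, dense=False),
    Task("audit the whole monorepo", delicate=True, dense=True),
):
    print(task.name, "->", route(task))
```

Keeping the two decisions independent is a design choice: a task can be delicate without being dense, so the model tier and the context size are paid for separately.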

Capsule / Track	Class	Tokens processed	Tokens saved	Margin / Effective
rag	llm	20,010,072	18,089,105	90.4%
log	algorithmic	16,063,645	15,870,881	98.8%
pdf	llm	12,023,183	11,542,256	96.0%
threads	llm	10,001,241	9,161,137	91.6%
events	llm	9,070,336	9,024,984	99.5%
prompt	algorithmic	8,012,934	7,996,908	99.8%
api	algorithmic	7,195,617	7,152,443	99.4%
sql	llm	5,010,920	4,970,833	99.2%
network	algorithmic	3,563,305	3,538,362	99.3%
stack	llm	2,004,372	1,813,957	90.5%
schema	algorithmic	2,000,200	1,540,154	77.0%
diff	algorithmic	1,537,461	1,452,901	94.5%
codebase	algorithmic	1,513,733	1,477,403	97.6%
build	algorithmic	1,013,132	972,607	96.0%
test	algorithmic	1,013,040	985,688	97.3%
apispec	algorithmic	708,552	614,315	86.7%
image^†	algorithmic	304,934	303,104	99.4%^†
Capsules — 5 cumulative runs		626,784,439	574,252,194	91.62% margin
Squeeze — 400M run		400,020,422	323,317,527	80.8% effective
TOTAL AUDITED		1,026,804,861	897,569,721	—
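The table above can be checked with a few lines of arithmetic: each margin is tokens saved divided by tokens processed, and the two subtotals should add up to the audited total. The sketch below does that for the subtotal rows and three sample capsules; the numbers are copied from the table, and the helper names are illustrative.

```python
# (tokens processed, tokens saved), copied from the table above.
SUBTOTALS = {
    "capsules": (626_784_439, 574_252_194),
    "squeeze": (400_020_422, 323_317_527),
}
TOTAL_AUDITED = (1_026_804_861, 897_569_721)

SAMPLE_ROWS = {
    "rag": (20_010_072, 18_089_105),
    "log": (16_063_645, 15_870_881),
    "schema": (2_000_200, 1_540_154),
}


def margin(processed: int, saved: int) -> float:
    """Share of processed tokens that were saved, in percent."""
    return 100 * saved / processed


def reconciles(subtotals: dict, total: tuple) -> bool:
    """True when the subtotal rows add up exactly to the total row."""
    processed = sum(p for p, _ in subtotals.values())
    saved = sum(s for _, s in subtotals.values())
    return (processed, saved) == total


for name, (processed, saved) in {**SAMPLE_ROWS, **SUBTOTALS}.items():
    print(f"{name:<9} {margin(processed, saved):.2f}%")
print("subtotals match total:", reconciles(SUBTOTALS, TOTAL_AUDITED))
```

The same two functions work on any row, so anyone rerunning the benchmarks can swap in their own counts and see whether the reported percentages follow from the raw token numbers.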

99% compressed, 1% on the bill: I audited 1B tokens to find out why.

§ 01 How this became engineering

The continuous architect

§ 02 The benchmarks, open for audit

§ 03 The 8 points the benchmarks reveal

Margin is a vanity metric.

Compression compounds.

The savings number doesn't come from compressing, it comes from mapping.

A green number lies; only the task tells the truth.

The right accounting gives you room to map better.

The user's profile and organization are the clearest key to savings.

Not every execution needs a frontier model.

The 1% in the title: I compressed up to 99.9% and saved 1% on the bill.

§ 04 How the system behaves in real use

The Jevons Paradox applied to tokens

§ 05 Summary of what I found

§ 06 Simple organization protocols

§ 07 Test and measure it yourself

§ 08 Total audited and per-capsule distribution

Don't believe — measure.