01WHERE IT CAME FROM

NUXS was born from a bill that did not add up.

I needed to accelerate a critical project at one of the companies where I am a partner. The team was scattered, each person on their own AI tab, losing context and repeating work. I built a virtual pixel-art office where everyone worked together and agents became part of the environment — that became PixelDesk, today free on GitHub.

It worked beyond expectations. Within a few weeks AI became a shared utility of the company. Then came the bill: developers running Claude Code, Codex and Cursor in parallel, marketing with their own agents, operations automating. Plans blew up and direct API consumed absurd volumes.

Since I am from the field, I decided to solve it. I studied context compression and applied it to logs, code, API responses, stack traces, sessions, PDFs, video, audio, meetings. Each case required a different technique — that is the capsule concept. Twenty types later, spend dropped over 90% and the team kept productivity. NUXS did not come from market research. It came from a bill that did not add up.

02WHAT IT IS

A compression layer between you and the model.

Content that would normally enter the context whole passes through NUXS first, gets compressed into a dense representation, and only then reaches the model. The agent receives the informational equivalent. You pay a fraction.

Source interception — identifies content type and applies the corresponding capsule, locally, before it becomes a billed token.
Per-agent measurement — each compression recorded with type, tokens saved and opaque identifier, in real time, without content.

03CAPSULE

Each content type, its own engineering.

Generic compression works on free text and stops at the first JSON structure, the first log, the first source file. Structured content requires structural knowledge.

In logs, repeated patterns become template + counters. In code, a large file becomes a navigable structural index. In API, a huge response becomes a schema with samples. Half runs algorithmically — without calling a model, near-zero marginal cost. The choice between algorithm and semantics is automatic. To the user, transparent.

04TWO PATHS

Local installer or proxy.

Local agent users (Claude Code, Codex, Cursor, OpenClaw, Hermes) integrate via native hooks and MCP servers. Thirty seconds of setup, runs in the background, content never leaves the computer. Benefit: longer sessions within the same plan.

Direct API users point the base URL to NUXS, which operates as a proxy. Each call is compressed on input. Compatible with any major provider library, in any language. Benefit: sharp per-token savings, immediate and measurable.

The local installer is free (Free plan). The proxy is a paid-plan feature — requires an active license to route.

05PRIVACY

Content never leaves your machine.

Compression happens on the user machine. For algorithmic capsules nothing even reaches our servers. For semantic ones processing is transient with no retention.

Uploaded: capsule type, tokens saved, timestamp, opaque agent identifier.
Never uploaded: content, paths, identifiers, code, application data, names, URLs.
Server-side audit filter rejects any payload that looks like content. LGPD and GDPR by design.
Full on-prem available for enterprises with stricter requirements.

06INSTALL

Thirty seconds.

$ npm install -g nuxs-capsule
$ nuxs-capsule setup
# or, via proxy:
$ export ANTHROPIC_BASE_URL="https://proxy.nuxs.ai"

CLIENTMODE

Claude CodeHooks + MCP

Codex CLIHooks + MCP

CursorMCP

OpenClawMCP

HermesMCP

Any MCP clientMCP

Direct API applicationProxy

07TELEMETRY

How much each agent saved.

In companies scaling AI, the question shifts from "how much did I spend this month" to "which agent is spending more, on what, and why". Each compression is recorded with an opaque agent identifier — not the API key, not the user, the agent. Telemetry shows up in a dedicated panel (nuxs.ai/panel), with exportable history. For enterprises, private on-prem panel.

08FREE PLAN

Fifty million tokens, lifetime.

Not a trial. No deadline. No card required. No auto-conversion. Fifty million processed tokens, forever, with eleven active capsules — the entire text and code category. Enough for a dev to use through months of daily work before running out. Those who need the full set pay. Those who just want to test use for months at no cost.

09WHAT IT IS NOT

Category by category.

Not observability (Helicone, Portkey, Langfuse) — operates one layer earlier, reduces before sending. Complementary.
Not a routing gateway (OpenRouter, LiteLLM) — does not choose model, compresses input before the choice.
Not generic prompt compression (LLMLingua) — dedicated per-structured-type engineering, not statistical free-text analysis.
Not an AI model — does not compete with providers; it is the layer between you and any of them.

10WHAT REMAINS

The layer context passes through.

Everything entering a model passes through an input layer. Whoever operates that layer efficiently captures a structural fraction of all that flows.

Twenty capsules in production, more than one hundred million tokens processed in validation, free plan with fifty million lifetime for real testing. Not a product. A position.

NUXS · Context compression for the agent era.