Developer Tools

AI Costs Slashed: Burnless Cuts LLM API Bills by 90%

Everyone building AI agents knows the crushing cost of multi-turn conversations. Now, a new open-source project promises to slash those bills by 90%.

[Figure: O(N²) vs O(N) cost curves for LLM API calls.]

Key Takeaways

  • Burnless reduces multi-turn LLM API costs by up to 90% by changing the cost model from quadratic O(N²) to linear O(N).
  • It achieves this through a Shared Prefix Cache for system prompts and a 'Capsule History' that compresses prior turns.
  • The protocol is vendor-agnostic, allowing users to mix and match LLM providers and local models.
  • A 10-turn session benchmark shows Burnless costs $0.45 compared to $4.66 for a naive implementation with Claude 3 Opus.

Here’s the thing. Everyone building anything remotely resembling a helpful AI agent knows the pain. It’s not just the tokens you burn on a single query; it’s the endless, recursive cost of every subsequent turn. The thinking was simple: replay the whole damned conversation, every time. This naturally led to quadratic costs. Your API bill, my friends, was spiraling out of control. We’re talking about a situation where a single day’s work could obliterate your entire monthly budget on models like Claude Opus. A real wall, hit hard.

This is where Burnless waltzes in, looking like it just solved world hunger with a few lines of Python. It’s an open protocol, an orchestration layer, and crucially, it flips that nasty O(N²) cost curve to a sweet, sweet O(N). The math isn’t magic; it’s just smart. They claim up to a 16x reduction in real-world API consumption; the headline benchmark works out to roughly 90% cheaper. Let that sink in.

The Quadratic Nightmare

So, the existing paradigm for multi-turn agent loops? It’s a cost disaster. Each new turn means re-transmitting the entire prior conversation history, so turn k costs in proportion to the k turns accumulated so far, and the total across N turns balloons to Θ(N²). It’s like paying for every single word you’ve ever spoken, not just the new sentence. Utterly insane for anything beyond trivial chat.
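To make the blow-up concrete: if each turn adds roughly t tokens and the full transcript is resent on every call, the cumulative input-token bill across N turns is

```latex
\underbrace{t}_{\text{turn }1} + \underbrace{2t}_{\text{turn }2} + \cdots + \underbrace{Nt}_{\text{turn }N}
  = t \cdot \frac{N(N+1)}{2} \in \Theta(N^2)
```

Double the conversation length and the bill roughly quadruples. That’s the curve Burnless is attacking.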

Burnless: The O(N) Lifeline

Burnless bills itself as a vendor-agnostic orchestration layer. The idea is you pick a “Maestro” model – could be Claude, GPT, Gemini, even a local Llama – to orchestrate everything, and then you have “Workers” for specific tasks. These aren’t vendor-specific tiers, mind you. They’re quality/cost bands: gold, silver, bronze. You map these to whatever command-line interface you’ve got. Local Ollama model for zero marginal cost on simpler tasks? Sure. Mix and match providers? Absolutely.
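To picture how band-to-CLI mapping might work in practice, here’s a minimal Python sketch. The gold/silver/bronze band names come from the project; the dispatch code and the specific CLI commands are our own illustration, not Burnless internals, so adjust the flags to whatever tools you actually have.

```python
import subprocess

# Quality/cost bands mapped to whatever CLIs you happen to have installed.
# The band names are Burnless's; these particular commands are examples.
WORKERS = {
    "gold":   ["claude", "-p"],             # strongest (priciest) hosted model
    "silver": ["gemini", "-p"],             # mid-tier hosted model
    "bronze": ["ollama", "run", "llama3"],  # local model, zero marginal cost
}

def dispatch(band: str, task: str) -> str:
    """Run a task on the worker CLI registered for the given quality band."""
    result = subprocess.run(
        WORKERS[band] + [task],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# e.g. dispatch("bronze", "Summarize this changelog in one sentence: ...")
```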

But the real kicker is how they collapse that quadratic cost. Two key mechanisms are at play here. First, a Shared Prefix Cache: that massive system prompt, potentially 20,000+ tokens, gets cached, and as long as you stick with the same provider, even switching models mid-session doesn’t invalidate it if the prefix is identical. Second, Capsule History: instead of storing raw transcripts in the agent’s memory, the Maestro model retains only tiny, ~80-character compressed “capsules” of previous turns.

The result: your quadratic history term collapses into a tiny linear one, while the massive system prompt is billed at cache-read prices, which are roughly 10x cheaper than fresh input on Anthropic.
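To see the shape of both tricks together, here’s a minimal sketch using the Anthropic Python SDK. This is our illustration of the general technique, not Burnless’s actual code; in particular, summarize_turn is a crude hypothetical stand-in for whatever capsule compression the project really does.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # imagine 20,000+ tokens of tool definitions and rules
capsules: list[str] = []  # ~80-char summaries of prior turns, not raw transcripts

def summarize_turn(user: str, reply: str) -> str:
    # Hypothetical stand-in for Burnless's capsule compression: truncate hard.
    return (user[:40] + " -> " + reply[:40]).replace("\n", " ")

def run_turn(user_message: str) -> str:
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        # cache_control marks the big prompt for prompt caching: identical
        # prefixes on later calls are billed at cache-read rates.
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }],
        # Instead of replaying the full transcript, send only the capsules.
        messages=[{
            "role": "user",
            "content": "Context so far:\n" + "\n".join(capsules)
                       + "\n\nNew request: " + user_message,
        }],
    )
    reply = response.content[0].text
    capsules.append(summarize_turn(user_message, reply))
    return reply
```

Each call now carries the huge prefix at cache-read prices plus only a linear trickle of fresh tokens, which is exactly the O(N) shape the project advertises.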

They even provide a reproducible benchmark using the Anthropic SDK. For a 10-turn session with Claude 3 Opus:

  • Standalone (no cache): $4.66
  • Standalone (+ cache): $0.65
  • Burnless Maestro: $0.45 (-90.3%)

This math, they argue, applies to any provider that offers prompt caching and charges per input token. It’s a universal solution to a universal problem. And setup is minimal: a pip install and a short config file. This isn’t just a wrapper; it’s a fundamental architectural shift in how we build LLM applications.
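A back-of-the-envelope model shows why. The per-token prices below are Anthropic’s published Claude 3 Opus rates ($15/M fresh input, $1.50/M cache reads); the prompt and turn sizes are our assumptions, and output tokens and the one-time cache write are ignored, so don’t expect the benchmark’s exact figures:

```python
# Rough input-cost model for a 10-turn session.
SYSTEM_TOKENS = 20_000   # the big system prompt
TURN_TOKENS = 500        # fresh tokens added per turn (assumption)
CAPSULE_TOKENS = 20      # ~80 characters per compressed capsule
TURNS = 10
INPUT = 15.00 / 1_000_000       # $/token, Claude 3 Opus fresh input
CACHE_READ = 1.50 / 1_000_000   # $/token, ~10x cheaper cache reads

# Naive replay: system prompt plus the whole growing history, every turn.
naive = sum((SYSTEM_TOKENS + k * TURN_TOKENS) * INPUT for k in range(1, TURNS + 1))

# Cached prefix + capsule history: huge prompt at cache-read rates,
# plus tiny capsules and the new turn at fresh-input rates.
capsule = sum(
    SYSTEM_TOKENS * CACHE_READ
    + ((k - 1) * CAPSULE_TOKENS + TURN_TOKENS) * INPUT
    for k in range(1, TURNS + 1)
)

print(f"naive: ${naive:.2f}   capsule: ${capsule:.2f}   saving: {1 - capsule / naive:.0%}")
# -> naive: $3.41   capsule: $0.39   saving: 89%
```

Even with made-up sizes, the saving lands in the same ballpark as the published -90.3%.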

Vendor Agnosticism is King

The beauty here, beyond the cost savings, is the commitment to being vendor-agnostic. The config.yaml example is telling: you can literally drop in your existing CLI commands. Want to use local models for the cheaper tasks? Done. Want a specific provider for the heavy lifting? Also done. The ability to mix and match is crucial for optimization, and it frees developers from being locked into a single vendor’s ecosystem. This is the kind of modularity we should be championing in the AI space.
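We haven’t reproduced the project’s actual schema here, but a config in that spirit might look something like this; the field names and commands are illustrative, not Burnless’s documented format:

```yaml
# Hypothetical config.yaml; the gold/silver/bronze band names come from the
# project, everything else is our illustration.
maestro:
  command: "claude -p"          # any CLI that takes a prompt and prints a reply
workers:
  gold:   "claude -p"           # strongest hosted model for the hard tasks
  silver: "gemini -p"           # mid-tier provider for routine work
  bronze: "ollama run llama3"   # local model, zero marginal cost
```

A file like this could drive the dispatch sketch shown earlier: swap any line for another provider’s CLI and the orchestration logic doesn’t change. That’s the whole point.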

Is This the Future of AI Agents?

This feels like a necessary evolution. The O(N²) cost structure was a clear dead end for any agent needing to maintain context over multiple turns. Burnless offers a pragmatic solution. It’s built on existing tech – caching, summarization techniques – but elegantly applied to the LLM agent problem. The MIT license ensures it’s open for anyone to use and contribute. This isn’t just about saving a few bucks; it’s about enabling more complex, more capable AI agents to be built without breaking the bank. The implications for research, for smaller businesses, and for individual developers are significant.




Frequently Asked Questions

What does Burnless actually do? Burnless is an open-source orchestration layer that optimizes multi-turn LLM agent conversations to drastically reduce API costs. It transforms quadratic O(N²) costs to linear O(N) by caching system prompts and compressing conversation history.

Will this replace my current LLM API calls? Burnless doesn’t replace your LLM API calls directly; it manages and optimizes them. You still use your chosen LLM providers, but Burnless orchestrates the interaction to minimize token usage and cost.

Is Burnless free to use? The Burnless software itself is open-source and free to use under the MIT license. However, you will still incur costs from the LLM API providers you choose to use with Burnless, though these costs are significantly reduced.

Written by
Open Source Beat Editorial Team

Curated insights, explainers, and analysis from the editorial team.



Originally reported by Dev.to
