Naïve Engineering Lessons: How We Burned $450K in One Month

TL;DR

In our first month after launch we burned about $450K of avoidable spend: $130K on Fly.io, $320K on Anthropic.
Root cause: every signup (free, no credit card) got a dedicated, always-on micro-VM plus $10 of model credits.
We give each tenant a full VM because the agents run arbitrary, model-generated code and drive a headless browser. Co-tenancy is a security non-starter.
Running a VM per tenant quietly makes you an infra provider: machines and volumes bill separately, a stopped machine still pays for its volume, and nothing reclaims dead tenants unless you build it.
At 10K+ signups/day that became tens of thousands of running VMs and volumes, most for accounts that ran once and left.
Fixes: gate provisioning behind a card, treat reclamation as a real subsystem, and cap per-account spend.

Why each tenant gets a full VM

A Naïve tenant is not a normal SaaS row. It is a company: a set of agents that operate like a real business.

Each company:

runs arbitrary, model-generated shell commands and code,
drives a headless Chromium instance and installs packages at runtime,
keeps persistent on-disk state (working dir, build artifacts, caches).

Co-tenancy with that threat model is a non-starter, so every company gets its own Fly Machine:

Guest: cpu_kind: shared, 2 vCPU, 2GB RAM.
A dedicated 10GB volume for tenant state.
A strict 1:1 mapping between a company and a machine, enforced in the control plane.
A per-machine scoped key. Model traffic is proxied server-side, so the container never holds a real provider key and usage is metered per token.
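As a rough illustration of the spec above, here is a minimal sketch of a create-machine request body. The field names (guest, mounts, restart) follow the Fly Machines API machine config as we understand it; the function name, image name, volume ID and mount path are hypothetical placeholders, and this is not the production provisioning code.

```python
from typing import Any


def build_machine_config(image: str, volume_id: str) -> dict[str, Any]:
    """Build a create-machine request body for one tenant (illustrative only).

    `image` and `volume_id` are supplied by the caller; the "/data" mount
    path is an assumption made for this sketch.
    """
    return {
        "config": {
            "image": image,
            # Guest size described in the post: shared CPU, 2 vCPU, 2GB RAM.
            "guest": {"cpu_kind": "shared", "cpus": 2, "memory_mb": 2048},
            # One dedicated volume per tenant, mounted for on-disk state.
            "mounts": [{"volume": volume_id, "path": "/data"}],
            # The launch-time setting: with "always", nothing stops an idle machine.
            "restart": {"policy": "always"},
        }
    }


if __name__ == "__main__":
    # Hypothetical image and volume identifiers.
    print(build_machine_config("registry.example/agent-runtime:latest", "vol_example"))
```

Note that the volume is referenced by ID: it is a separate resource with its own lifecycle, which matters for everything that follows.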

The isolation was the right call. We got the lifecycle around it wrong, and that is where the money went.

High-level architecture: every signup claims or provisions its own always-on Fly Machine with a dedicated volume; the agent runtime calls Claude through a server-side metering proxy.

The cost mechanics we never modeled

Two facts about running a VM per tenant on Fly, learned the expensive way:

A machine bills for the whole time it runs, and ours ran with the restart policy set to always. Nothing stopped an idle one.
Volumes bill independently of the machine. Stopping a tenant's machine still pays for its 10GB volume until you delete it.

So a tenant that signed up, ran one task, and vanished kept billing us for compute and storage indefinitely.
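The two billing facts can be written down as a small cost model. This is a sketch under stated assumptions: the rates passed in are placeholders rather than Fly's actual prices, and the names Rates and tenant_monthly_cost are invented for illustration.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average month length, good enough for an estimate


@dataclass(frozen=True)
class Rates:
    machine_per_hour: float     # USD per running machine-hour (placeholder value)
    volume_per_gb_month: float  # USD per provisioned GB-month (placeholder value)


def tenant_monthly_cost(
    rates: Rates,
    running_hours: float,
    volume_gb: float = 10.0,
    volume_exists: bool = True,
) -> float:
    """Monthly cost of one tenant: compute and storage are billed independently."""
    compute = rates.machine_per_hour * running_hours
    # Storage is charged whether or not the machine is running.
    storage = rates.volume_per_gb_month * volume_gb if volume_exists else 0.0
    return compute + storage


if __name__ == "__main__":
    rates = Rates(machine_per_hour=0.02, volume_per_gb_month=0.15)  # made-up rates
    print("always-on:", tenant_monthly_cost(rates, HOURS_PER_MONTH))
    print("stopped, volume kept:", tenant_monthly_cost(rates, 0))
    print("stopped, volume deleted:", tenant_monthly_cost(rates, 0, volume_exists=False))
```

The point of the model is the middle case: a stopped machine with a surviving volume is cheaper than an always-on one, but it only reaches zero once the volume is deleted.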

Why we couldn't just scale-to-zero

The obvious answer is auto-stop idle machines. It does not fit agent workloads cleanly.

Cold start ran to minutes early on: roughly 53s for volume create, machine create, image pull and boot alone, before any runtime warmup. We later cut first-run to about a 12s claim from a small warm pool, but a stopped machine pays that cost again on wake.
"Idle" is ambiguous for an agent. A machine quiet for 30s is often mid-task, blocked on a model response or a long tool call. A naive idle-timeout kills live work.
Stopping frees nothing on storage. The volume bills regardless of machine state.

Auto-stop is viable, but only with agent-aware idle detection and volume GC. We shipped with neither.
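To make "agent-aware idle detection" concrete, here is a minimal sketch, assuming the runtime can report in-flight model and tool calls per tenant. TenantActivity, safe_to_stop and the 15-minute grace period are hypothetical choices for this example, not a description of what was eventually shipped.

```python
from dataclasses import dataclass


@dataclass
class TenantActivity:
    last_event_at: float       # unix seconds of the last agent event
    inflight_model_calls: int  # requests still waiting on a model response
    inflight_tool_calls: int   # long-running tool or browser calls in progress


def safe_to_stop(activity: TenantActivity, now: float, idle_grace_s: float = 900.0) -> bool:
    """Return True only when the tenant is quiet AND nothing is in flight.

    A plain idle timeout checks only the elapsed time; this version refuses
    to stop a machine that is blocked on a model response or a tool call.
    """
    if activity.inflight_model_calls > 0 or activity.inflight_tool_calls > 0:
        return False
    return now - activity.last_event_at >= idle_grace_s
```

A machine that has been quiet for 30s while waiting on a model response returns False here regardless of the grace period, which is exactly the case a naive timeout gets wrong.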

Reclamation is a subsystem, not a cron afterthought

The deeper problem was drift between our control plane (Postgres) and Fly's actual state.

Failed or racy provisions left machines running with no surviving DB row: orphans that billed forever.
Tenants we "deleted" in the DB sometimes left a live machine or an unattached volume behind.

We eventually built reconciliation that diffs Fly against the DB, classifies dead tenants by staleness, and sweeps orphaned machines and volumes. It should have existed on day one. Anything that provisions external infra needs a garbage collector, not faith that creates and deletes stay balanced.
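The core of such a reconciler is a set difference between what the provider says exists and what the database believes it owns. The sketch below assumes the machine and volume IDs have already been listed from Fly and the tenant rows loaded from Postgres; TenantRow, SweepPlan and plan_sweep are illustrative names, not the real schema.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class TenantRow:
    tenant_id: str
    machine_id: Optional[str]
    volume_id: Optional[str]
    last_active_at: float  # unix seconds
    deleted: bool = False


@dataclass
class SweepPlan:
    orphan_machines: set[str] = field(default_factory=set)
    orphan_volumes: set[str] = field(default_factory=set)
    stale_tenants: set[str] = field(default_factory=set)


def plan_sweep(
    fly_machine_ids: Iterable[str],
    fly_volume_ids: Iterable[str],
    rows: Iterable[TenantRow],
    now: float,
    stale_after_s: float,
) -> SweepPlan:
    """Diff provider state against the DB and classify what should be reclaimed."""
    live = [r for r in rows if not r.deleted]
    owned_machines = {r.machine_id for r in live if r.machine_id}
    owned_volumes = {r.volume_id for r in live if r.volume_id}
    return SweepPlan(
        # Exists on the provider, but no live row claims it (never recorded, or "deleted").
        orphan_machines=set(fly_machine_ids) - owned_machines,
        orphan_volumes=set(fly_volume_ids) - owned_volumes,
        # Still owned, but inactive for longer than the staleness threshold.
        stale_tenants={r.tenant_id for r in live if now - r.last_active_at > stale_after_s},
    )
```

One caveat for anyone adapting this: a machine created moments before its row is committed would look like an orphan, so a real sweep needs a grace period on resource age before it destroys anything.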

That drift, across tens of thousands of tenants, is most of the $130K.

The free-credit multiplier

Compute was less than a third of the bill. Distribution was the rest.

Every signup got a 7-day trial and $10 of model credits.
We required no credit card, so nothing filtered low-intent signups before they cost us money.
At launch volume that is $10 of subsidized usage per account, for a flood that would never convert, plus abuse running straight to the cap.

Spend was metered precisely the entire time. Visibility was never the problem. The missing piece was a hard per-account ceiling.
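A hard ceiling is a check made before a request is forwarded, not a report generated afterwards. The following is a minimal sketch of that check, assuming the metering proxy can estimate a worst-case cost for each request up front; authorize_request and SpendCeilingExceeded are hypothetical names.

```python
from decimal import Decimal


class SpendCeilingExceeded(Exception):
    """Raised when a request would push an account past its hard ceiling."""


def authorize_request(
    spent_usd: Decimal,
    worst_case_cost_usd: Decimal,
    ceiling_usd: Decimal,
) -> Decimal:
    """Allow the request only if its worst-case cost still fits under the ceiling.

    Returns the budget that would remain after the request. Decimal avoids
    float rounding drift when many small token charges are summed.
    """
    projected = spent_usd + worst_case_cost_usd
    if projected > ceiling_usd:
        raise SpendCeilingExceeded(
            f"projected {projected} USD exceeds ceiling {ceiling_usd} USD"
        )
    return ceiling_usd - projected
```

Checking the worst case before the call, rather than the actual cost after it, is the design choice that makes the ceiling hard: an account can stop short of the limit, but it cannot overshoot it.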

The trigger: 10K+ signups/day

We planned for interest, not 10,000+ signups a day, each provisioning a VM and $10 of usage.
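A back-of-envelope calculation shows the exposure we never wrote down. It uses only figures from this post plus two assumptions that are ours for the sake of the arithmetic: a 30-day month, and signups held at the 10,000/day floor.

```python
SIGNUPS_PER_DAY = 10_000       # floor from the post ("10K+ signups/day")
CREDIT_PER_SIGNUP_USD = 10     # free model credits per signup
DAYS = 30                      # assumption: a 30-day month
ANTHROPIC_SPEND_USD = 320_000  # actual model spend reported in the post


def max_credit_exposure(signups_per_day: int, credit_usd: int, days: int) -> int:
    """Worst case: every signup spends its entire free credit."""
    return signups_per_day * credit_usd * days


def full_credit_equivalents(spend_usd: int, credit_usd: int) -> float:
    """How many fully exhausted accounts the actual spend corresponds to."""
    return spend_usd / credit_usd


if __name__ == "__main__":
    exposure = max_credit_exposure(SIGNUPS_PER_DAY, CREDIT_PER_SIGNUP_USD, DAYS)
    print("worst-case exposure (USD):", exposure)
    print("full-credit equivalents:", full_credit_equivalents(ANTHROPIC_SPEND_USD, CREDIT_PER_SIGNUP_USD))
    print("share of worst case actually spent:", ANTHROPIC_SPEND_USD / exposure)
```

Under those assumptions the worst case is $3M of credits, and the $320K actually spent is what 32,000 fully exhausted accounts would cost. The bill was large, and it was still only a fraction of what the offer had put on the table.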

When we hit Fly's per-region capacity, staying online meant spreading across more regions, which multiplied the always-on footprint instead of bounding it.

The numbers

Where the money went: $130K to Fly.io for always-on VMs and volumes that were never reclaimed, and $320K to Anthropic for $10 of free credits handed to every uncarded signup.

Line item	Spend	Why
Fly.io (compute)	$130,000	A dedicated, always-on machine plus 10GB volume per signup, never reclaimed, plus orphaned machines and volumes billing until cleanup
Anthropic (tokens)	$320,000	$10 of free credits per signup, no card, multiplied by launch volume
Total avoidable burn	$450,000	One month

What we changed

Gate provisioning behind a card. The highest-leverage fix by far. It filters intent before we spend a cent on a machine or a token, instead of after.
Treat reclamation as a subsystem. Reconcile Fly against the DB on a schedule, suspend idle tenants, and delete the volume when we stop a machine.
Cap per-account spend. A free account runs against a hard ceiling, so no single tenant can run unbounded.

Conclusion

If you run untrusted agents, per-tenant VM isolation is probably correct. But it makes you an infrastructure provider whether you meant to be one or not.

That means three things have to exist before you launch, not after the bill arrives: a gate on who can provision, a garbage collector for what gets provisioned, and a hard spend ceiling per tenant.

We learned all three in the same month. The bill was about $450K.

Our future

Naïve is one of the fastest-growing startups in the world right now. We're backed by Y Combinator, signups are growing faster than we can provision machines (clearly), and the platform behind autonomous companies is genuinely getting built — primitive by primitive — every week.

We're hiring talented AI engineers who want to work on this at the deepest layer: agent runtimes, multi-tenant infrastructure, model orchestration, and the systems that turn a single prompt into a real, operating company.

If running untrusted agents at scale, building secure per-tenant isolation, and getting the unit economics right (the second time) sounds like the kind of problem you want to spend the next few years on, we want to talk to you. Reach out to Dennis directly, or apply at usenaive.ai/careers.

Dennis Zax

CTO of Naïve.

@denniszax