Introducing /llm: 300+ models behind one key, billed in credits

TL;DR

›/llm — OpenAI-compatible chat completions across 300+ models through a single Naïve key
›Provider routing & fallbacks — reach Anthropic, OpenAI, Google, Meta, and more with automatic failover, no per-provider accounts
›Streaming built in — SSE chunks as they arrive, with the final chunk carrying usage and exact cost
›Billed in credits — every call is charged at the precise upstream cost, on the same balance as every other primitive
›Drop-in proxy — point the OpenAI SDK at /v1/proxy/openrouter and your existing code works unchanged
›Composes with everything — the same key powers research, orchestration, and the agents that call all of it

Today we're launching /llm — chat completions across 300+ models behind a single Naïve key. Provider routing and automatic fallbacks, streaming over SSE, exact-cost billing in credits, and a drop-in proxy that makes your existing OpenAI-compatible code work by changing one URL. The same key that sends email, issues cards, and deploys apps now reaches every model worth calling.

The problem: model access is a key-management tax

Every agent needs inference. But getting it usually means collecting accounts and keys like trading cards:

A key per provider. OpenAI, Anthropic, Google, Mistral: each with its own dashboard, billing relationship, and rotation schedule. Multiply that across a fleet of agents and you have a security and ops problem before you've shipped anything.
No unified accounting. Spend is scattered across provider invoices. Answering "what did this agent cost to run" means reconciling five bills.
Routing and fallbacks are DIY. When a provider is down or rate-limited, you write the retry-and-failover logic yourself, in every agent.
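To make that cost concrete, here is roughly what the per-agent failover logic looks like. This is a minimal TypeScript sketch of the DIY approach, not anything Naïve ships. `Provider`, `ProviderCall`, and `withFailover` are hypothetical names, and a production version would also need backoff, timeouts, and provider-specific error handling.

```typescript
// A provider call: takes a prompt, resolves to text or throws.
type ProviderCall = (prompt: string) => Promise<string>;

interface Provider {
  name: string; // illustrative label, e.g. "provider-a"
  call: ProviderCall;
}

// Try each provider in order. Each one gets 1 + `retries` attempts before
// we fall through to the next. Throws if every provider fails.
async function withFailover(
  providers: Provider[],
  prompt: string,
  retries = 1,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const text = await p.call(prompt);
        return { provider: p.name, text };
      } catch (err) {
        // Record the failure and keep going: retry, then next provider.
        errors.push(`${p.name}#${attempt}: ${(err as Error).message}`);
      }
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Every agent that talks to providers directly ends up carrying some version of this loop, duplicated per SDK.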

For a platform where an agent already has email, payments, and a workforce through one key, making inference the one capability that needs five separate accounts is backwards. Until now.

How /llm works

/llm is a full wrapper over an OpenAI-compatible chat completions API with provider routing built in. You pick a model by slug; Naïve routes the call, applies fallbacks, and bills the exact upstream cost against your credit balance. There are no provider keys in your code — only your nv_sk_* key.

// Assumes `naive` is an initialized Naïve SDK client authenticated with your nv_sk_* key.
const res = await naive.llm.chat({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Summarize this support thread in 3 bullets." }],
});

console.log(res.choices[0].message.content);
console.log(res.usage); // tokens + exact cost in credits
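Because the response is OpenAI-compatible, reading the reply and its usage is plain field access. A minimal sketch: the token fields follow the standard OpenAI usage shape, while the `cost` field name is an assumption in this sketch (usage is described as carrying the exact cost in credits, but the field isn't named). `summarize` is a hypothetical helper.

```typescript
// OpenAI-compatible usage (subset). `cost` is an assumed field name, in credits.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost?: number;
}

interface ChatResponse {
  id: string;
  model: string;
  choices: { index: number; message: { role: string; content: string | null } }[];
  usage?: Usage;
}

// Extract the reply text and a one-line usage summary from a response.
function summarize(res: ChatResponse): { text: string; line: string } {
  const text = res.choices[0]?.message.content ?? "";
  const u = res.usage;
  const line = u
    ? `${res.model}: ${u.prompt_tokens} in / ${u.completion_tokens} out` +
      (u.cost !== undefined ? `, ${u.cost} credits` : "")
    : `${res.model}: no usage reported`;
  return { text, line };
}
```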

Discover models from the live catalog:

const { models, count } = await naive.llm.models("claude");
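`naive.llm.models(query)` returns `{ models, count }`; the exact fields of each entry are not listed in this post. The sketch assumes OpenRouter-style entries with an `id` slug and an optional `context_length`, then narrows results to one provider and a minimum context window. `ModelEntry` and `pickModels` are hypothetical.

```typescript
// Assumed shape of one catalog entry.
interface ModelEntry {
  id: string;              // slug, e.g. "anthropic/claude-sonnet-4.6"
  context_length?: number; // assumed field; missing values count as 0
}

// Keep models whose slug prefix (before "/") matches `provider` and whose
// context window is at least `minContext`, sorted by slug for stable output.
function pickModels(models: ModelEntry[], provider: string, minContext = 0): string[] {
  return models
    .filter((m) => m.id.split("/")[0] === provider)
    .filter((m) => (m.context_length ?? 0) >= minContext)
    .map((m) => m.id)
    .sort();
}
```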

Streaming, with cost on the final chunk

For anything user-facing, stream. naive.llm.stream(...) yields OpenAI-compatible chunks as they arrive, and the final chunk carries the usage object — including the precise cost of the call.

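const messages = [{ role: "user", content: "Write a launch tweet for our new API." }];

// Print tokens as they arrive. The optional chaining tolerates chunks with no
// choices, which a final usage-only chunk may be.
for await (const chunk of naive.llm.stream({ model: "openai/gpt-5.2", messages })) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}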
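Since the final chunk carries usage, one pass over the stream can assemble the full reply and keep the cost. This is a sketch over any async iterable of OpenAI-compatible chunks. `collect`, `fakeStream`, and the `usage.cost` field name are assumptions; `fakeStream` stands in for `naive.llm.stream(...)` so the code runs offline.

```typescript
// Minimal OpenAI-compatible stream chunk (subset). `cost` is an assumed field name.
interface StreamChunk {
  choices: { delta?: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number; cost?: number } | null;
}

// Drain the stream, concatenating content deltas and keeping the usage object
// from whichever chunk carries it.
async function collect(stream: AsyncIterable<StreamChunk>) {
  let text = "";
  let usage: StreamChunk["usage"] = null;
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}

// Hypothetical offline stand-in shaped like the chunks described above.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: "Hello, " } }] };
  yield { choices: [{ delta: { content: "world" } }] };
  yield { choices: [], usage: { prompt_tokens: 12, completion_tokens: 3, cost: 0.0004 } };
}
```

With the SDK, `await collect(naive.llm.stream({ model, messages }))` should work the same way, assuming `stream(...)` returns an async iterable as the `for await` loop implies.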

Because every call reports its own cost, you can attribute spend per request — per agent, per Employee, per end-user — instead of reconciling a provider invoice at the end of the month.
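Per-request attribution then reduces to a group-by over your own call log. A minimal sketch, assuming you record an attribution key and the per-call cost in credits after each call; `CallRecord` and `spendByKey` are hypothetical names.

```typescript
// One record per call. `key` is whatever you attribute on (agent, Employee,
// end-user); `cost` is the per-call cost in credits read from usage.
interface CallRecord {
  key: string;
  model: string;
  cost: number;
}

// Sum cost per attribution key.
function spendByKey(records: CallRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.key, (totals.get(r.key) ?? 0) + r.cost);
  }
  return totals;
}
```

If these totals feed an invoice, consider storing costs as integers (for example, micro-credits) to avoid floating-point drift when summing many small values.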

The drop-in OpenRouter proxy

Already have code written against the OpenAI SDK? Change the base URL and your Naïve key becomes your model key. Nothing else moves.

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.usenaive.ai/v1/proxy/openrouter", // Naïve proxy instead of the OpenAI API
  apiKey: process.env.NAIVE_API_KEY, // your nv_sk_* key, not a provider key
});

const res = await openai.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Hello" }],
});

The proxy speaks the same wire format, so existing OpenAI/OpenRouter clients, frameworks, and tools work unchanged — now on unified Naïve billing and governance.

Call it from the CLI or raw REST

curl -X POST https://api.usenaive.ai/v1/llm/chat/completions \
  -H "Authorization: Bearer $NAIVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "messages": [{ "role": "user", "content": "Give me three product names." }]
  }'

The endpoint is OpenAI-compatible, so any client that can hit /chat/completions can hit /llm.
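Without any SDK, the same request the curl call sends can be built with the standard Fetch API. This sketch only constructs the `Request` (a global in Node 18+ and browsers) so it runs without a network; `buildChatRequest` is a hypothetical helper.

```typescript
// Build, but don't send, a POST to the /llm chat completions endpoint.
function buildChatRequest(apiKey: string, model: string, content: string): Request {
  return new Request("https://api.usenaive.ai/v1/llm/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content }] }),
  });
}
```

Send it with `const res = await fetch(req)` and read the OpenAI-compatible body with `await res.json()`.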

What you can build with /llm

Power your agents on the same key as everything else — the Employees that send email, run research, and deploy apps already authenticate with a Naïve key. /llm means their inference runs through it too: one balance, one audit trail, one thing to rotate.

Route by task, fail over automatically — Use a fast, cheap model for triage and a frontier model for the hard step, all by changing one slug. Provider routing handles outages and rate limits, so you don't write failover logic in every loop.
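Routing by task can be as simple as a lookup table from task to slug. A sketch: the `Task` tiers and `requestFor` helper are hypothetical, `example/fast-model` is a placeholder for whichever low-cost model you pick from `naive.llm.models(...)`, and mapping the hard step to `anthropic/claude-sonnet-4.6` is only an illustrative choice.

```typescript
type Task = "triage" | "hard";

// Placeholder and illustrative slugs; replace with models from the live catalog.
const MODEL_BY_TASK: Record<Task, string> = {
  triage: "example/fast-model",
  hard: "anthropic/claude-sonnet-4.6",
};

// Build an OpenAI-compatible { model, messages } body; only the slug varies per task.
function requestFor(task: Task, content: string) {
  return {
    model: MODEL_BY_TASK[task],
    messages: [{ role: "user" as const, content }],
  };
}
```

The result is the `{ model, messages }` shape that `naive.llm.chat(...)` takes, so changing which model handles a step never touches the call sites.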

Attribute model spend per customer — In a multi-tenant build, scope /llm to a tenant user and read exact cost per call to meter and bill inference downstream alongside /billing.

Migrate existing code in one line — Point your current OpenAI-based app at the proxy and inherit unified billing, logging, and 300+ models without a rewrite.

Compose with research and generation — Pair /llm with /research for grounded answers and /image for assets — text, web, and media generation all behind one key.

Get started

Drop this starter prompt into any coding agent to wire up Naïve:

Read https://usenaive.ai/skill.md and use it to set up Naïve in my project.

Read the docs: usenaive.ai/docs/getting-started/llm
Proxy reference: usenaive.ai/docs/api-reference/overview
Quickstart: usenaive.ai/docs/getting-started/quickstart
Join the community on Discord

Frequently Asked Questions

What is /llm?

/llm is Naïve's LLM routing primitive — a full wrapper over an OpenAI-compatible chat completions API that reaches 300+ models with provider routing and fallbacks. One method gives you Anthropic, OpenAI, Google, Meta, Mistral, and more, with streaming, model discovery, and per-call cost reporting, all behind your single Naïve API key.

Which models can I use?

300+, spanning the major providers and open-weight models. Call naive.llm.models("claude") to search the live catalog. You select a model by its slug (e.g. anthropic/claude-sonnet-4.6 or openai/gpt-5.2) in the model field of a request.

How is it billed?

In Naïve credits, at the exact cost the upstream provider returns for the call — no markup baked into a flat per-token rate. Usage and cost come back on the final streaming chunk (and via naive.llm.generation(id)), so you can attribute spend per call.

Does it support streaming?

Yes. naive.llm.stream(...) yields OpenAI-compatible chunks over SSE as they arrive; the final chunk carries the usage object including cost. Use naive.llm.chat(...) for a single non-streaming response.

Why not just call OpenRouter or each provider directly?

Because /llm is the same key as every other Naïve primitive. You get unified credit billing, governance and logging, multi-tenant scoping per end-user, and no fleet of provider accounts and keys to manage. For agents that already use email, cards, and apps through Naïve, model calls being one more namespace is the point.

How do I get started with /llm?

Call naive.llm.chat({ model, messages }) with the SDK, or point any OpenAI-compatible client at https://api.usenaive.ai/v1/proxy/openrouter using your Naïve key. The full guide is at usenaive.ai/docs/getting-started/llm.

Dennis Zax, CTO

CTO of Naïve. Building the open-source agent runtime.

@denniszax