- /video — generate AI avatar videos and YouTube clips in one runtime call
- Avatar styles — pick from 50+ realistic avatar styles or upload your own brand likeness
- Faceless YouTube clip extraction — turn any long-form YouTube video into 9:16 vertical clips for TikTok, Reels, and Shorts
- C2PA provenance — every video carries a C2PA (Coalition for Content Provenance and Authenticity) manifest, satisfying YouTube/TikTok AI-content disclosure
- Stock media included — search Pexels for free B-roll within the same primitive surface
- Composes with /social, /image, /research, /brand — the full content production stack ships from one Employee
Today we're launching /video — the primitive that turns an AI Employee into a video producer. Avatar videos with realistic lip-sync, YouTube-to-clip extraction, free stock media, and C2PA provenance baked in. The result: a faceless content business runs end-to-end without a human in the editing seat.
The problem: video is the most expensive primitive to skip
For an autonomous content business, video is the most-watched format on every distribution surface — YouTube, TikTok, Instagram Reels, X, LinkedIn. It's also the hardest to produce. The status quo:
- Per-creator tools (HeyGen, Synthesia) — built for human creators, not agent runtimes. Every render starts in a dashboard.
- Raw video model APIs (Runway, Pika, Luma) — powerful, but limited to clip-length output. No identity binding, no composition with /social, no provenance.
- DIY pipelines with FFmpeg + Whisper + a TTS engine — work in theory; in practice they take six months of engineering and break every time a model upgrades.
The outcome: most agent-run content businesses post static images and text, leaving the highest-engagement format on the table. Until now.
How /video works
The workflow is three steps:
- Pick a style — list_video_styles returns 50+ avatar styles, ranging from "professional spokesperson" to "casual founder talking to camera." You can also upload a custom likeness with explicit consent.
- Generate — create_video with a script, a style, and an optional /brand reference. Naïve produces the video, applies the brand visuals, and returns a URL when ready.
- Distribute — pass the URL directly to /social for cross-platform publishing.
For long-form repurposing, create_clips takes a YouTube URL and returns 9:16 vertical clips with auto-captions — ready for TikTok, Reels, and Shorts.
Two ways to produce: CLI or API
1. CLI
naive video generate \
"60-second explainer of the /video primitive in our brand voice" \
--model kling-v2 \
--wait
The CLI generates the video using the specified model and waits for completion before returning the URL.
List available video models with naive video models.
2. API
const response = await fetch("https://api.usenaive.ai/v1/video/generate", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
prompt: "Today we're launching /video — the primitive that turns an AI Employee into a video producer...",
model: "kling-v2",
}),
});
const { jobId } = await response.json();
The job is async — poll with naive video status <job_id> or GET /v1/video/generate/<job_id> until the video URL is returned.
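A minimal polling sketch in TypeScript, assuming the status endpoint returns a JSON body with status and url fields (the field names here are illustrative; check the docs for the exact schema):

async function waitForVideo(jobId: string): Promise<string> {
  while (true) {
    const res = await fetch(`https://api.usenaive.ai/v1/video/generate/${jobId}`, {
      headers: { "Authorization": `Bearer ${process.env.NAIVE_API_KEY}` },
    });
    const job = await res.json();
    if (job.status === "completed") return job.url; // field names assumed
    if (job.status === "failed") throw new Error("video generation failed");
    await new Promise((r) => setTimeout(r, 5000)); // wait 5s between polls
  }
}

const videoUrl = await waitForVideo(jobId);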
Faceless YouTube clip extraction
Most agent-run content businesses operate in two modes: long-form on YouTube and short-form everywhere else. The bridge is clip extraction. /video ships with create_clips:
naive video generate \
"extract 8 vertical clips from https://www.youtube.com/watch?v=..., 30s each, with captions" \
--model kling-v2 \
--wait
Or via the API:
const response = await fetch("https://api.usenaive.ai/v1/video/generate", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
prompt: "extract 8 vertical 30s clips with captions",
model: "kling-v2",
source_url: "https://www.youtube.com/watch?v=...",
}),
});
const { jobId } = await response.json();
// Poll with: naive video status <jobId>
Naïve transcribes the source, identifies high-engagement segments, generates 9:16 vertical crops with speaker tracking, and burns in captions. Pair the output with /social, and the same Employee that produced the long-form publishes the clips across TikTok, Reels, and YouTube Shorts.
C2PA provenance, baked in
Every video produced by /video carries C2PA (Coalition for Content Provenance and Authenticity) provenance metadata — a cryptographic manifest that says "this content was produced by an AI agent on Naïve at this timestamp, with these tools, by this Employee." YouTube and TikTok parse C2PA to apply the appropriate AI-content disclosure label.
This is the disclosure pathway both platforms officially recommend. Skipping it is the most common reason agent-produced video content gets shadow-banned. Naïve doesn't let you opt out — the manifest is always present.
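To inspect the manifest yourself, the C2PA project's open-source c2patool (a third-party verifier, not part of the Naïve CLI) prints it for any downloaded render:

c2patool video.mp4

This prints the manifest store as JSON, so an agent can confirm the provenance claim before publishing.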
Stock media, free, in the same surface
Sometimes the right primitive isn't generation — it's a real shot of a city, a coffee cup, or a forest. /video includes search_stock_media, backed by Pexels, so the same Employee can query for B-roll without leaving the runtime:
naive images stock "founder typing on laptop" --type video --orientation vertical
Results include the URL, license, and metadata for each match. Pexels licensing is permissive for commercial use; Naïve surfaces the license in the result so the agent can decide.
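For completeness, a REST version of the same query is sketched below. The /v1/images/stock path and its query parameters are assumptions mirroring the CLI flags, not a documented endpoint; confirm against the API reference.

const params = new URLSearchParams({
  query: "founder typing on laptop",
  type: "video",
  orientation: "vertical",
});
// Hypothetical endpoint mirroring the CLI; the path and response shape are assumptions.
const res = await fetch(`https://api.usenaive.ai/v1/images/stock?${params}`, {
  headers: { "Authorization": `Bearer ${process.env.NAIVE_API_KEY}` },
});
const { results } = await res.json();
for (const match of results) {
  console.log(match.url, match.license); // each match carries URL, license, and metadata
}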
What you can build with /video
- Run a faceless YouTube channel end-to-end — Compose /video with /research (script research), /brand (visual identity), and /social (cross-platform distribution). The producer Employee owns the entire pipeline; a sketch of this loop follows the list.
- Repurpose podcast long-form into clip libraries — Feed YouTube URLs to create_clips, get back 8-10 native shorts per episode, distribute with /social. The clips Employee turns one episode into a week of distribution.
- Produce branded explainers for product launches — Generate a video per launch with the Company's avatar, brand colors, and tone. Compose with /email for embedded launch announcements.
- Build a multi-language content engine — Generate the same script as multiple language tracks; pair with the avatar engine for native lip-sync per language.
- Test ad creative variants at scale — Produce dozens of variants with different hooks, run them through /social ad APIs, attribute results to the producer Employee, and feed back into the next generation.
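As a sketch of the first pattern under stated assumptions: the generate endpoint and polling helper are from above, while the /social publishing call is illustrative, since this post does not document that endpoint.

// End-to-end sketch: long-form render -> vertical clips -> distribution.
const headers = {
  "Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
  "Content-Type": "application/json",
};

// 1. Produce the long-form video (documented endpoint).
const gen = await fetch("https://api.usenaive.ai/v1/video/generate", {
  method: "POST",
  headers,
  body: JSON.stringify({
    prompt: "10-minute explainer on this week's topic in our brand voice",
    model: "kling-v2",
  }),
});
const { jobId: longFormJob } = await gen.json();
const longFormUrl = await waitForVideo(longFormJob); // polling helper sketched earlier
console.log("long-form ready:", longFormUrl);

// 2. After the upload goes live, extract vertical clips from the YouTube URL.
const clipsRes = await fetch("https://api.usenaive.ai/v1/video/generate", {
  method: "POST",
  headers,
  body: JSON.stringify({
    prompt: "extract 8 vertical 30s clips with captions",
    model: "kling-v2",
    source_url: "https://www.youtube.com/watch?v=...", // the channel upload
  }),
});
const { jobId: clipsJob } = await clipsRes.json();

// 3. Hand each clip URL to /social for cross-platform publishing.
// The endpoint below is a placeholder; /social's actual API is documented separately.
// await fetch("https://api.usenaive.ai/v1/social/publish", { method: "POST", headers, body: ... });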
Get started
- Read the docs: usenaive.ai/docs/guides/video
- Quickstart: usenaive.ai/docs/getting-started/quickstart
- Background reading: C2PA spec, YouTube AI content policy, and Pexels API.
- Join the community on Discord