- /video — generate AI avatar videos and YouTube clips in one runtime call
- Avatar styles — pick from 50+ realistic avatar styles or upload your own brand likeness
- Faceless YouTube clip extraction — turn any long-form YouTube video into 9:16 vertical clips for TikTok, Reels, and Shorts
- C2PA provenance — every video carries a C2PA (Coalition for Content Provenance and Authenticity) manifest, satisfying YouTube/TikTok AI-content disclosure
- Stock media included — search Pexels for free B-roll within the same primitive surface
- Composes with /social, /image, /research, /brand — the full content production stack ships from one Employee
Today we're launching /video — the primitive that turns an AI Employee into a video producer. Avatar videos with realistic lip-sync, YouTube-to-clip extraction, free stock media, and C2PA provenance baked in. The result: a faceless content business runs end-to-end without a human in the editing seat.
The problem: video is the most expensive primitive to skip
For an autonomous content business, video is the most-watched format on every distribution surface — YouTube, TikTok, Instagram Reels, X, LinkedIn. It's also the hardest to produce. The status quo:
- Per-creator tools (HeyGen, Synthesia) — built for human creators, not agent runtimes. Every render starts in a dashboard.
- Raw video model APIs (Runway, Pika, Luma) — powerful, but limited to clip-length output. No identity binding, no composition with /social, no provenance.
- DIY pipelines with FFmpeg + Whisper + a TTS engine — work in theory; in practice they take six months of engineering and break every time a model upgrades.
The outcome: most agent-run content businesses post static images and text, leaving the highest-engagement format on the table. Until now.
How /video works
The workflow is three steps:
- Pick a style — list_video_styles returns 50+ avatar styles, ranging from "professional spokesperson" to "casual founder talking to camera." You can also upload a custom likeness with explicit consent.
- Generate — create_video with a script, a style, and an optional /brand reference. Naïve produces the video, applies the brand visuals, and returns a URL when ready.
- Distribute — pass the URL directly to /social for cross-platform publishing.
For long-form repurposing, create_clips takes a YouTube URL and returns 9:16 vertical clips with auto-captions — ready for TikTok, Reels, and Shorts.
Two ways to produce: CLI or API
1. CLI
naive video generate \
"60-second explainer of the /video primitive in our brand voice" \
--model kling-v2 \
--wait
The CLI generates the video using the specified model and waits for completion before returning the URL.
List available video models with naive video models.
2. API
const response = await fetch("https://api.usenaive.ai/v1/video/generate", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
prompt: "Today we're launching /video — the primitive that turns an AI Employee into a video producer...",
model: "kling-v2",
}),
});
const { jobId } = await response.json();
The job is async — poll with naive video status <job_id> or GET /v1/video/generate/<job_id> until the video URL is returned.
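A minimal polling sketch in TypeScript, assuming the status endpoint returns a JSON body with status and url fields (the field names here are illustrative; check the docs for the exact schema):

async function waitForVideo(jobId: string): Promise<string> {
  while (true) {
    const res = await fetch(`https://api.usenaive.ai/v1/video/generate/${jobId}`, {
      headers: { "Authorization": `Bearer ${process.env.NAIVE_API_KEY}` },
    });
    const job = await res.json();
    if (job.status === "completed") return job.url; // field names assumed
    if (job.status === "failed") throw new Error("video generation failed");
    await new Promise((r) => setTimeout(r, 5000)); // wait 5s between polls
  }
}

const videoUrl = await waitForVideo(jobId);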
Faceless YouTube clip extraction
Most agent-run content businesses operate in two modes: long-form on YouTube and short-form everywhere else. The bridge is clip extraction. /video ships with create_clips:
naive video generate \
"extract 8 vertical clips from https://www.youtube.com/watch?v=..., 30s each, with captions" \
--model kling-v2 \
--wait
Or via the API:
const response = await fetch("https://api.usenaive.ai/v1/video/generate", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
prompt: "extract 8 vertical 30s clips with captions",
model: "kling-v2",
source_url: "https://www.youtube.com/watch?v=...",
}),
});
const { jobId } = await response.json();
// Poll with: naive video status <jobId>
Naïve transcribes the source, identifies high-engagement segments, generates 9:16 vertical crops with speaker tracking, and burns in captions. Pair the output with /social, and the same Employee that produced the long-form publishes the clips across TikTok, Reels, and YouTube Shorts.
C2PA provenance, baked in
Every video produced by /video carries C2PA (Coalition for Content Provenance and Authenticity) provenance metadata — a cryptographic manifest that says "this content was produced by an AI agent on Naïve at this timestamp, with these tools, by this Employee." YouTube and TikTok parse C2PA to apply the appropriate AI-content disclosure label.
This is the disclosure pathway both platforms officially recommend. Skipping it is the most common reason agent-produced video content gets shadow-banned. Naïve doesn't let you opt out — the manifest is always present.
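To inspect the manifest yourself, the C2PA project's open-source c2patool (a third-party verifier, not part of the Naïve CLI) prints it for any downloaded render:

c2patool video.mp4

This prints the manifest store as JSON, so an agent can confirm the provenance claim before publishing.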
Stock media, free, in the same surface
Sometimes the right primitive isn't generation — it's a real shot of a city, a coffee cup, or a forest. /video includes search_stock_media, backed by Pexels, so the same Employee can query for B-roll without leaving the runtime:
naive images stock "founder typing on laptop" --type video --orientation vertical
Results include the URL, license, and metadata for each match. Pexels licensing is permissive for commercial use; Naïve surfaces the license in the result so the agent can decide.
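For completeness, a REST version of the same query is sketched below. The /v1/images/stock path and its query parameters are assumptions mirroring the CLI flags, not a documented endpoint; confirm against the API reference.

const params = new URLSearchParams({
  query: "founder typing on laptop",
  type: "video",
  orientation: "vertical",
});
// Hypothetical endpoint mirroring the CLI; the path and response shape are assumptions.
const res = await fetch(`https://api.usenaive.ai/v1/images/stock?${params}`, {
  headers: { "Authorization": `Bearer ${process.env.NAIVE_API_KEY}` },
});
const { results } = await res.json();
for (const match of results) {
  console.log(match.url, match.license); // each match carries URL, license, and metadata
}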
What you can build with /video
- Run a faceless YouTube channel end-to-end — Compose /video with /research (script research), /brand (visual identity), and /social (cross-platform distribution). The producer Employee owns the entire pipeline; a sketch of this loop follows the list.
- Repurpose podcast long-form into clip libraries — Feed YouTube URLs to create_clips, get back 8-10 native shorts per episode, distribute with /social. The clips Employee turns one episode into a week of distribution.
- Produce branded explainers for product launches — Generate a video per launch with the Company's avatar, brand colors, and tone. Compose with /email for embedded launch announcements.
- Build a multi-language content engine — Generate the same script as multiple language tracks; pair with the avatar engine for native lip-sync per language.
- Test ad creative variants at scale — Produce dozens of variants with different hooks, run them through /social ad APIs, attribute results to the producer Employee, and feed back into the next generation.
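As a sketch of the first pattern under stated assumptions: the generate endpoint and polling helper are from above, while the /social publishing call is illustrative, since this post does not document that endpoint.

// End-to-end sketch: long-form render -> vertical clips -> distribution.
const headers = {
  "Authorization": `Bearer ${process.env.NAIVE_API_KEY}`,
  "Content-Type": "application/json",
};

// 1. Produce the long-form video (documented endpoint).
const gen = await fetch("https://api.usenaive.ai/v1/video/generate", {
  method: "POST",
  headers,
  body: JSON.stringify({
    prompt: "10-minute explainer on this week's topic in our brand voice",
    model: "kling-v2",
  }),
});
const { jobId: longFormJob } = await gen.json();
const longFormUrl = await waitForVideo(longFormJob); // polling helper sketched earlier
console.log("long-form ready:", longFormUrl);

// 2. After the upload goes live, extract vertical clips from the YouTube URL.
const clipsRes = await fetch("https://api.usenaive.ai/v1/video/generate", {
  method: "POST",
  headers,
  body: JSON.stringify({
    prompt: "extract 8 vertical 30s clips with captions",
    model: "kling-v2",
    source_url: "https://www.youtube.com/watch?v=...", // the channel upload
  }),
});
const { jobId: clipsJob } = await clipsRes.json();

// 3. Hand each clip URL to /social for cross-platform publishing.
// The endpoint below is a placeholder; /social's actual API is documented separately.
// await fetch("https://api.usenaive.ai/v1/social/publish", { method: "POST", headers, body: ... });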
Get started
- Read the docs: usenaive.ai/docs/guides/video
- Quickstart: usenaive.ai/docs/getting-started/quickstart
- Background reading: C2PA spec, YouTube AI content policy, and Pexels API.
- Join the community on Discord