    How to Make an AI Testimonial Video (2026 Walkthrough)

    A complete workflow for AI-assisted testimonial videos: customer interview, scene generation, lip sync, captions, and platform cuts that actually convert.

    Versely Team · 10 min read

    A great testimonial video is the highest-converting asset in your marketing library. It's also the hardest to produce at scale — you need a cooperative customer, a film crew, a half-day on their schedule, and weeks of post. Most companies ship one testimonial a quarter and call it a win.

    In 2026, the AI workflow flips the math. You can capture a 20-minute customer call, turn it into a polished 60-90 second testimonial video the same week, and ship a new one every two weeks. The customer never has to be on camera if they don't want to be. Here's the exact rig.

    [Image: A warm interview setup with a couch, soft lighting, and a single camera on a tripod]

    Step 1: Define the brief

    Most testimonial videos fail because they try to say everything. The brief's job is to enforce one outcome and one viewer.

    Use this template:

    CUSTOMER: [name, role, company]
    PRODUCT: [the specific product/feature they use]
    ONE OUTCOME: [the single business result you want the video to anchor on]
    ONE VIEWER: [who exactly should see this and convert]
    BEFORE: [their state before the product, in one sentence]
    AFTER: [their state today, in one sentence — with a number if possible]
    TONE: [warm-peer / authoritative-expert / scrappy-founder / data-driven-analyst]
    LENGTH: 60-90 seconds
    FORMAT: [face-on-camera / voice-only with B-roll / fully animated]
    CTA: [the action and where to take it]
    

    Worked example:

    CUSTOMER: Maya Chen, Head of Growth, Northwind Apparel.
    PRODUCT: Stockwise Slack alerts.
    ONE OUTCOME: Recovered $180K in projected lost revenue in Q1 from
      prevented stock-outs.
    ONE VIEWER: DTC ops leads at $5-25M ARR Shopify brands.
    BEFORE: Spent the first hour every morning chasing inventory fires.
    AFTER: Walks into Monday with zero inventory surprises and the team
      unblocked.
    TONE: warm-peer, lightly data-driven.
    LENGTH: 75 seconds.
    FORMAT: voice-only with cinematic B-roll, Maya's face on camera at end.
    CTA: Book a demo at stockwise.com/demo.
    

    The brief is locked. Every prompt and every cut downstream serves this one outcome.
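
    One way to keep the brief honest is to treat it as data and check it before any generation starts. The sketch below is a minimal illustration, not a feature of any tool mentioned here: the `Brief` class, its field names, and the `problems` helper are hypothetical and simply mirror the template above.

```python
from dataclasses import dataclass

# Tone labels copied from the brief template; everything else here is a
# hypothetical structure for illustration.
TONES = {"warm-peer", "authoritative-expert", "scrappy-founder", "data-driven-analyst"}


@dataclass
class Brief:
    customer: str
    product: str
    one_outcome: str
    one_viewer: str
    before: str
    after: str
    tone: str
    length_s: int
    video_format: str
    cta: str

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the brief can be locked."""
        issues = []
        for name, value in vars(self).items():
            if isinstance(value, str) and not value.strip():
                issues.append(f"{name} is empty")
        if not 60 <= self.length_s <= 90:
            issues.append("length_s should be 60-90 seconds")
        if not any(tone in self.tone for tone in TONES):
            issues.append("tone should name one of the template tones")
        return issues


northwind = Brief(
    customer="Maya Chen, Head of Growth, Northwind Apparel",
    product="Stockwise Slack alerts",
    one_outcome="Recovered $180K in projected lost revenue in Q1",
    one_viewer="DTC ops leads at $5-25M ARR Shopify brands",
    before="Spent the first hour every morning chasing inventory fires",
    after="Walks into Monday with zero inventory surprises",
    tone="warm-peer, lightly data-driven",
    length_s=75,
    video_format="voice-only with cinematic B-roll",
    cta="Book a demo at stockwise.com/demo",
)

print(northwind.problems())
```

    For the Northwind brief this prints an empty list. A brief with a blank CTA or a 120-second length would come back with one line per problem, which is a cheap gate to run before spending generation credits.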

    Step 2: Script + storyboard

    A testimonial video is a story, not a list of features. Use this 5-beat structure:

    1. The before-state (0-15s): What the customer's life looked like before — in their words, ideally with a quantified pain.
    2. The trigger (15-25s): What pushed them to try a solution.
    3. The first win (25-45s): The specific moment they realized it was working.
    4. The compounding result (45-65s): The bigger downstream outcome.
    5. The recommendation (65-75s): Who should use it and why, ending in your CTA.
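
    The five beats above form a timeline, so it is worth checking that they touch end to start and add up to the length in the brief. This is a small sketch under that assumption; `BEATS` and `check_beats` are hypothetical names, and the timings are the ones listed above for a 75-second cut.

```python
# (name, start_s, end_s) for each beat, taken from the 5-beat structure.
BEATS = [
    ("before-state", 0, 15),
    ("trigger", 15, 25),
    ("first win", 25, 45),
    ("compounding result", 45, 65),
    ("recommendation", 65, 75),
]


def check_beats(beats, total_s):
    """Return (name, duration_s) pairs, or raise ValueError on a gap, overlap, or wrong total."""
    cursor = 0
    for name, start, end in beats:
        if start != cursor or end <= start:
            raise ValueError(f"beat {name!r} does not follow the previous beat")
        cursor = end
    if cursor != total_s:
        raise ValueError(f"beats end at {cursor}s, expected {total_s}s")
    return [(name, end - start) for name, start, end in beats]


for name, duration in check_beats(BEATS, total_s=75):
    print(f"{name}: {duration}s")
```

    If you retime a beat for a shorter cut, rerunning the check catches the gap or overlap before it reaches the edit.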

    Run a 20-minute interview with the customer and pull quotes for each beat. Don't write the script — extract it. The customer's actual words, lightly edited, will always outperform marketing copy.

    Sample extracted script for Northwind:

    "Mondays used to start with three out-of-stock alerts and a lot of swearing. We tried building dashboards, hiring an ops analyst, none of it stuck — by the time we saw the data, the shelf was already empty. We turned on Stockwise on a Tuesday. By Friday we'd prevented our first stock-out — about $4,000 we would have eaten. Last quarter we recovered roughly $180,000 in revenue we would have lost the year before. If you're running a Shopify brand and your ops lead is firefighting more than building, just turn this on. Book a demo at stockwise.com/demo."

    Now the shot list:

    #  Scene                                              Duration  Model
    1  Cinematic warehouse exterior, morning              8s        VEO 3.1
    2  Office worker frustrated at laptop                 7s        SORA 2
    3  Phone showing a Slack alert                        10s       Kling 3.0
    4  Inventory shelves stocked, calm tracking shot      10s       VEO 3.1
    5  Revenue chart trending up, screen capture style    10s       Hailuo
    6  Maya on camera at end (real footage or generated)  15s       UGC actor or real
    7  Logo + CTA card                                    5s        Ideogram 3

    [Image: A storyboard panel with sketched scene frames laid out on a desk]

    Step 3: Generate scenes

    Open the AI video generator and work through the shot list. The right model per shot matters more than any single hero prompt.

    Cinematic establishing shots: VEO 3.1. It nails depth, light direction, and color grade better than anything else this year.

    Sample prompt for the warehouse exterior:

    Slow cinematic dolly toward a modern warehouse exterior at golden hour,
    soft warm light raking from the left, a single delivery van pulling up,
    shallow depth of field, anamorphic lens look, 35mm, no text. 8 seconds.
    

    People reacting (frustration, calm, confidence): SORA 2. Wan 2.7 if you want a more stylized, slightly graphic look.

    Sample prompt for the frustrated office worker:

    Medium shot of a woman in her 30s at a laptop, expression of mild
    frustration, hand running through her hair, soft window light from
    camera-left, slightly desaturated palette, 7 seconds, no dialogue.
    

    Product / device moments: Kling 3.0 or Hailuo. Both handle screens and phone interactions cleanly.

    B-roll fillers between beats: Use the AI b-roll generator. Feed it your script line by line, get back a matched clip per beat. Saves about 30 minutes per testimonial.

    Stills you'll animate: Flux 1.2 Ultra or Midjourney v7. Use a slow Ken Burns push in your editor.

    Customer face (if they don't want to be on camera): Two paths. Use the UGC video generator with a UGC actor template that loosely matches their demographic. Or generate a stylized, anonymized portrait and animate it, but always disclose this. Misrepresenting a customer's face is both ethically wrong and a legal exposure under the EU AI Act and California's right-of-publicity statute (Civil Code § 3344).

    Logo / CTA card: Ideogram 3. It's the only model that consistently nails crisp text in a single generation.

    Run each prompt twice. Pick the better take. Move on. The fastest way to blow a testimonial budget is to chase a single perfect generation for an hour.
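
    The "right model per shot, two takes each" rule can be written down as a small job plan before you open any generator. This is a sketch only: the shot-type keys, the `MODEL_FOR` mapping, and the job dictionary layout are hypothetical and do not correspond to any generator's real API. The model names are the ones recommended above.

```python
# Shot type -> model, following the recommendations in this step.
MODEL_FOR = {
    "establishing": "VEO 3.1",
    "people": "SORA 2",
    "device": "Kling 3.0",
    "text-card": "Ideogram 3",
}


def plan_jobs(shots, takes=2):
    """Expand (shot_type, prompt) pairs into one job per take, never more than `takes`."""
    jobs = []
    for shot_type, prompt in shots:
        model = MODEL_FOR[shot_type]
        for take in range(1, takes + 1):
            jobs.append({"model": model, "prompt": prompt, "take": take})
    return jobs


jobs = plan_jobs([
    ("establishing", "Slow cinematic dolly toward a modern warehouse exterior at golden hour"),
    ("people", "Medium shot of a woman in her 30s at a laptop, mild frustration"),
])
for job in jobs:
    print(job["model"], "take", job["take"])
```

    Fixing the number of takes up front is the point: the plan has exactly two jobs per shot, so there is no room to chase a perfect generation.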

    Step 4: Voiceover + lip sync

    Three options, in order of impact:

    Option A — Use the customer's actual voice from the interview. Clean it up with a denoiser and use it as your VO. This is always the strongest choice if the audio quality is acceptable.

    Option B — Clone the customer's voice (with permission). Record 60 seconds of clean audio during the interview, run it through AI voice cloning with ElevenLabs v3, and generate the polished script in their voice. This lets you tighten the script without re-recording. Requires explicit written consent.

    Option C — Use a voice actor read. Either generate via ElevenLabs v3 / Inworld TTS-2 or hire a real VO actor. Always disclose if it's not the customer's actual voice.

    If you have a face on camera in the closing shot, use AI lipsync to align the mouth to the cleaned VO. Drop in the video, drop in the audio, hit generate. Two minutes per shot.

    VO direction (if you're generating):

    Pace: conversational, like a peer talking to another peer.
    Energy: warm, slightly understated. Not selling.
    Emphasis: punch the dollar amounts and the trigger moment.
    Pauses: natural breath after each beat. No theatrical pauses.
    Smile: warmth on the recommendation line.
    

    [Image: A close-up of audio waveforms on a screen with editing controls below]

    Step 5: Music, captions, thumbnail

    Music. Generate a 75-second underscore in Suno v5.5 or Lyria. Prompt: "Warm acoustic underscore, soft piano and gentle string pad, no drums, building gently to a single resolved chord at 1:05, no vocals, ends on a soft sustained note." Mix at -20 LUFS; testimonials need the voice to feel intimate, not produced.

    Captions. Burn them in. Inter or SF Pro at 60-72px, white with a 2px black stroke, max 4 words per line. For testimonials specifically, use a slightly slower fade-in per word (180ms instead of the standard 80ms) — the pacing reads as more thoughtful and trustworthy.
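
    The four-words-per-line rule is easy to apply mechanically to the extracted script. Below is a minimal sketch; `caption_lines` is a hypothetical helper, and it splits on whitespace only, so a hyphenated term like "out-of-stock" counts as one word.

```python
WORD_FADE_MS = 180  # slower per-word fade suggested above for testimonials (standard: 80)


def caption_lines(text, max_words=4):
    """Split a script line into caption lines of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


for line in caption_lines("Mondays used to start with three out-of-stock alerts"):
    print(line)
```

    This prints "Mondays used to start" and then "with three out-of-stock alerts". In practice you would still nudge the breaks by hand so a line never ends on a dangling "a" or "the".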

    Thumbnail. Generate three options in the AI thumbnail generator. For a testimonial, the highest-CTR formula is: customer face (real, or an avatar used with their consent) + a single quoted phrase of 3-5 words + their name and company underneath. A/B test for 24 hours and keep the winner.

    Step 6: Final cut + publish

    Stitch in your editor. We use the AI movie maker for fully agentic stitching, but Descript or CapCut work fine. Cut on the breath, not on the beat — testimonials feel more honest when the cuts respect natural speech pauses rather than the music grid.

    Export four cuts:

    • 16:9, 75s for YouTube, your testimonials page, and case-study landing pages.
    • 9:16, 30-45s for Reels, TikTok, and Shorts. Keep the before-state, the first win, and the recommendation.
    • 1:1, 60s for LinkedIn — the platform where testimonials convert hardest.
    • 9:16, 15s for paid social. Hook quote + CTA only.

    For the vertical cuts, re-frame each scene by 30%. Don't letterbox — recompose so faces sit in the upper third.
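
    Recomposing for vertical is mostly crop arithmetic. The sketch below computes a crop rectangle from a 16:9 master that places a face one third of the way down the frame. It assumes "re-frame by 30%" means punching in by a factor of 1.3, which is one reading of the line above and not a documented setting; `reframe_crop` and its parameters are hypothetical.

```python
def reframe_crop(src_w, src_h, aspect_w, aspect_h, face_x, face_y, zoom=1.3):
    """Return (x, y, w, h) of a crop with the target aspect, face in the upper third."""
    # Largest crop of the target aspect that fits the source, then punched in by `zoom`.
    crop_h = min(src_h, src_w * aspect_h / aspect_w) / zoom
    crop_w = crop_h * aspect_w / aspect_h
    # Centre the face horizontally and put it one third of the way down the crop.
    x = face_x - crop_w / 2
    y = face_y - crop_h / 3
    # Keep the crop inside the source frame.
    x = max(0, min(x, src_w - crop_w))
    y = max(0, min(y, src_h - crop_h))
    # Truncate so the rectangle never spills past the frame edge.
    return int(x), int(y), int(crop_w), int(crop_h)


x, y, w, h = reframe_crop(1920, 1080, 9, 16, face_x=960, face_y=400)
print(x, y, w, h)
```

    If you cut with FFmpeg, these four numbers map onto its `crop` filter, which takes `w:h:x:y`. Most editors expose the same values as position and scale on the clip.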

    Time check: brief (10 min) + customer interview (30 min) + script extraction (20 min) + generation (90 min) + voice cleanup + lipsync (30 min) + music + captions (20 min) + cuts + exports (40 min) = about 4 hours of focused work spread over two days. Total spend: $10-15.
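
    The time check above is plain addition, and keeping it as a small table makes it easy to adjust for your own team. A minimal sketch, with the stage names and minutes copied from the paragraph above and `STAGES_MIN` as a hypothetical name:

```python
# Minutes per stage, as listed in the time check.
STAGES_MIN = {
    "brief": 10,
    "customer interview": 30,
    "script extraction": 20,
    "generation": 90,
    "voice cleanup + lipsync": 30,
    "music + captions": 20,
    "cuts + exports": 40,
}

total_min = sum(STAGES_MIN.values())
print(f"{total_min} min = {total_min / 60:g} hours")  # 240 min = 4 hours
```

    Generation is the largest single line at 90 minutes, so that is the first place to look if a testimonial runs over.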

    [Image: A finished video preview window showing a testimonial card with a customer quote]

    FAQ

    How long should an AI testimonial video be?

    60-90 seconds for the master cut on YouTube and your site. 30-45 seconds for short-form. 15 seconds for paid. The master cut is your hardest-working asset: embed it on case-study pages and pricing pages.

    Do I need the customer on camera?

    No. The strongest signal in a testimonial is the customer's actual voice and a specific quantified outcome — not their face. A voice-over-B-roll testimonial often outperforms talking-head when the B-roll is on-brand and the quote is sharp.

    Can I generate a customer's face if they don't want to be on camera?

    Only with explicit written consent and clear disclosure to viewers. Generating likenesses without consent is both unethical and legally risky under the EU AI Act and California's right-of-publicity statute (Civil Code § 3344). The safer path: use a UGC actor template with a generic anonymized framing.

    Which AI model is best for testimonial B-roll?

    VEO 3.1 for cinematic establishing shots. SORA 2 for people. Kling 3.0 for product and device moments. Use the AI b-roll generator to feed your script line-by-line and get matched clips per beat.

    How often should I publish testimonial videos?

    One every two weeks if you're a B2B company over $1M ARR. The cadence matters more than any single video — testimonials compound socially. A library of 20 short testimonials always outperforms one big-budget hero film.


    Ready to ship your next testimonial? Start with the AI movie maker for a full agentic build, or use the story-to-video tool to turn a customer interview transcript into a scene-by-scene draft in 5 minutes. For more on the broader content workflow, read the 2026 content playbook.

    #testimonial-video #ai-video #social-proof #ugc #lipsync #voice-cloning #case-study