
    Versely UGC Video Generator: A Complete Walkthrough of the 5 Tools That Ship Ads Faster

    Walk through all five Versely UGC tools - overlays, captions, background removal, compose overlay, and timestamped captions - with pricing and a real pipeline.

    Versely Team · 8 min read

    UGC ads live or die on two things: how fast you can produce variants, and how close they look to organic. Versely's UGC Video Generator attacks both. It is not a single monolithic "make me an ad" button. It is five focused, composable operations that each solve one real production problem, and together replace what used to be a CapCut project, a Premiere session, and a round of manual subtitling.

    This walkthrough covers all five UGC ops in detail, the exact credit costs, the caption and overlay configuration options, and a real end-to-end creator workflow that ships an ad from script to MP4 in one pass.

    [Image: Phone shooting a creator-style video]

    The five UGC operations

    Each operation is a standalone tool with its own credit cost. You pick the one that matches the transformation you need.

    Operation             Credits  What it does
    VIDEO_OVERLAY         10       Places a smaller video on top of a base video with position control
    ADD_CAPTIONS          5        Burns a single static caption block into the video
    REMOVE_BLACK_BG       10       Strips a black background from a foreground clip for compositing
    COMPOSE_OVERLAY       15       Stitches a sequence and applies an overlay in one pass
    TIMESTAMPED_CAPTIONS  8        Generates captions aligned to the audio timeline

    If you are familiar with the broader UGC playbook, pair this guide with our complete guide to AI UGC ads for ecommerce.

    Caption sizing and positioning

    Captions are the single most impactful ad-level lever. Versely gives you three sizes and three positions to work with.

    The three caption sizes are:

    • Small at 24 px - for longer lines, dense information, or square formats
    • Medium at 28 px - the default for most 9:16 short-form content
    • Large at 32 px - for hook lines, single-word beats, high-contrast thumbnails

    The three caption positions are top, middle, and bottom. Top tends to perform best when the creator's face is in the lower third of the frame. Middle works for products where you want eyes drawn to the on-screen text first. Bottom is the platform-safe default.

    Pick size and position per variant rather than per ad. On direct-response creative, A/B testing caption position alone can shift CTR noticeably, so treat it as a test variable rather than a fixed style choice.
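
    As a rough illustration of picking size and position per variant, the sketch below enumerates the nine size and position combinations described above. The `CaptionConfig` class, its field names, and `caption_variants` are hypothetical and not part of any Versely API; only the pixel sizes, the position names, and the medium/bottom defaults come from this guide.

```python
from dataclasses import dataclass
from itertools import product

# Pixel sizes and position names as listed in this guide.
CAPTION_SIZES_PX = {"small": 24, "medium": 28, "large": 32}
CAPTION_POSITIONS = ("top", "middle", "bottom")


@dataclass(frozen=True)
class CaptionConfig:
    """Hypothetical container for one caption variant (not a Versely type)."""
    text: str
    size: str = "medium"       # described above as the 9:16 default
    position: str = "bottom"   # described above as the platform-safe default

    def __post_init__(self):
        # Reject anything outside the three sizes and three positions.
        if self.size not in CAPTION_SIZES_PX:
            raise ValueError(f"unknown caption size: {self.size!r}")
        if self.position not in CAPTION_POSITIONS:
            raise ValueError(f"unknown caption position: {self.position!r}")

    @property
    def font_px(self) -> int:
        """Font height in pixels for the chosen size."""
        return CAPTION_SIZES_PX[self.size]


def caption_variants(text: str) -> list[CaptionConfig]:
    """Return one config per size/position pair (3 x 3 = 9 variants)."""
    return [
        CaptionConfig(text=text, size=size, position=position)
        for size, position in product(CAPTION_SIZES_PX, CAPTION_POSITIONS)
    ]


if __name__ == "__main__":
    for variant in caption_variants("Stop scrolling."):
        print(variant.size, variant.font_px, variant.position)
```

    Keeping the grid in code makes it easy to test one variable at a time, for example holding size at medium and varying only position.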

    Overlay positioning

    Overlays are smaller videos composited on top of the base clip. You can place them in five positions:

    • top-left
    • top-right
    • bottom-left
    • bottom-right
    • center

    The center position is for reveal moments - a product shot dropped into the center of a talking head as the voiceover lands on the product name. The corner positions are for persistent context, like a gameplay clip riding along in the bottom-right while the creator talks.
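
    To make the five named positions concrete, here is a minimal sketch that maps each name to the pixel coordinate of the overlay's top-left corner. The function name and the `margin` parameter are assumptions for illustration; this guide does not document how Versely computes placement internally.

```python
OVERLAY_POSITIONS = ("top-left", "top-right", "bottom-left", "bottom-right", "center")


def overlay_origin(position, base_size, overlay_size, margin=0):
    """Return (x, y) of the overlay's top-left corner inside the base frame.

    `margin` is an illustrative parameter: the gap in pixels between the
    overlay and the frame edges for corner positions. It is ignored for
    "center".
    """
    if position not in OVERLAY_POSITIONS:
        raise ValueError(f"unknown overlay position: {position!r}")
    base_w, base_h = base_size
    over_w, over_h = overlay_size
    if over_w > base_w or over_h > base_h:
        raise ValueError("overlay must fit inside the base frame")

    if position == "center":
        return ((base_w - over_w) // 2, (base_h - over_h) // 2)

    # Corner names have the form "<vertical>-<horizontal>".
    vertical, horizontal = position.split("-")
    x = margin if horizontal == "left" else base_w - over_w - margin
    y = margin if vertical == "top" else base_h - over_h - margin
    return (x, y)


if __name__ == "__main__":
    # A 360x640 overlay on a 1080x1920 (9:16) base with a 40 px margin.
    for name in OVERLAY_POSITIONS:
        print(name, overlay_origin(name, (1080, 1920), (360, 640), margin=40))
```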

    VIDEO_OVERLAY in detail

    Cost: 10 credits.

    VIDEO_OVERLAY takes a base video and an overlay video, plus a position and a size, and returns a composited MP4. It is the right tool when your base and overlay are already edited individually. If the base still needs to be built by stitching multiple clips, use COMPOSE_OVERLAY instead and save a step.

    Use cases: talking head plus gameplay, product demo plus reaction webcam, AI avatar plus B-roll corner.

    ADD_CAPTIONS in detail

    Cost: 5 credits.

    ADD_CAPTIONS burns a single static caption block into the video. It is perfect for a fixed hook line that stays on screen for the whole clip, or for a branded tagline at the bottom of every variant. Pick your size, position, and text, and the tool returns the final video with the caption rendered directly into the frames.

    ADD_CAPTIONS is the cheapest tool in the suite because rendering a static caption is trivial. Use it liberally for hook variants.

    REMOVE_BLACK_BG in detail

    Cost: 10 credits.

    REMOVE_BLACK_BG strips a black background from a foreground clip and returns a version with transparency where the black used to be. This is the tool you use when an AI avatar or a stock animation ships with a solid black background and you need to composite it onto another video.

    Chain it with VIDEO_OVERLAY and you have a full compositing pipeline: remove the black background, then overlay the cleaned foreground on top of your base.
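
    Conceptually, the remove-then-overlay chain looks like the sketch below, which works on a single tiny frame represented as rows of RGB tuples. This is a generic black-keying illustration, not Versely's algorithm: the `threshold` cutoff, the hard 0/255 alpha, and both function names are assumptions.

```python
def key_out_black(frame, threshold=16):
    """Convert rows of (r, g, b) pixels to (r, g, b, a), clearing near-black pixels.

    A pixel counts as background when its brightest channel is at or below
    `threshold` (an assumed cutoff on the 0-255 scale).
    """
    keyed = []
    for row in frame:
        keyed.append([
            (r, g, b, 0 if max(r, g, b) <= threshold else 255)
            for r, g, b in row
        ])
    return keyed


def composite_over(foreground, background):
    """Overlay keyed RGBA rows onto same-sized RGB rows.

    Only handles the hard 0/255 alpha produced by key_out_black: opaque
    foreground pixels win, transparent ones let the background through.
    """
    return [
        [fg[:3] if fg[3] == 255 else bg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


if __name__ == "__main__":
    avatar = [[(0, 0, 0), (200, 50, 50)],
              [(10, 10, 10), (255, 255, 255)]]
    base = [[(1, 2, 3), (1, 2, 3)],
            [(1, 2, 3), (1, 2, 3)]]
    print(composite_over(key_out_black(avatar), base))
```

    A hard threshold like this also removes dark details inside the subject, such as black hair or clothing, which is one reason to inspect the keyed clip before compositing.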

    [Image: Editor working with layered clips]

    COMPOSE_OVERLAY in detail

    Cost: 15 credits.

    COMPOSE_OVERLAY is the power move. It stitches a sequence of base clips and applies the overlay in a single pass, instead of forcing you to stitch first with a separate tool and overlay second. It is the right choice for multi-scene ads where the overlay runs continuously across scene cuts, because it avoids re-encoding the base twice.

    The credit cost is 15 - slightly more than doing it in two steps - but the quality win is meaningful because you avoid a generation of compression loss from the intermediate stitch.

    TIMESTAMPED_CAPTIONS in detail

    Cost: 8 credits.

    Static captions are fine when the text is a fixed hook. When the text has to track the voiceover, you want TIMESTAMPED_CAPTIONS. This tool generates per-word or per-phrase captions aligned to the audio and burns them into the video. This is the standard for voiced UGC where every line needs on-screen reinforcement.

    Most short-form ads in 2026 run timestamped captions by default. If your voiceover is dense, use medium size at top position so the text does not collide with the creator's lip movement.
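
    For intuition about per-word versus per-phrase captions, the sketch below groups word-level timings into caption cues. The input shape of `(word, start_seconds, end_seconds)` tuples, the `Cue` class, and the `max_words` parameter are all assumptions; this guide does not describe how Versely aligns text to audio or what data it exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One on-screen caption with its start and end time in seconds."""
    start: float
    end: float
    text: str


def group_words(words, max_words=3):
    """Group (word, start_s, end_s) tuples into cues of up to `max_words` words.

    max_words=1 gives per-word captions; larger values give per-phrase captions.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append(Cue(
            start=chunk[0][1],          # start of the first word in the chunk
            end=chunk[-1][2],           # end of the last word in the chunk
            text=" ".join(word for word, _, _ in chunk),
        ))
    return cues


def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (the SubRip timestamp layout)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


if __name__ == "__main__":
    words = [("This", 0.0, 0.2), ("serum", 0.2, 0.6), ("changed", 0.6, 1.0),
             ("my", 1.0, 1.1), ("skin", 1.1, 1.5)]
    for cue in group_words(words, max_words=3):
        print(format_timestamp(cue.start), "-->", format_timestamp(cue.end), cue.text)
```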

    A real creator workflow, end to end

    Here is a full workflow that produces a voiced UGC ad from a blank page to a publish-ready file.

    Step 1. Script the hook and body

    Write a hook line in under nine words and a body script in under forty. Keep it punchy; UGC does not reward subtlety.

    Step 2. Generate an AI avatar

    Use a Kling Avatar V2 or HeyGen Avatar V4 generation to produce the talking-head base clip from your script. This is your primary footage.

    Step 3. Stitch B-roll or gameplay

    Gather the B-roll or gameplay clip that will ride alongside the avatar as the overlay. For gameplay, a corner overlay works. For product B-roll, a center reveal hits harder.

    Step 4. Compose the overlay in one pass

    Use COMPOSE_OVERLAY to stitch the avatar segments and drop the B-roll overlay in at the right position. Cost: 15 credits. Output: a single composited MP4.

    Step 5. Burn timestamped captions

    Run TIMESTAMPED_CAPTIONS with medium size at top position. Cost: 8 credits. Your captions now track the voiceover beat for beat.

    Step 6. Ship variants

    Run the same base through different caption positions and sizes. Each static variant via ADD_CAPTIONS costs 5 credits. In under an hour you have ten hook variants ready for paid testing.

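    Total UGC credits for a full production pass: 23 for the core ad (15 for COMPOSE_OVERLAY plus 8 for TIMESTAMPED_CAPTIONS), or 28 with the first hook variant, plus 5 for each additional variant. The avatar generation in Step 2 is not included in that figure. If you want to dig deeper into the economics of running UGC programs at scale, our piece on how AI UGC creators make money in 2026 covers it directly.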
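
    The tally for Steps 4 to 6 can be checked with a few lines of arithmetic. The sketch below sums the credit costs listed in this guide; the function names are illustrative, and the avatar generation in Step 2 is left out because its cost is not stated here.

```python
# Credit costs as listed in this guide's operations table.
CREDIT_COSTS = {
    "VIDEO_OVERLAY": 10,
    "ADD_CAPTIONS": 5,
    "REMOVE_BLACK_BG": 10,
    "COMPOSE_OVERLAY": 15,
    "TIMESTAMPED_CAPTIONS": 8,
}


def total_credits(operations):
    """Sum the credit cost of a sequence of operation names."""
    return sum(CREDIT_COSTS[op] for op in operations)


def workflow_credits(hook_variants):
    """Credits for Steps 4-6: one compose, one timestamped pass, N static hooks."""
    core = ["COMPOSE_OVERLAY", "TIMESTAMPED_CAPTIONS"]
    return total_credits(core + ["ADD_CAPTIONS"] * hook_variants)


if __name__ == "__main__":
    print(workflow_credits(0))   # 15 + 8 = 23
    print(workflow_credits(1))   # 23 + 5 = 28
    print(workflow_credits(10))  # 23 + 10 * 5 = 73
```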

    Where each tool fits in the funnel

    Cold top-of-funnel ads rely heavily on hook variation, which means ADD_CAPTIONS gets the most reps. Consideration-stage ads lean on COMPOSE_OVERLAY for richer B-roll context. Retargeting ads tend to be the simplest - single avatar plus TIMESTAMPED_CAPTIONS - because the viewer already knows the brand and the work is done by the offer.

    You can jump directly into the tool at the UGC video generator page.

    Frequently asked questions

    Can I use a real creator clip as the base instead of an AI avatar? Yes. All five operations work on any MP4. AI avatars are optional; real footage is fully supported.

    What aspect ratios are supported? The UGC pipeline is optimized for 9:16 but 1:1 and 16:9 are both supported. Caption sizing scales sensibly across ratios.

    Can I preview captions before committing credits? Yes. You can preview the caption placement at generation time and adjust size and position before the final render.

    Is there a batch mode for running many variants at once? Yes. You can queue a base video with multiple caption configurations and get all variants back in parallel.

    How do I handle white backgrounds instead of black? REMOVE_BLACK_BG targets black specifically. For white or arbitrary backgrounds, run the clip through the image editor's mask tooling first.

    Closing takeaway

    UGC at scale is a variant production problem, not an individual video problem. Versely's five UGC operations are designed around that reality. Ship the base once with COMPOSE_OVERLAY, track the voiceover with TIMESTAMPED_CAPTIONS, and run hook variants through ADD_CAPTIONS until the data tells you which one wins. The creators who treat these tools as a pipeline - not as one-off effects - are the ones who put twenty winning hooks into rotation every week.

    Tags: UGC video generator, UGC ads, AI captions, video overlay, UGC workflow 2026, short form ads, direct response video