Versely UGC Video Generator: A Complete Walkthrough of the 5 Tools That Ship Ads Faster

UGC ads live or die on two things: how fast you can produce variants, and how close they look to organic. Versely's UGC Video Generator attacks both. It is not a single monolithic "make me an ad" button. It is five focused, composable operations that each solve one real production problem, and together they replace what used to be a CapCut project, a Premiere session, and a round of manual subtitling.

This walkthrough covers all five UGC ops in detail, the exact credit costs, the caption and overlay configuration options, and a real end-to-end creator workflow that ships an ad from script to MP4 in one pass.

The five UGC operations

Each operation is a standalone tool with its own credit cost. You pick the one that matches the transformation you need.

Operation	Credits	What it does
VIDEO_OVERLAY	10	Places a smaller video on top of a base video with position control
ADD_CAPTIONS	5	Burns a single static caption block into the video
REMOVE_BLACK_BG	10	Strips a black background from a foreground clip for compositing
COMPOSE_OVERLAY	15	Stitches a sequence and applies an overlay in one pass
TIMESTAMPED_CAPTIONS	8	Generates captions aligned to the audio timeline

If you are familiar with the broader UGC playbook, pair this guide with our complete guide to AI UGC ads for ecommerce.

Caption sizing and positioning

Captions are the single most impactful ad-level lever. Versely gives you three sizes and three positions to work with.

The three caption sizes are:

Small at 24 px - for longer lines, dense information, or square formats
Medium at 28 px - the default for most 9:16 short-form content
Large at 32 px - for hook lines, single-word beats, high-contrast thumbnails

The three caption positions are top, middle, and bottom. Top tends to perform best when the creator's face is in the lower third of the frame. Middle works for products where you want eyes drawn to the on-screen text first. Bottom is the platform-safe default.

Pick size and position per variant rather than per ad. On direct-response creative, A/B testing caption position alone can move CTR noticeably, so treat position as a test variable rather than a fixed style choice.
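
The size and position options above could be modeled as a small validated config object. This is a minimal sketch in Python: the `CaptionConfig` class and its field names are hypothetical and not Versely's API. Only the pixel sizes, the position names, and the two defaults come from this guide.

```python
from dataclasses import dataclass

# Pixel sizes and positions exactly as listed in this guide.
CAPTION_SIZES_PX = {"small": 24, "medium": 28, "large": 32}
CAPTION_POSITIONS = ("top", "middle", "bottom")


@dataclass(frozen=True)
class CaptionConfig:
    """Hypothetical container for one caption variant's settings."""
    text: str
    size: str = "medium"      # the guide's default for 9:16 content
    position: str = "bottom"  # the guide's platform-safe default

    def __post_init__(self):
        # Reject anything outside the three sizes and three positions.
        if self.size not in CAPTION_SIZES_PX:
            raise ValueError(f"unknown caption size: {self.size!r}")
        if self.position not in CAPTION_POSITIONS:
            raise ValueError(f"unknown caption position: {self.position!r}")

    @property
    def font_px(self) -> int:
        """Font size in pixels for the chosen size name."""
        return CAPTION_SIZES_PX[self.size]


hook = CaptionConfig("Stop scrolling", size="large", position="top")
```

Validating at construction time means a typo such as `size="huge"` fails before any credits are spent on a render.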

Overlay positioning

Overlays are smaller videos composited on top of the base clip. You can place them in five positions:

top-left
top-right
bottom-left
bottom-right
center

The center position is for reveal moments - a product shot dropped into the center of a talking head as the voiceover lands on the product name. The corner positions are for persistent context, like a gameplay clip riding along in the bottom-right while the creator talks.
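
To make the five positions concrete, here is a rough sketch of how a named position could map to pixel coordinates inside the base frame. The `overlay_origin` function and the 24 px edge margin are assumptions for illustration; this guide does not document how Versely computes placement.

```python
def overlay_origin(base_w, base_h, ov_w, ov_h, position, margin=24):
    """Return the (x, y) top-left pixel of an overlay inside a base frame.

    `margin` is an assumed gap between a corner overlay and the frame edge.
    """
    if ov_w > base_w or ov_h > base_h:
        raise ValueError("overlay is larger than the base frame")

    left, right = margin, base_w - ov_w - margin
    top, bottom = margin, base_h - ov_h - margin
    origins = {
        "top-left": (left, top),
        "top-right": (right, top),
        "bottom-left": (left, bottom),
        "bottom-right": (right, bottom),
        # Center ignores the margin and splits the leftover space evenly.
        "center": ((base_w - ov_w) // 2, (base_h - ov_h) // 2),
    }
    if position not in origins:
        raise ValueError(f"unknown overlay position: {position!r}")
    return origins[position]


# A 360x640 overlay riding in the bottom-right of a 1080x1920 base.
corner = overlay_origin(1080, 1920, 360, 640, "bottom-right")
```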

VIDEO_OVERLAY in detail

Cost: 10 credits.

VIDEO_OVERLAY takes a base video and an overlay video, plus a position and a size, and returns a composited MP4. It is the right tool when your base and overlay are already edited individually. If the base still needs to be built by stitching multiple clips, use COMPOSE_OVERLAY instead and save a step.

Use cases: talking head plus gameplay, product demo plus reaction webcam, AI avatar plus B-roll corner.

ADD_CAPTIONS in detail

Cost: 5 credits.

ADD_CAPTIONS burns a single static caption block into the video. It is perfect for a fixed hook line that stays on screen for the whole clip, or for a branded tagline at the bottom of every variant. Pick your size, position, and text, and the tool returns the final video with the caption rendered directly into the frames.

ADD_CAPTIONS is the cheapest tool in the suite at 5 credits because the rendering is trivial. Use it liberally for hook variants.

REMOVE_BLACK_BG in detail

Cost: 10 credits.

REMOVE_BLACK_BG strips a black background from a foreground clip and returns a version with transparency where the black used to be. This is the tool you use when an AI avatar or a stock animation ships with a solid black background and you need to composite it onto another video.

Chain it with VIDEO_OVERLAY and you have a full compositing pipeline: remove the black background, then overlay the cleaned foreground on top of your base.
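
The idea behind black-background removal can be illustrated with a simple key: any pixel that is dark enough becomes transparent. The sketch below is only a conceptual illustration on raw RGB tuples. The `key_out_black` function and its threshold of 16 are assumptions, and Versely's actual method is not described in this guide.

```python
def key_out_black(pixels, threshold=16):
    """Convert RGB pixels to RGBA, making near-black pixels fully transparent.

    A pixel counts as "black" when its brightest channel is at or below
    `threshold` (0-255 scale). Everything else stays fully opaque.
    """
    keyed = []
    for r, g, b in pixels:
        alpha = 0 if max(r, g, b) <= threshold else 255
        keyed.append((r, g, b, alpha))
    return keyed


row = [(0, 0, 0), (10, 12, 8), (200, 180, 170)]
keyed_row = key_out_black(row)
```

A hard threshold like this also removes very dark parts of the subject, such as black clothing or deep shadows, so it is worth checking the output of any black-key step before compositing.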

COMPOSE_OVERLAY in detail

Cost: 15 credits.

COMPOSE_OVERLAY is the power move. It stitches a sequence of base clips and applies the overlay in a single pass, instead of forcing you to stitch first with a separate tool and overlay second. It is the right choice for multi-scene ads where the overlay runs continuously across scene cuts, because it avoids re-encoding the base twice.

The credit cost is 15 - slightly more than doing it in two steps - but the quality win is meaningful because you avoid a generation of compression loss from the intermediate stitch.

TIMESTAMPED_CAPTIONS in detail

Cost: 8 credits.

Static captions are fine when the text is a fixed hook. When the text has to track the voiceover, you want TIMESTAMPED_CAPTIONS. This tool generates per-word or per-phrase captions aligned to the audio and burns them into the video. This is the standard for voiced UGC where every line needs on-screen reinforcement.

Most short-form ads in 2026 run timestamped captions by default. If your voiceover is dense, use medium size at top position so the text does not collide with the creator's lip movement.
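
Per-word versus per-phrase alignment is easier to see in code. The sketch below groups word timings into short phrase cues and formats them as SRT text. It assumes you already have `(word, start_seconds, end_seconds)` tuples from some speech-to-text step; the function names and the three-word grouping are illustrative choices, not Versely's implementation.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=3):
    """Group (text, start, end) word timings into numbered SRT cues.

    Each cue spans from the first word's start to the last word's end.
    Setting max_words=1 gives per-word captions.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        start, end = chunk[0][1], chunk[-1][2]
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(cues) + "\n"


timings = [
    ("This", 0.0, 0.25), ("serum", 0.25, 0.7), ("changed", 0.7, 1.1),
    ("my", 1.1, 1.25), ("skin", 1.25, 1.8),
]
srt_text = words_to_srt(timings)
```

Smaller groups read as punchier single-beat captions, while larger groups suit dense voiceovers where per-word flashing would be hard to follow.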

A real creator workflow, end to end

Here is a full workflow that produces a voiced UGC ad from a blank page to a publish-ready file.

Step 1. Script the hook and body

Write a hook line in under nine words and a body script in under forty. Keep it punchy; UGC does not reward subtlety.
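
If you draft many scripts, the two length limits above are easy to check automatically. A minimal sketch, with a hypothetical `script_problems` helper that counts whitespace-separated words:

```python
def script_problems(hook, body, hook_limit=9, body_limit=40):
    """Return a list of length problems; an empty list means the script fits.

    Limits are exclusive: the hook must have fewer than `hook_limit` words
    and the body fewer than `body_limit` words.
    """
    problems = []
    hook_words = len(hook.split())
    body_words = len(body.split())
    if hook_words >= hook_limit:
        problems.append(f"hook has {hook_words} words, needs fewer than {hook_limit}")
    if body_words >= body_limit:
        problems.append(f"body has {body_words} words, needs fewer than {body_limit}")
    return problems


issues = script_problems(
    "Stop paying editors for every ad variant",
    "Five tools, one pass, and a finished MP4 before lunch.",
)
```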

Step 2. Generate an AI avatar

Use a Kling Avatar V2 or HeyGen Avatar V4 generation to produce the talking-head base clip from your script. This is your primary footage.

Step 3. Stitch B-roll or gameplay

Gather the secondary clip that will ride alongside the avatar as the overlay. For gameplay, a corner overlay works. For product B-roll, a center reveal hits harder.

Step 4. Compose the overlay in one pass

Use COMPOSE_OVERLAY to stitch the avatar segments and drop the B-roll overlay in at the right position. Cost: 15 credits. Output: a single composited MP4.

Step 5. Burn timestamped captions

Run TIMESTAMPED_CAPTIONS with medium size at top position. Cost: 8 credits. Your captions now track the voiceover beat for beat.

Step 6. Ship variants

Run the same base through different caption positions and sizes. Each static variant via ADD_CAPTIONS costs 5 credits. In under an hour you have ten hook variants ready for paid testing.
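
Enumerating the variant matrix up front makes it clear how many renders, and how many credits, a test plan implies. The sketch below builds one spec per hook, size, and position combination. The dictionary shape is hypothetical and is not a documented Versely request format; the 5-credit cost per ADD_CAPTIONS render comes from this guide.

```python
from itertools import product

SIZES = ("small", "medium", "large")
POSITIONS = ("top", "middle", "bottom")
ADD_CAPTIONS_CREDITS = 5  # per static caption render, from the cost table


def caption_variants(hooks, sizes=SIZES, positions=POSITIONS):
    """Return one variant spec per hook x size x position combination."""
    return [
        {"text": hook, "size": size, "position": position}
        for hook, size, position in product(hooks, sizes, positions)
    ]


hooks = ["Stop scrolling", "I was wrong about this"]
variants = caption_variants(hooks)
total_credits = len(variants) * ADD_CAPTIONS_CREDITS
```

Two hooks across the full three-by-three grid already means 18 renders, so in practice you would likely narrow `sizes` or `positions` to the handful you actually want to test.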

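Total credits for a full production pass: 23 for the core ad (15 for COMPOSE_OVERLAY plus 8 for TIMESTAMPED_CAPTIONS), plus 5 per hook variant. That figure covers only the UGC operations; this guide does not price the avatar generation in Step 2. If you want to dig deeper into the economics of running UGC programs at scale, our piece on how AI UGC creators make money in 2026 covers it directly.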
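
The arithmetic for the UGC operations in this workflow can be checked with a small lookup. The credit values come from the table at the top of this guide; the `pipeline_cost` helper is illustrative, and the avatar generation in Step 2 is left out because its cost is not listed here.

```python
# Credit costs per operation, from the table in this guide.
CREDITS = {
    "VIDEO_OVERLAY": 10,
    "ADD_CAPTIONS": 5,
    "REMOVE_BLACK_BG": 10,
    "COMPOSE_OVERLAY": 15,
    "TIMESTAMPED_CAPTIONS": 8,
}


def pipeline_cost(operations):
    """Sum the credit cost of a sequence of operation names."""
    return sum(CREDITS[op] for op in operations)


# Steps 4 and 5 of the workflow: compose, then timestamped captions.
core_ad = pipeline_cost(["COMPOSE_OVERLAY", "TIMESTAMPED_CAPTIONS"])

# Step 6: ten static hook variants on top of the core ad.
with_ten_hooks = core_ad + 10 * CREDITS["ADD_CAPTIONS"]

# The two-tool compositing chain: remove black, then overlay.
compositing_chain = pipeline_cost(["REMOVE_BLACK_BG", "VIDEO_OVERLAY"])
```

Under these numbers the core ad comes to 23 credits, ten hook variants bring the pass to 73, and the REMOVE_BLACK_BG plus VIDEO_OVERLAY chain costs 20.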

Where each tool fits in the funnel

Cold top-of-funnel ads rely heavily on hook variation, which means ADD_CAPTIONS gets the most reps. Consideration-stage ads lean on COMPOSE_OVERLAY for richer B-roll context. Retargeting ads tend to be the simplest - single avatar plus TIMESTAMPED_CAPTIONS - because the viewer already knows the brand and the work is done by the offer.

You can jump directly into the tool at the UGC video generator page.

Frequently asked questions

Can I use a real creator clip as the base instead of an AI avatar? Yes. All five operations work on any MP4. AI avatars are optional; real footage is fully supported.

What aspect ratios are supported? The UGC pipeline is optimized for 9:16 but 1:1 and 16:9 are both supported. Caption sizing scales sensibly across ratios.

Can I preview captions before committing credits? Yes. You can preview the caption placement at generation time and adjust size and position before the final render.

Is there a batch mode for running many variants at once? Yes. You can queue a base video with multiple caption configurations and get all variants back in parallel.

How do I handle white backgrounds instead of black? REMOVE_BLACK_BG targets black specifically. For white or arbitrary backgrounds, run the clip through the image editor's mask tooling first.

Closing takeaway

UGC at scale is a variant production problem, not an individual video problem. Versely's five UGC operations are designed around that reality. Ship the base once with COMPOSE_OVERLAY, track the voiceover with TIMESTAMPED_CAPTIONS, and run hook variants through ADD_CAPTIONS until the data tells you which one wins. The creators who treat these tools as a pipeline - not as one-off effects - are the ones who put twenty winning hooks into rotation every week.