Versely UGC Video Generator: A Complete Walkthrough of the 5 Tools That Ship Ads Faster
Walk through all five Versely UGC tools - overlays, captions, background removal, compose overlay, and timestamped captions - with pricing and a real pipeline.
UGC ads live or die on two things: how fast you can produce variants, and how close they look to organic. Versely's UGC Video Generator attacks both. It is not a single monolithic "make me an ad" button. It is five focused, composable operations that each solve one real production problem, and together they replace what used to be a CapCut project, a Premiere session, and a round of manual subtitling.
This walkthrough covers all five UGC ops in detail, the exact credit costs, the caption and overlay configuration options, and a real end-to-end creator workflow that ships an ad from script to MP4 in one pass.
The five UGC operations
Each operation is a standalone tool with its own credit cost. You pick the one that matches the transformation you need.
| Operation | Credits | What it does |
|---|---|---|
| VIDEO_OVERLAY | 10 | Places a smaller video on top of a base video with position control |
| ADD_CAPTIONS | 5 | Burns a single static caption block into the video |
| REMOVE_BLACK_BG | 10 | Strips a black background from a foreground clip for compositing |
| COMPOSE_OVERLAY | 15 | Stitches a sequence and applies an overlay in one pass |
| TIMESTAMPED_CAPTIONS | 8 | Generates captions aligned to the audio timeline |
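Because each operation is billed separately, the cost of a chain is the sum of its steps. The sketch below encodes the table as a plain lookup and adds them up. The names `CREDIT_COSTS` and `total_credits` are hypothetical helpers for illustration, not part of any Versely API.

```python
# Credit costs copied from the table above. The dict and function names
# are illustrative only; they are not a Versely SDK.
CREDIT_COSTS = {
    "VIDEO_OVERLAY": 10,
    "ADD_CAPTIONS": 5,
    "REMOVE_BLACK_BG": 10,
    "COMPOSE_OVERLAY": 15,
    "TIMESTAMPED_CAPTIONS": 8,
}


def total_credits(operations):
    """Return the summed credit cost of a sequence of operation names."""
    unknown = [op for op in operations if op not in CREDIT_COSTS]
    if unknown:
        raise ValueError(f"unknown operation(s): {unknown}")
    return sum(CREDIT_COSTS[op] for op in operations)


if __name__ == "__main__":
    # Remove a black background, then overlay the result on a base clip.
    print(total_credits(["REMOVE_BLACK_BG", "VIDEO_OVERLAY"]))
```

Passing the same operation more than once counts it each time, which matches how repeated variant runs would be charged.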
If you are familiar with the broader UGC playbook, pair this guide with our complete guide to AI UGC ads for ecommerce.
Caption sizing and positioning
Captions are the single most impactful ad-level lever. Versely gives you three sizes and three positions to work with.
The three caption sizes are:
- Small at 24 px - for longer lines, dense information, or square formats
- Medium at 28 px - the default for most 9:16 short-form content
- Large at 32 px - for hook lines, single-word beats, high-contrast thumbnails
The three caption positions are top, middle, and bottom. Top tends to perform best when the creator's face is in the lower third of the frame. Middle works for products where you want eyes drawn to the on-screen text first. Bottom is the platform-safe default.
Pick size and position per variant rather than per ad. On direct response creative, caption position alone can move CTR noticeably, so it is worth A/B testing in isolation.
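Three sizes and three positions give nine caption configurations per line of text. A minimal sketch of that grid follows, assuming a hypothetical `CaptionConfig` record. The field names are invented for illustration; only the pixel sizes and position names come from this guide.

```python
from dataclasses import dataclass
from itertools import product

# Sizes and positions as described above.
CAPTION_SIZES_PX = {"small": 24, "medium": 28, "large": 32}
CAPTION_POSITIONS = ("top", "middle", "bottom")


@dataclass(frozen=True)
class CaptionConfig:
    """Hypothetical caption settings; not an official Versely schema."""
    text: str
    size: str = "medium"      # the default for most 9:16 content
    position: str = "bottom"  # the platform-safe default

    def __post_init__(self):
        if self.size not in CAPTION_SIZES_PX:
            raise ValueError(f"unknown size: {self.size}")
        if self.position not in CAPTION_POSITIONS:
            raise ValueError(f"unknown position: {self.position}")

    @property
    def font_px(self):
        return CAPTION_SIZES_PX[self.size]


def caption_grid(text):
    """Return every size/position combination for one caption text."""
    return [
        CaptionConfig(text, size, position)
        for size, position in product(CAPTION_SIZES_PX, CAPTION_POSITIONS)
    ]


if __name__ == "__main__":
    for cfg in caption_grid("Stop scrolling"):
        print(cfg.size, cfg.font_px, cfg.position)
```

In practice you would test a subset of the grid rather than all nine, since each rendered variant costs credits.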
Overlay positioning
Overlays are smaller videos composited on top of the base clip. You can place them in five positions:
- top-left
- top-right
- bottom-left
- bottom-right
- center
The center position is for reveal moments - a product shot dropped into the center of a talking head as the voiceover lands on the product name. The corner positions are for persistent context, like a gameplay clip riding along in the bottom-right while the creator talks.
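To make the five positions concrete, the sketch below computes where the overlay's top-left corner would sit for each one. It is plain geometry, not Versely's implementation. The `margin` parameter is an assumption, since the guide does not say whether corner overlays are inset from the frame edge.

```python
def overlay_origin(base_w, base_h, overlay_w, overlay_h, position, margin=0):
    """Return the (x, y) top-left pixel of the overlay inside the base frame.

    `margin` is a hypothetical inset applied to the four corner positions.
    """
    if overlay_w > base_w or overlay_h > base_h:
        raise ValueError("overlay must fit inside the base frame")
    right = base_w - overlay_w - margin
    bottom = base_h - overlay_h - margin
    origins = {
        "top-left": (margin, margin),
        "top-right": (right, margin),
        "bottom-left": (margin, bottom),
        "bottom-right": (right, bottom),
        "center": ((base_w - overlay_w) // 2, (base_h - overlay_h) // 2),
    }
    if position not in origins:
        raise ValueError(f"unknown position: {position}")
    return origins[position]


if __name__ == "__main__":
    # A 360x640 overlay on a 1080x1920 (9:16) base, inset 24 px from the edges.
    for name in ("top-left", "top-right", "bottom-left", "bottom-right", "center"):
        print(name, overlay_origin(1080, 1920, 360, 640, name, margin=24))
```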
VIDEO_OVERLAY in detail
Cost: 10 credits.
VIDEO_OVERLAY takes a base video and an overlay video, plus a position and a size, and returns a composited MP4. It is the right tool when your base and overlay are already edited individually. If the base still needs to be built by stitching multiple clips, use COMPOSE_OVERLAY instead and save a step.
Use cases: talking head plus gameplay, product demo plus reaction webcam, AI avatar plus B-roll corner.
ADD_CAPTIONS in detail
Cost: 5 credits.
ADD_CAPTIONS burns a single static caption block into the video. It is perfect for a fixed hook line that stays on screen for the whole clip, or for a branded tagline at the bottom of every variant. Pick your size, position, and text, and the tool returns the final video with the caption rendered directly into the frames.
ADD_CAPTIONS is the cheapest tool in the suite because static rendering is trivial. Use it liberally for hook variants.
REMOVE_BLACK_BG in detail
Cost: 10 credits.
REMOVE_BLACK_BG strips a black background from a foreground clip and returns a version with transparency where the black used to be. This is the tool you use when an AI avatar or a stock animation ships with a solid black background and you need to composite it onto another video.
Chain it with VIDEO_OVERLAY and you have a full compositing pipeline: remove the black background, then overlay the cleaned foreground on top of your base.
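The idea behind black-background removal can be illustrated with a naive hard-threshold key: any pixel dark enough is made fully transparent, and everything else stays opaque. This is a conceptual sketch only. Versely does not document how REMOVE_BLACK_BG works internally, and the `threshold` value here is an arbitrary assumption.

```python
def black_to_alpha(pixel, threshold=16):
    """Convert an (r, g, b) pixel to (r, g, b, a).

    Pixels whose brightest channel is at or below `threshold` become fully
    transparent. The threshold is an illustrative guess, not a Versely setting.
    """
    r, g, b = pixel
    alpha = 0 if max(r, g, b) <= threshold else 255
    return (r, g, b, alpha)


def key_frame(frame, threshold=16):
    """Apply the key to a frame given as rows of (r, g, b) tuples."""
    return [[black_to_alpha(p, threshold) for p in row] for row in frame]


if __name__ == "__main__":
    frame = [[(0, 0, 0), (255, 255, 255)], [(8, 8, 8), (200, 10, 10)]]
    for row in key_frame(frame):
        print(row)
```

A hard threshold like this also punches holes in genuinely dark parts of the subject, such as black hair or clothing, so inspect the keyed clip before compositing it.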
COMPOSE_OVERLAY in detail
Cost: 15 credits.
COMPOSE_OVERLAY is the power move. It stitches a sequence of base clips and applies the overlay in a single pass, instead of forcing you to stitch first with a separate tool and overlay second. It is the right choice for multi-scene ads where the overlay runs continuously across scene cuts, because it avoids re-encoding the base twice.
The credit cost is 15, five more than VIDEO_OVERLAY alone, but you skip the separate stitch step and avoid the generation of compression loss that an intermediate stitched file would introduce.
TIMESTAMPED_CAPTIONS in detail
Cost: 8 credits.
Static captions are fine when the text is a fixed hook. When the text has to track the voiceover, you want TIMESTAMPED_CAPTIONS. This tool generates per-word or per-phrase captions aligned to the audio and burns them into the video. This is the standard for voiced UGC where every line needs on-screen reinforcement.
Most short-form ads in 2026 run timestamped captions by default. If your voiceover is dense, use medium size at top position so the text does not collide with the creator's lip movement.
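The difference between per-word and per-phrase captions comes down to how word-level timings are grouped into on-screen cues. The sketch below shows one plausible grouping rule, assuming the input is a list of `(text, start, end)` tuples in seconds. The function name, the `max_words` limit, and the `max_gap` pause threshold are all hypothetical, and Versely's actual alignment logic may differ.

```python
def _flush(chunk):
    """Merge a run of (text, start, end) words into one cue."""
    return (" ".join(w[0] for w in chunk), chunk[0][1], chunk[-1][2])


def group_words(words, max_words=3, max_gap=0.4):
    """Group timed words into caption cues.

    A new cue starts when the current one already holds `max_words` words
    or when the pause before the next word exceeds `max_gap` seconds.
    Use max_words=1 for per-word captions.
    """
    cues = []
    current = []
    for word in words:
        if current:
            pause = word[1] - current[-1][2]
            if len(current) >= max_words or pause > max_gap:
                cues.append(_flush(current))
                current = []
        current.append(word)
    if current:
        cues.append(_flush(current))
    return cues


if __name__ == "__main__":
    words = [
        ("Stop", 0.0, 0.3), ("scrolling", 0.35, 0.8), ("now", 0.85, 1.1),
        ("Try", 1.9, 2.1), ("this", 2.15, 2.4),
    ]
    for cue in group_words(words):
        print(cue)
```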
A real creator workflow, end to end
Here is a full workflow that produces a voiced UGC ad from a blank page to a publish-ready file.
Step 1. Script the hook and body
Write a hook line in under nine words and a body script in under forty. Keep it punchy; UGC does not reward subtlety.
Step 2. Generate an AI avatar
Use a Kling Avatar V2 or HeyGen Avatar V4 generation to produce the talking-head base clip from your script. This is your primary footage.
Step 3. Gather B-roll or gameplay
Gather the secondary clip that will ride on top of the avatar as an overlay. For gameplay, a corner overlay works. For product B-roll, a center reveal hits harder.
Step 4. Compose the overlay in one pass
Use COMPOSE_OVERLAY to stitch the avatar segments and drop the B-roll overlay in at the right position. Cost: 15 credits. Output: a single composited MP4.
Step 5. Burn timestamped captions
Run TIMESTAMPED_CAPTIONS with medium size at top position. Cost: 8 credits. Your captions now track the voiceover beat for beat.
Step 6. Ship variants
Run the same base through different caption positions and sizes. Each static variant via ADD_CAPTIONS costs 5 credits. In under an hour you have ten hook variants ready for paid testing.
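Total credits for the UGC operations in a full production pass: 23 for the core ad (15 for COMPOSE_OVERLAY plus 8 for TIMESTAMPED_CAPTIONS), plus 5 per hook variant. The avatar generation in Step 2 is not included in that figure. If you want to dig deeper into the economics of running UGC programs at scale, our piece on how AI UGC creators make money in 2026 covers it directly.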
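For planning, the same arithmetic can be turned around to ask how many hook variants a credit budget covers. The sketch below counts only the per-step costs quoted in Steps 4 to 6 (15, 8, and 5 credits). Avatar generation and any other charges are deliberately left out, and the function names are hypothetical.

```python
COMPOSE_OVERLAY_COST = 15       # Step 4
TIMESTAMPED_CAPTIONS_COST = 8   # Step 5
ADD_CAPTIONS_COST = 5           # Step 6, per hook variant


def production_cost(hook_variants):
    """Credits for one composed, captioned base plus N static hook variants."""
    if hook_variants < 0:
        raise ValueError("hook_variants must be non-negative")
    core = COMPOSE_OVERLAY_COST + TIMESTAMPED_CAPTIONS_COST
    return core + ADD_CAPTIONS_COST * hook_variants


def max_hook_variants(budget):
    """Largest number of hook variants affordable after paying for the base.

    Returns 0 when the budget does not cover more than the base steps.
    """
    core = COMPOSE_OVERLAY_COST + TIMESTAMPED_CAPTIONS_COST
    return max(0, (budget - core) // ADD_CAPTIONS_COST)


if __name__ == "__main__":
    print(production_cost(10))
    print(max_hook_variants(100))
```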
Where each tool fits in the funnel
Cold top-of-funnel ads rely heavily on hook variation, which means ADD_CAPTIONS gets the most reps. Consideration-stage ads lean on COMPOSE_OVERLAY for richer B-roll context. Retargeting ads tend to be the simplest - single avatar plus TIMESTAMPED_CAPTIONS - because the viewer already knows the brand and the work is done by the offer.
You can jump directly into the tool at the UGC video generator page.
Frequently asked questions
Can I use a real creator clip as the base instead of an AI avatar? Yes. All five operations work on any MP4. AI avatars are optional; real footage is fully supported.
What aspect ratios are supported? The UGC pipeline is optimized for 9:16 but 1:1 and 16:9 are both supported. Caption sizing scales sensibly across ratios.
Can I preview captions before committing credits? Yes. You can preview the caption placement at generation time and adjust size and position before the final render.
Is there a batch mode for running many variants at once? Yes. You can queue a base video with multiple caption configurations and get all variants back in parallel.
How do I handle white backgrounds instead of black? REMOVE_BLACK_BG targets black specifically. For white or arbitrary backgrounds, run the clip through the image editor's mask tooling first.
Closing takeaway
UGC at scale is a variant production problem, not an individual video problem. Versely's five UGC operations are designed around that reality. Ship the base once with COMPOSE_OVERLAY, track the voiceover with TIMESTAMPED_CAPTIONS, and run hook variants through ADD_CAPTIONS until the data tells you which one wins. The creators who treat these tools as a pipeline - not as one-off effects - are the ones who put twenty winning hooks into rotation every week.