AI Image-to-Image Editing: The Complete Workflow for 2026
Image-to-image editing in 2026 — Nano Banana Edit, Flux 2 Edit, inpainting, mask-based edits, style transfer, and the workflows pros use for ecommerce and portrait work.
A product photographer I work with had 280 hero shots to deliver in three weeks. Different SKUs, same brand visual system: same model holding the product, same studio set, same warm-tungsten grade. In 2024 that was a four-day shoot plus two weeks of retouching. In 2026 it is one shoot day, one model release, and an image-to-image pipeline running Nano Banana Edit and Flux 2 Edit through Versely. The product changes, the model stays, the lighting stays, the brand world stays. 280 finals shipped in nine days.
This is what image-to-image editing means in 2026, end to end: what it actually is (versus text-to-image), the model landscape, the prompt and mask techniques that work, and the production workflows pros run for ecommerce, portrait, and poster work.
What image-to-image actually means (versus text-to-image)
Text-to-image generates a new picture from words. Image-to-image takes an existing image as conditioning and modifies it. Both are diffusion. The difference is what enters the denoising process: text-to-image starts from pure noise; image-to-image starts from your image (encoded into latent space) plus optional noise plus a text instruction, and the model denoises toward the modified version.
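The amount of noise added to the source latent controls how far the model can drift from it. Many img2img implementations expose this as a strength parameter; a minimal sketch of the usual mapping, with illustrative names rather than any specific model's API:

```python
# Sketch of how an image-to-image "strength" parameter typically maps to
# denoising steps (SDEdit-style schedulers). Exact parameterization varies
# by model and platform; the function name here is illustrative.

def img2img_schedule(total_steps: int, strength: float) -> tuple[int, int]:
    """Return (start_step, steps_run) for an img2img pass.

    strength=1.0 ignores the source (full denoise from pure noise);
    strength=0.0 returns the source essentially untouched.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_run = int(round(total_steps * strength))
    start_step = total_steps - steps_run
    return start_step, steps_run

# At strength 0.6 with a 50-step scheduler, the source latent is noised to
# step 20 and denoised for the remaining 30 steps: enough change to follow
# the instruction, enough source signal to preserve identity.
print(img2img_schedule(50, 0.6))  # (20, 30)
```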
The practical implications:
- Identity is preserved by default. The starting image is the spatial anchor. The face stays the face, the composition stays the composition, unless your prompt explicitly tells the model to change those.
- Edits are local where you specify them and global where you don't. "Change the jacket to denim" with no mask leaves everything else mostly intact — but mostly is doing work in that sentence, and small unintended drift happens.
- Mask-based edits give surgical control. When you provide a mask (a region marking what to change), the model only modifies inside it. This is inpainting.
- Outpainting extends the canvas. Same mechanism, mask outside the original frame.
- Reference conditioning is a third axis. Some models take a reference image (a style reference, a face reference, a product reference) in addition to the source image and the text prompt.
The 2026 image stack is mostly about combining these three axes — source, reference, and mask — with the right prompt grammar.
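The three axes plus the prompt can be modeled as a single request object; a minimal sketch, with field names that are illustrative rather than any platform's actual API:

```python
# One edit request combining the three conditioning axes: source, mask,
# and reference(s), plus the text instruction. Illustrative schema only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EditRequest:
    source: str                       # path/URL of the image being edited
    prompt: str                       # the text instruction
    mask: Optional[str] = None        # white = edit region, black = preserve
    references: list[str] = field(default_factory=list)  # style/face/product refs

    def operation(self) -> str:
        """Rough classification of which core operation this request is."""
        if self.mask and self.references:
            return "reference-conditioned inpainting"
        if self.mask:
            return "inpainting"
        if self.references:
            return "reference-conditioned edit"
        return "edit by instruction"

print(EditRequest("hero.png", "change the jacket to denim").operation())
# edit by instruction
```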
The five core image-to-image operations
Every real workflow is one of these or a chain of them.
- Edit by instruction. "Change the background to a snowy mountain peak." The model edits the whole image guided by your instruction.
- Inpainting. Paint a mask over a region. Prompt describes what should be in that region. Model fills only the mask.
- Outpainting. Mask the area outside the existing frame. Prompt describes the extended scene. Model fills the new area, blending seamlessly.
- Style transfer. Source image keeps its content; reference image provides the style. Output reads as the source rendered in the reference's style.
- Reference-conditioned generation. Source image plus a face/product/style reference, with text instructions to change something while preserving the references.
The model that wins each operation is different. Nano Banana Edit is the king of edit-by-instruction. Flux 2 Edit and Flux 2 Edit Max are the kings of inpainting and structural edits. Both are competent at the others.
Nano Banana Edit and Flux 2 Edit, side by side
These are the two models that do almost all of the daily image-to-image work in 2026. The fuller comparison treatment lives in the Nano Banana 2 capabilities post and the Flux 2 Pro/Max/Klein post; here is the head-to-head focused on editing.
| Capability | Nano Banana Edit | Flux 2 Edit |
|---|---|---|
| Edit by natural-language instruction | Strongest | Strong |
| Identity preservation across edits | Strongest | Strong |
| Inpainting precision | Good | Strongest |
| Structural / geometric edits (recompose, resize subject) | Limited | Strongest |
| Photorealism on complex scenes | Strong | Strongest (Flux 2 Edit Max) |
| Prompt adherence on multi-clause edits | Strong | Strongest |
| Text-in-image edits | Limited | Good (Flux 2 Edit Max strongest) |
| Sequential edit retention | Excellent (5–7 passes clean) | Strong (4–5 passes clean) |
| Speed | Fast | Moderate |
| Cost | Lower | Higher (especially Max) |
| Best-fit operation | Wardrobe swap, mood/lighting shift, expression change | Inpainting, outpainting, structural changes |
The pattern: Nano Banana for the conversational, identity-preserving edits ("same person, now wearing a coat"); Flux 2 Edit when you need to mask a region and surgically replace it.
Reference conditioning: the underused power feature
Most creators use image-to-image as one input + text. The pros use one input + one or two references + text.
A reference image guides the output toward the reference's qualities — identity, style, composition, lighting — without copying it. Three reference types that get used most:
- Face/identity reference. "Edit this scene but the person's face matches this reference." Used for character consistency across an image series.
- Style reference. "Apply the lighting and color palette of this reference to the source." Style transfer without literal style imitation.
- Product reference. "Replace the bottle in the source image with this specific product, matched to its surface finish and label." This is the operation that makes 280 SKUs into 280 finals from one shoot day.
Both Nano Banana Edit and Flux 2 Edit accept references. Flux 2 Edit handles multi-reference combinations slightly better; Nano Banana Edit handles single-reference identity preservation slightly better.
Inpainting and outpainting, in practice
Inpainting is mask + prompt. The mask is a black-and-white image where white is the region to edit and black is the region to preserve. Most modern interfaces let you brush the mask directly in the UI.
What works:
- Precise object removal — mask the object, prompt "fill with the natural background," done.
- Object replacement — mask the object, prompt the new object with composition cues ("a glass of red wine, matching the existing lighting from the right").
- Localized retouching — mask a hand, fix the fingers; mask a logo, replace it; mask a sky, swap it.
- Wardrobe and accessory swaps — mask the shirt, prompt the new one. Use Flux 2 Edit for tighter mask adherence.
Outpainting is the same mechanism with the mask outside the original frame. Use it to:
- Extend a vertical shot to horizontal — mask the side regions, prompt the extended environment.
- Add headroom for typography — mask the top, prompt the sky or background continuation.
- Reframe to a new aspect ratio — most ecommerce work needs both square and 4:5 versions of the same hero. Outpaint instead of reshooting.
A practical tip: feather the mask edge by 4–8 pixels. Hard mask edges produce visible seams; feathered edges blend.
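The canvas math behind an outpaint-to-new-aspect-ratio pass can be sketched as follows; most UIs do this for you, and the helper name is illustrative:

```python
# Geometry for an outpaint pass: given a source image and a target aspect
# ratio, compute the new canvas size and where the centered source sits on
# it. The mask is then white everywhere except the source rectangle (and
# feathered at the boundary, per the tip above).

def outpaint_canvas(w: int, h: int, target_ar: float) -> tuple[int, int, int, int]:
    """Return (canvas_w, canvas_h, offset_x, offset_y) for a centered source.

    target_ar is width / height, e.g. 4/5 for an ecommerce portrait crop.
    """
    if w / h < target_ar:
        canvas_w, canvas_h = round(h * target_ar), h   # extend sideways
    else:
        canvas_w, canvas_h = w, round(w / target_ar)   # extend vertically
    return canvas_w, canvas_h, (canvas_w - w) // 2, (canvas_h - h) // 2

# A 1080x1350 (4:5) hero extended to 9:16 for social:
print(outpaint_canvas(1080, 1350, 9 / 16))  # (1080, 1920, 0, 285)
```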
Worked prompt examples
These are exact prompts that produced finals on real client work in the last 60 days.
Example 1 — wardrobe swap with identity preservation (Nano Banana Edit)
Same person, same pose, same camera angle, same studio lighting and warm tungsten color grade. Change the outfit only — replace the gray sweatshirt with a fitted black wool blazer worn over a crisp white t-shirt. Preserve all facial features, hair, and skin texture exactly as in the source. No other changes.
The "no other changes" tail clause matters. Nano Banana respects it.
Example 2 — background swap with environmental coherence (Flux 2 Edit)
Keep the subject and her pose exactly as in the source image, including her hair, expression, and the rim light catching her left side. Replace the background only — change from the white studio backdrop to a softly out-of-focus rainy city street at dusk, neon signage in pink and cyan reflecting in puddles, the existing rim lighting on the subject now reading as motivated by the neon, depth of field around f/2.0.
Mask: anything that is not the subject. The "rim lighting now motivated by neon" clause is what makes the edit feel real instead of comp'd.
Example 3 — inpainting object removal (Flux 2 Edit)
Source: a portrait with a coffee cup and a phone on the table.
Mask: coffee cup and phone region. Prompt: empty wood table surface, matching the existing oak grain, lighting, and shadow from the rim light continuing naturally across the now-empty area. Photorealistic, seamless.
Example 4 — product replacement at scale (Flux 2 Edit with product reference)
Source: model holding bottle A. Reference: bottle B (different SKU). Mask: the bottle in the source.
Replace the bottle in the masked region with the product shown in the reference image. Match the reference's exact label artwork, bottle shape, glass color, and surface finish. Preserve the source image's lighting on the bottle (key light from upper-left, soft fill from below) and the model's hand position around the bottle. Photorealistic commercial product photography.
This is the prompt that ran 280 times with the only variable being the bottle reference. Output quality was usable on first pass for ~85% of SKUs and required one regenerate on the rest.
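That run can be sketched as a templated batch, where the reference image is the only per-SKU variable; the file names and job schema here are illustrative:

```python
# The product-replacement prompt from Example 4, templated over SKUs. The
# prompt and mask stay fixed so the brand look stays consistent; only the
# reference changes per job.

REPLACE_PROMPT = (
    "Replace the bottle in the masked region with the product shown in the "
    "reference image. Match the reference's exact label artwork, bottle shape, "
    "glass color, and surface finish. Preserve the source image's lighting on "
    "the bottle (key light from upper-left, soft fill from below) and the "
    "model's hand position around the bottle. Photorealistic commercial "
    "product photography."
)

def build_jobs(skus: list[str]) -> list[dict]:
    return [
        {
            "source": "hero_master.png",     # the one shoot-day hero frame
            "mask": "bottle_mask.png",       # same mask for every SKU
            "reference": f"refs/{sku}.png",  # the only per-SKU variable
            "prompt": REPLACE_PROMPT,
            "model": "flux-2-edit",
        }
        for sku in skus
    ]

jobs = build_jobs(["SKU-0001", "SKU-0002"])
print(len(jobs), jobs[0]["reference"])  # 2 refs/SKU-0001.png
```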
Example 5 — style transfer with subject preservation (Nano Banana Edit + style reference)
Source: a neutral portrait. Reference: a 1970s film-noir frame with strong key/fill ratio.
Apply the lighting style and color palette of the reference image to the source. Convert the source's even studio lighting to a high-contrast key-fill ratio with a single warm key light from camera-left and deep falloff into shadow on the right side of the face. Preserve the subject's identity, pose, and framing exactly. Slight grain, warm shadows, cooler highlights.
Production workflows
Three workflow patterns that dominate creator and brand work in 2026.
Workflow 1 — ecommerce variant production
The product photography workflow that compresses a four-day reshoot into a one-day shoot plus an editing pass.
- Shoot day. One model, one set, one base lighting. Capture the hero pose holding a placeholder product or no product.
- Generate or shoot product references. Each SKU needs a clean reference image (white-background product shot, photographed or generated in text-to-image).
- Mask-and-replace pipeline. For each variant, mask the product region in the hero, supply the product reference, run Flux 2 Edit with the standardized prompt.
- Background variants. For each color or seasonal variant, change the background via Nano Banana Edit instruction prompts ("change background to autumn forest with warm orange leaves, preserve subject and lighting on subject").
- Aspect ratio variants. Outpaint each final to square, 4:5, and 9:16 for marketplace and social.
- QC pass. Reject anything with a hand glitch, a label distortion, or a lighting mismatch. Regenerate that SKU.
Per-SKU time on a clean pipeline: ~3 minutes. 280 SKUs in ~14 hours of routed processing.
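The QC-and-regenerate step reduces to a retry loop; `run_edit` and `passes_qc` below are placeholders for the platform call and your check (human or automated), not real APIs:

```python
# Retry loop for the QC pass: run the edit, check it, regenerate on failure.
# With the ~85% first-pass rate from the example, most SKUs exit on attempt 1.

def process_sku(run_edit, passes_qc, job: dict, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        result = run_edit(job)
        if passes_qc(result):
            return result, attempt
    raise RuntimeError(f"{job['sku']}: failed QC after {max_attempts} attempts")

# Stub demo: fail QC once (e.g. a hand glitch), then pass on the regenerate.
calls = {"n": 0}
def fake_edit(job):
    calls["n"] += 1
    return {"sku": job["sku"], "attempt": calls["n"]}
def fake_qc(result):
    return result["attempt"] >= 2

final, attempts = process_sku(fake_edit, fake_qc, {"sku": "SKU-0042"})
print(attempts)  # 2
```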
Workflow 2 — portrait retouching at production speed
Editorial and headshot retouching that used to be Photoshop hours becomes a chained edit pass.
- Base retouch instruction. "Subtle skin retouch, preserve all skin texture and pores, remove only obvious blemishes, even out skin tone slightly without smoothing." Run on Nano Banana Edit.
- Catchlight and eye sharpening. Mask the eyes, prompt "subtle catchlight enhancement and slight micro-contrast, preserve iris pattern and color." Run Flux 2 Edit.
- Hair fly-away cleanup. Mask edges, prompt "remove stray hair fly-aways while preserving overall hair texture and shape." Flux 2 Edit.
- Background tidying. Mask background, prompt cleaner version. Flux 2 Edit.
- Color grade. Final pass via Nano Banana Edit instruction: "apply a clean editorial color grade — slightly cool shadows, warm midtones, neutral highlights, low contrast."
Per-portrait time: ~5 minutes versus ~25 minutes in Photoshop.
Workflow 3 — poster and key-art design
The branded-poster workflow that combines generation and editing.
- Generate the base hero image in Flux 2 Pro or Nano Banana 2 (cold generation). The strongest first frame possible.
- Outpaint to the poster aspect ratio (typically 27×40 inches for print, or 1080×1620 for social).
- Composite secondary elements via inpainting — title-card area, product overlay, supporting characters added via mask + reference conditioning.
- Typography pass — for actual title text, switch to Flux 2 Edit Max, which handles text-in-image best (or finalize text in a vector tool).
- Final polish via Nano Banana Edit color grade and tone adjustments.
For deeper image-prompting craft, the AI prompt engineering for image generation post covers the foundational structure that all of these workflows are built on.
Common failure modes and fixes
- Identity drift after multiple edits. Nano Banana holds 5–7 passes, Flux 2 Edit holds 4–5. After that, regenerate from the original source rather than chaining further.
- Mask edge seams. Feather the mask 4–8 pixels. Re-run with a slightly larger mask if the seam shows on the second pass.
- Lighting mismatch on inpainted region. Add explicit lighting cues in the prompt ("matching the existing key light from upper-left at 45 degrees").
- Hand and finger artifacts. Always a problem. Mask just the hand region and re-prompt with "anatomically correct hand, fingers visible and well-formed, matching the wrist position from the source." Sometimes takes 3–4 tries.
- Text in image looks wrong. Switch to Flux 2 Edit Max or finalize text outside the model.
- Color shift on chained edits. Pin a reference color/grade across the chain — restate the grade in every edit prompt.
- Subject identity changes on style transfer. Reduce style strength (most platforms expose a slider, default 1.0; lower to 0.6–0.8 to preserve more of the source).
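The identity-drift limit is worth enforcing mechanically; a small guard under the pass budgets above, with `guard` as an illustrative helper rather than any platform feature:

```python
# Cap sequential edits per model before forcing a restart from the original
# source with a consolidated prompt. Budgets are the conservative ends of
# the pass counts cited above.

PASS_BUDGET = {"nano-banana-edit": 6, "flux-2-edit": 4}

def guard(passes_done: int, model: str) -> None:
    """Raise once an edit chain exceeds the model's clean-pass budget."""
    if passes_done >= PASS_BUDGET[model]:
        raise RuntimeError(
            f"{model}: {passes_done} sequential edits; restart from the "
            "original source with a consolidated prompt instead of chaining."
        )

guard(3, "flux-2-edit")   # fine: still inside the budget
```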
How Versely fits
Versely routes Nano Banana Edit, Nano Banana 2, Flux 2 Pro, Flux 2 Max, Flux 2 Klein, and Flux 2 Edit from one prompt interface, picking the right model for the operation type. For an instruction edit, it routes Nano Banana Edit; for an inpainting job with a mask, it routes Flux 2 Edit; for cold generation of a hero, Flux 2 Pro or Max.
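Those routing rules reduce to a simple decision function; this is a hand-rolled simplification of what a learned router would do, not Versely's actual logic:

```python
# Sketch of the routing decision: pick a model from the request shape.
# Model identifiers are illustrative slugs matching the comparison table.

def route(has_mask: bool, is_cold_generation: bool,
          needs_text_in_image: bool = False) -> str:
    if is_cold_generation:
        return "flux-2-pro"        # strongest first frame
    if needs_text_in_image:
        return "flux-2-edit-max"   # best text-in-image edits
    if has_mask:
        return "flux-2-edit"       # surgical mask-based inpainting
    return "nano-banana-edit"      # conversational instruction edits

print(route(has_mask=True, is_cold_generation=False))  # flux-2-edit
```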
Most creator workflows enter through the text-to-image interface for the cold generation step, then continue inside the same interface for the edit chain. The router learns over time which model wins for your specific prompt patterns.
For the broader image stack and how Versely routes it, the Versely AI models guide is the reference. For deeper-cut comparisons against Midjourney and Ideogram, see the showdown post.
When images move into video — animating a hero shot, looping a product reveal — Versely's AI video generator and AI slideshow maker take the edited image and route it through Sora 2 Pro, VEO 3.1, Kling 3, or Seedance 2.0 image-to-video as fits the brief.
FAQ
What is the difference between image-to-image and text-to-image?
Text-to-image generates a new image from a prompt only. Image-to-image starts from an existing image and modifies it according to a prompt (and optionally a mask and/or reference image). Image-to-image preserves identity and composition by default; text-to-image generates fresh.
Is Nano Banana Edit better than Flux 2 Edit?
For natural-language instruction edits with identity preservation — yes. For mask-based inpainting and structural edits — Flux 2 Edit wins. Most pros use both.
How many times can I edit an image before quality degrades?
Nano Banana Edit holds 5–7 sequential edits cleanly. Flux 2 Edit holds 4–5. Past those counts, drift and softness accumulate. Restart from source for further work.
Can image-to-image preserve a specific person's face?
Yes — both Nano Banana Edit and Flux 2 Edit preserve identity well across instruction edits. For tighter character consistency across many images, supply a face reference image alongside the source.
Does inpainting work for removing watermarks or copyrighted elements?
Technically yes, mechanically. Legally and ethically, you must own or have rights to the image. Removing other people's watermarks is not a use case anyone should be running.
Can Versely handle batch image-to-image edits?
Yes — both via the API and via the text-to-image interface in batch mode. Most production pipelines (ecommerce variant generation, portrait retouching at scale) run batch.
Bottom line
Image-to-image is what makes generative image models a production tool rather than a curiosity. The cold generation step is now the smaller part of the workflow; the edit chain is where deliverables are made. Learn the five operations, master the Nano Banana Edit / Flux 2 Edit pair, layer references and masks where they pay off, and you can ship the volume real briefs demand without the time real reshoots take.