
Dynamic Character Visuals: How One Character Can Look Like Two Different People

One of our characters in Suzune has a secret: she’s beautiful, but she hides it.

Mao is designed as the “plain office worker” archetype — no makeup, thick glasses, hair with no styling, clothes buttoned up to the collar. But when she decides to dress up? Different person. Crimson lipstick, smokey eyeshadow, hair styled, glasses off. Same woman, completely different energy.

The technical challenge: how do you make an AI image generation system produce two visually distinct versions of the same character, automatically, based on the story context?

Here’s how we solved it.


The Problem: One Face, Two Modes

Most AI character image systems use one of two approaches:

  1. LoRA models — A fine-tuned model that always produces the same character look
  2. Appearance prompts — Text descriptions that guide generation (inconsistent across images)

Neither handles the “transformation” use case well: a LoRA locks the character into a single look, and appearance prompts drift too much to keep two distinct looks recognizable as the same person.

What we needed: a system that swaps the character’s visual foundation based on what’s happening in the story.


The Solution: img2img Base Image Variants

Instead of LoRA, Mao uses img2img generation — the model takes a base image as input and transforms it according to the prompt while preserving the core facial structure.

The key insight: use different base images for different character states.

characters/mao/
├── base_image.png            ← Default: glasses, no makeup, plain
├── base_image_serious.png    ← Activated: no glasses, makeup-ready face
└── character.yaml

Both images are the same person with the same facial structure. But the base composition is different:

|  | Default (base_image.png) | Serious (base_image_serious.png) |
| --- | --- | --- |
| Glasses | On | Off |
| Expression | Neutral, slightly stiff | Confident, composed |
| Hair | Unstyled, parted in middle | Slightly refined |
| Makeup | None | Minimal (ready for prompt to add more) |

The img2img pipeline then applies the scene prompt on top of whichever base is active. The result: consistent facial identity with dramatically different vibes.


Automatic Switching Logic

The base image swap is triggered by outfit keywords. When the character’s current outfit includes makeup items, the system automatically switches to the serious base image.

Here’s the core logic:

# Detect makeup in current outfit → switch base image
from pathlib import Path

MAKEUP_KEYWORDS = (
    "lipstick", "eyeshadow", "mascara",
    "eyeliner", "makeup", "cosmetics",
)

def select_base_image(base_image: Path, clothing: str) -> Path:
    """Return the makeup-ready variant when the outfit mentions makeup."""
    if any(kw in clothing.lower() for kw in MAKEUP_KEYWORDS):
        serious_path = base_image.parent / "base_image_serious.png"
        if serious_path.exists():
            return serious_path  # swap!
    return base_image

That’s it. The detection is deliberately simple — keyword matching on the outfit string. No ML, no complex logic. Just: does the outfit mention makeup? → use the makeup-ready face.

Why Keywords Instead of Something Smarter?

Because outfit descriptions are generated by our own system (the wardrobe engine), so we control the vocabulary. We don’t need fuzzy matching when we write the prompts ourselves.

The wardrobe entry for Mao’s “queen mode” makeup looks like this:

{
  "name": "Seductive Queen Makeup (Mao exclusive)",
  "prompt": "seductive queen makeup, no glasses, dark crimson lipstick, heavy mascara, smokey eyeshadow, sharp eyeliner, flawless porcelain skin",
  "tags": ["makeup", "seductive", "queen", "mao"],
  "exclusive": ["mao"]
}

When this wardrobe item is active, the clothing string contains “lipstick” and “eyeshadow” → the keyword check fires → base image swaps.
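As a minimal, self-contained sketch (variable names are illustrative), here is that check firing on the wardrobe entry’s prompt string:

```python
# Keyword set from the detection snippet above
makeup_keywords = (
    "lipstick", "eyeshadow", "mascara",
    "eyeliner", "makeup", "cosmetics",
)

# Clothing string assembled from the active wardrobe entry's prompt
clothing = (
    "seductive queen makeup, no glasses, dark crimson lipstick, "
    "heavy mascara, smokey eyeshadow, sharp eyeliner, "
    "flawless porcelain skin"
)

needs_serious_base = any(kw in clothing.lower() for kw in makeup_keywords)
print(needs_serious_base)  # → True
```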


The Visual Pipeline

Here’s the full flow when Mao generates a selfie:

1. LLM decides to send a selfie (tool call)

2. Load daily outfit from daily_outfit.json

3. Check outfit for makeup keywords

    ┌────┴────┐
    │ No      │ Yes
    ▼         ▼
base_image  base_image_serious
  .png        .png
    │         │
    └────┬────┘

4. Encode base image as input to SDXL img2img

5. Apply scene prompt + outfit prompt

6. RunPod generates image → send to chat

The beauty of this approach: the switching is invisible to the LLM. The character doesn’t need to know which base image is being used. It just describes the scene, and the image system handles the visual consistency.
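Putting steps 2–5 together, a minimal sketch might look like this (file names follow the layout above; the function names are assumptions, and the actual RunPod/SDXL call is left out):

```python
import json
from pathlib import Path

MAKEUP_KEYWORDS = (
    "lipstick", "eyeshadow", "mascara",
    "eyeliner", "makeup", "cosmetics",
)

def resolve_base_image(char_dir: Path, clothing: str) -> Path:
    """Step 3: pick the base image variant from the outfit string."""
    if any(kw in clothing.lower() for kw in MAKEUP_KEYWORDS):
        serious = char_dir / "base_image_serious.png"
        if serious.exists():
            return serious
    return char_dir / "base_image.png"

def build_generation_request(char_dir: Path, scene_prompt: str) -> dict:
    """Steps 2-5: load the outfit, pick a base, assemble the img2img request."""
    outfit = json.loads((char_dir / "daily_outfit.json").read_text())
    clothing = outfit["prompt"]
    return {
        "init_image": str(resolve_base_image(char_dir, clothing)),
        "prompt": f"{scene_prompt}, {clothing}",  # scene + outfit prompt
        # This dict would then be sent to the SDXL img2img endpoint.
    }
```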


Designing the Base Images

The hardest part isn’t the code — it’s creating base images that work well with img2img.

Rules for Good Base Images

1. Same face, different energy

Both base images must be unmistakably the same person. Use the same face shape, eye color, skin tone, and proportions.

Change only the expression, accessories (glasses on/off), hair styling, and makeup readiness.

2. Neutral enough for img2img to work with

Base images shouldn’t be too detailed or specific. The img2img pipeline needs room to apply the scene prompt. If the base image is already wearing a red dress, it’s harder for the model to generate her in a white one.

Keep base images in simple clothing or a neutral composition.

3. High denoising strength for outfit changes, low for facial consistency

This is the balancing act of img2img:

| Denoising Strength | Effect |
| --- | --- |
| 0.3–0.4 | Face stays very consistent, but outfits barely change |
| 0.5–0.6 | Good balance — face recognizable, outfits change well |
| 0.7–0.8 | Outfits change dramatically, but face may drift |

We typically use 0.5–0.6 for general scenes and 0.4–0.5 for close-up portraits where facial consistency matters most.
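In code, that tuning can be a simple heuristic. A sketch, with the helper name and exact values as assumptions based on the ranges above:

```python
def pick_denoising_strength(shot_type: str) -> float:
    """Pick an img2img denoising strength per the trade-off above."""
    if shot_type == "portrait":
        # Close-up: facial consistency matters most
        return 0.45
    # General scene: give the outfit prompt room to work
    return 0.55
```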


Why This Matters for Character Design

The base image variant system isn’t just a technical feature — it’s a character design tool.

Mao’s “transformation” is part of her character arc. She’s designed as someone who doesn’t care about appearances, but when the moment calls for it — a date, a confrontation, or a moment of confidence — she transforms.

This mirrors a popular character archetype in anime and manga: the “hidden beauty” (隠れ美人). The gap between her daily appearance and her full potential is part of what makes her compelling.

Without dynamic visuals, this character concept falls flat. You can describe the transformation in text, but showing it in generated images makes it visceral.

The Gap Effect

In character design, “gap moe” (ギャップ萌え) — the appeal of contrast between a character’s usual presentation and a suddenly revealed different side — is one of the most powerful tools available.

The base image switching system lets us express gap moe visually, not just textually. And users absolutely notice.


Extending the Pattern

While Mao uses “plain → glamorous,” the same pattern works for many character transformations:

| Character Concept | Default Base | Variant Base | Trigger |
| --- | --- | --- | --- |
| Hidden beauty (Mao) | Glasses, no makeup | No glasses, confident | Makeup keywords |
| Warrior/fighter | Casual clothes | Battle-ready, intense | Weapon/armor keywords |
| Shy character | Averted gaze, closed posture | Eye contact, open posture | High affection score |
| Idol/performer | Offstage casual | Stage costume, spotlights | Performance scene |

The trigger doesn’t have to be outfit-based either. You could switch base images on affection score, scene type, story progress, or any other state your system already tracks.

We haven’t implemented all of these yet, but the architecture supports them with minimal changes.
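One way to generalize the keyword check is a small variant registry where the first matching rule wins and the default base is the fallback. A sketch (filenames and keyword sets are illustrative):

```python
# Ordered rules: (variant filename, trigger keywords). First match wins.
VARIANT_RULES = [
    ("base_image_serious.png", ("lipstick", "eyeshadow", "makeup")),
    ("base_image_battle.png", ("sword", "armor", "weapon")),
]

def pick_variant(trigger_text: str) -> str:
    """Return the base image filename for the current character state."""
    text = trigger_text.lower()
    for filename, keywords in VARIANT_RULES:
        if any(kw in text for kw in keywords):
            return filename
    return "base_image.png"
```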


Implementation Checklist

If you want to add this to your own bot:

  1. Create 2+ base images of your character with consistent facial features but different compositions
  2. Name them consistently: base_image.png (default), base_image_[variant].png
  3. Add keyword detection in your image generation pipeline — check the outfit/scene string for trigger words
  4. Swap the base image path before passing it to your img2img model
  5. Test denoising strength — find the sweet spot between outfit flexibility and facial consistency

The code change is genuinely small (< 10 lines). The character design work is where the real effort goes.


Tools We Use

If you’re not ready to build your own image pipeline, platforms like Candy AI and DreamGF offer built-in character customization with appearance variants — not as flexible as a custom system, but a good starting point.


This article is part of WaifuStack’s series on building AI roleplay bots. See also: Prompt Engineering for Immersive Roleplay and From Idea to Production.

Working on something similar? Share your approach on X.


