This is the reference article. Every other article on WaifuStack covers a specific subsystem — this one shows how they all fit together.
If you’ve read our other articles, this is the map. If this is your first article, start here and follow the links to deep-dives.
Table of contents
Open Table of contents
The System at a Glance
┌──────────────────────────────────────────────────────────┐
│ SUZUNE ARCHITECTURE │
├──────────────────────────────────────────────────────────┤
│ │
│ Telegram ──→ Message Handler │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ System Prompt Assembly │ │
│ │ ├── Character Persona (YAML) │ │
│ │ ├── Speech Rules (Markdown) │ │
│ │ ├── Lorebook (keyword scan) │ │
│ │ ├── Evaluation Scores │ │
│ │ ├── Milestone Injections │ │
│ │ ├── Chat Summary │ │
│ │ ├── Character Diary (Memo) │ │
│ │ ├── Daily Outfit │ │
│ │ ├── Current DateTime │ │
│ │ └── Tone Reminder │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ LLM Router │ │
│ │ ├── NSFW Detection │ │
│ │ ├── Pre-emptive Routing │ │
│ │ ├── DeepSeek V3.2 (primary) │ │
│ │ ├── Claude Haiku (fallback) │ │
│ │ └── Censorship Detection │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Post-Processing │ │
│ │ ├── DS3.2 Artifact Cleanup │ │
│ │ ├── Quality Rewrite (Claude) │ │
│ │ ├── Anti-Repetition │ │
│ │ └── Tone Confusion Check │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ ▼ ▼ │
│ Tool Calls Text Response │
│ ├── Selfie ├── Emotion Detection │
│ ├── Outfit ├── Message Splitting │
│ ├── Memory └── Telegram Delivery │
│ └── Eval Update │
│ │
│ Persistence: SQLite (WAL) ──→ messages, metrics, state │
│ │
└──────────────────────────────────────────────────────────┘
Let’s walk through each layer.
Layer 1: Message Ingestion
Stack: Python 3.11 + aiogram v3 (async Telegram framework)
When a user sends a message:
- Receive via Telegram webhook or long-polling
- Timestamp the message in JST: [04/02 14:30]
- Extract metadata: inner voice (心の声: ...), command flags (/nsfw, /sfw)
- Save to SQLite database
- Start typing indicator (refreshed every 4 seconds)
Every message is timestamped before storage. This gives the LLM time awareness — it knows when the last message was sent and how much time has passed.
Deep dive: Building a Telegram Roleplay Bot from Scratch
Layer 2: System Prompt Assembly
This is the brain of the system. The system prompt is rebuilt from scratch on every message from 20+ dynamic data sources.
Data Sources (in injection order)
| # | Source | Token Budget | Changes Per |
|---|---|---|---|
| 1 | Character persona (persona.md) | ~800 | Rarely |
| 2 | Behavior rules (rules.md) | ~400 | Rarely |
| 3 | Example dialogue | ~300 | Never |
| 4 | Capability instructions | ~200–500 | Per capability |
| 5 | Lorebook entries (keyword-triggered) | ~300–800 | Per message |
| 6 | Relationship stage guidance | ~200 | Per score change |
| 7 | Character diary (memo.md, last 50 lines) | ~400 | Per session |
| 8 | Shared timeline | ~200 | Per event |
| 9 | Chat summary (compressed history) | ~500 | Periodically |
| 10 | Evaluation scores + impression | ~200 | Per update |
| 11 | Image generation policy | ~50 | Per affinity change |
| 12 | Story direction hints | ~100 | Per GM update |
| 13 | GM directives | ~100 | Per operator command |
| 14 | Milestone injections | ~100 | One-time per milestone |
| 15 | Daily outfit | ~50 | Per outfit change |
| 16 | Relationship reminder (recency fix) | ~100 | Every message |
| 17 | Current datetime | ~20 | Every message |
| 18 | Promise/plan reminder | ~50 | Every message |
| 19 | Story initiative rule | ~50 | Every message |
| 20 | Tone reminder (recency fix) | ~200 | Every message |
Total system prompt: ~3,000–5,000 tokens depending on active lorebook entries and capabilities.
The tone rules and relationship reminder are injected twice — once in the persona section and once at the very end. This fights LLM recency bias in long conversations.
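The assembly itself is just ordered concatenation with the recency trick bolted on. A minimal sketch (function and parameter names are illustrative, not the actual implementation):

```python
def assemble_system_prompt(sections: list[str], tone_reminder: str) -> str:
    """Join the dynamic sections in injection order, skipping empty ones,
    then repeat the tone reminder at the very end so it sits closest to
    the conversation and survives LLM recency bias."""
    parts = [s for s in sections if s]       # drop inactive sources
    parts.append(tone_reminder)              # second injection, at the tail
    return "\n\n".join(parts)
```

The key property is that the function is pure: given the same 20 data sources, it produces the same prompt, which makes the assembly easy to test and debug.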
Deep dives:
- Prompt Engineering for Immersive Roleplay
- How to Design AI Personalities with YAML
- Building an Affection System
Layer 3: LLM Router
The router decides which model handles the request and manages fallback chains.
Routing Logic
1. Check LLM profile: runtime override → character YAML → default
2. If primary is Claude AND context is NSFW → pre-emptive route to DeepSeek
3. Call primary model
4. If rate limited → fallback
5. If refusal detected → fallback
6. If empty response → retry without tools, then fallback
7. If silent sanitization → fallback
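The routing steps above can be sketched as a single function. This is a simplified sketch of the chain, not the real router: the callables and the `looks_censored` check stand in for the actual model clients and censorship detector.

```python
def route(call_primary, call_fallback, is_nsfw: bool,
          primary_is_claude: bool, looks_censored) -> str:
    """Fallback chain: pre-emptive NSFW routing, then refusal/empty/
    sanitization checks on the primary model's output."""
    # Pre-emptive: never send NSFW context to Claude in the first place
    if primary_is_claude and is_nsfw:
        return call_fallback()
    try:
        reply = call_primary()
    except Exception:           # rate limit or transport error
        return call_fallback()
    if not reply or looks_censored(reply):
        return call_fallback()
    return reply
```

In practice each branch would also log which rung of the chain fired, so the `api_calls` metrics table can track fallback rates per model.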
Model Assignments
| Model | Role | Cost/1M tokens |
|---|---|---|
| DeepSeek V3.2 | Primary chat | $0.25 / $0.40 |
| Claude Haiku 4.5 | Quality rewrite + SFW fallback | $0.80 / $4.00 |
| Gemini 2.5 Flash | NPC direction | $0.30 / $2.50 |
| Gemini 2.0 Flash | NPC rewrite | $0.10 / $0.40 |
| GLM-5 | Scene descriptions | $0.80 / $2.56 |

Layer 4: Tool Execution
The LLM can call tools during response generation. Tool calls are executed in a loop (max 5 iterations) until no more tools are requested.
| Tool | What It Does |
|---|---|
| generate_image | Creates a character selfie/scene image via RunPod |
| change_outfit | Selects from wardrobe, updates daily outfit |
| save_memory | Writes to character diary (memo.md) |
| update_evaluation | Updates relationship scores (trust, affection, etc.) |
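The execution loop is short. A sketch of the shape (the reply format and tool registry here are illustrative, not the actual wire format):

```python
MAX_TOOL_ITERATIONS = 5

def run_tool_loop(call_llm, tools: dict) -> str:
    """Call the LLM, execute any requested tools, feed results back, and
    repeat until the model stops asking for tools or the cap is hit."""
    results = []
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_llm(results)            # results from prior iterations
        if not reply.get("tool_calls"):
            return reply["text"]
        for name, args in reply["tool_calls"]:
            results.append(tools[name](**args))
    return reply.get("text", "")             # cap reached: use what we have
```

The iteration cap matters: without it, a model that keeps emitting tool calls can burn through the budget in a single turn.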
Image Generation Pipeline
Tool call → Load daily outfit → Check makeup keywords
│ │
│ ┌─────────┴─────────┐
│ │ No makeup │ Makeup detected
│ ▼ ▼
│ base_image.png base_image_serious.png
│ │ │
└────────────────────┴────────────────────┘
│
Build SD prompt
│
Send to RunPod (SDXL img2img)
│
Return image to Telegram
Deep dive: Dynamic Character Visuals
Layer 5: Post-Processing
After the LLM generates a response, several cleanup and quality steps run:
DS3.2 Artifact Cleanup
- Fix tokenization glitches (社long→社長)
- Truncate repetition loops (same phrase 3+ times)
- Extract plain-text tool calls (DS3.2 sometimes outputs them as text)
Quality Rewrite Pipeline
DS3.2 draft → Claude Haiku rewrite → censorship check
└── If censored: discard rewrite, use original
└── If circuit breaker active: skip rewrite entirely
Anti-Repetition
- Track opening patterns of last 3 responses
- If same pattern repeats: inject variation warning
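A minimal sketch of the opening-pattern tracker (class and parameter names are ours; the real check may normalize more aggressively):

```python
from collections import deque

class RepetitionGuard:
    """Track the opening words of the last few responses; flag when the
    same opening repeats so a variation warning can be injected."""

    def __init__(self, window: int = 3, prefix_words: int = 4):
        self.recent = deque(maxlen=window)
        self.prefix_words = prefix_words

    def check(self, response: str) -> bool:
        opening = " ".join(response.split()[: self.prefix_words]).lower()
        repeated = opening in self.recent
        self.recent.append(opening)
        return repeated
```

Comparing only the first few words catches the most common failure mode: the model settling into one habitual sentence opener.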
Tone Confusion Detection
- Check if character is using another character’s speech patterns
- Especially important in multi-character systems
Layer 6: Context Management
Tiered Memory Architecture
┌─────────────────────────────────────────┐
│ Tier 1: Raw Messages (last 15) │ ~2,000 tokens
│ Full message text, verbatim │ Updates: every message
├─────────────────────────────────────────┤
│ Tier 2: Chat Summary │ ~500 tokens
│ Compressed narrative of older messages │ Updates: periodically
├─────────────────────────────────────────┤
│ Tier 3: Character Diary (memo.md) │ ~400 tokens (last 50 lines)
│ Character's own notes and memories │ Updates: per save_memory call
├─────────────────────────────────────────┤
│ Tier 4: Evaluation Scores │ ~200 tokens
│ Relationship state + impressions │ Updates: per update_evaluation
└─────────────────────────────────────────┘
Compression Pipeline
- Old messages → summarized into flowing narrative (chat_summary.md)
- Chat summary grows → compressed further
- Character diary grows → compressed, deduplicated
Result: The character can reference events from days ago without consuming the entire context window.
Layer 7: Persistence
Database: SQLite with WAL (Write-Ahead Logging) mode
| Table | Contents |
|---|---|
| messages | All conversation history (id, character, role, content, timestamp) |
| api_calls | LLM call metrics (model, tokens, cost, latency, fallback status) |
| image_calls | Image generation metrics (backend, cost) |
Why SQLite?
- Zero configuration
- WAL mode handles concurrent reads during writes
- Single file — trivial to backup
- More than fast enough for a single-server bot
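Enabling WAL is a one-line pragma at connection time. A sketch with an assumed minimal schema for the messages table (the real schema may differ):

```python
import sqlite3

def open_db(path: str = "suzune.db") -> sqlite3.Connection:
    """Open the bot database in WAL mode so readers never block the writer."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, character TEXT, role TEXT, "
        "content TEXT, timestamp TEXT)"
    )
    return conn
```

WAL is a property of the database file, not the connection, so it only needs to be set once, but re-issuing the pragma is harmless.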
Branch System
The entire conversation state can be snapshotted and restored:
/branch save "before_confession"
→ [conversation continues]
/branch restore "before_confession"
→ [back to the saved state]
This enables story branching — users can explore different dialogue paths.
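The core of the branch mechanism is just named deep copies of the conversation state. A sketch (the class name and state shape are illustrative; the real system snapshots the SQLite-backed state):

```python
import copy

class BranchStore:
    """Snapshot and restore conversation state by name (/branch save|restore)."""

    def __init__(self):
        self._branches: dict[str, dict] = {}

    def save(self, name: str, state: dict) -> None:
        # Deep-copy so later mutations don't leak into the snapshot
        self._branches[name] = copy.deepcopy(state)

    def restore(self, name: str) -> dict:
        return copy.deepcopy(self._branches[name])
```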
Layer 8: Subsystems
Lorebook Engine
- JSON files with keyword-triggered entries
- Features: sticky (persist N turns), cooldown, delay, probability, selective matching
- Affinity-gated entries (NSFW world info blocked at low trust)
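The keyword scan with sticky persistence can be sketched in a few lines (class and field names are ours; the real engine also handles cooldown, delay, and probability):

```python
class LorebookEntry:
    def __init__(self, keywords: list[str], text: str, sticky: int = 0):
        self.keywords = [k.lower() for k in keywords]
        self.text = text
        self.sticky = sticky        # extra turns the entry stays active
        self.turns_left = 0

def scan_lorebook(entries: list[LorebookEntry], message: str) -> list[str]:
    """Return text of entries whose keywords appear in the message;
    sticky entries persist for N extra turns after their last trigger."""
    msg = message.lower()
    active = []
    for e in entries:
        if any(k in msg for k in e.keywords):
            e.turns_left = e.sticky + 1     # (re)arm: this turn + N sticky
        if e.turns_left > 0:
            e.turns_left -= 1
            active.append(e.text)
    return active
```

Sticky entries keep world info in context across a short exchange without the user having to repeat the trigger word every message.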
Wardrobe System
- Outfit database with categories (tops, bottoms, accessories, makeup)
- Per-character exclusive items
- Makeup detection triggers base image switching
Emotion Detection
- Regex-based emotional state extraction from response text
- Mapped to expression sprites (happy, sad, angry, surprised, etc.)
- Sprites sent before text response
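A minimal sketch of the regex mapping (the cue words and sprite names here are illustrative, not the actual pattern set):

```python
import re

EMOTION_PATTERNS = {
    "happy": re.compile(r"\b(smiles?|laughs?|grins?)\b", re.I),
    "sad":   re.compile(r"\b(sighs?|tears?|cries)\b", re.I),
    "angry": re.compile(r"\b(glares?|frowns?|snaps)\b", re.I),
}

def detect_emotion(response: str, default: str = "neutral") -> str:
    """Map the first matching action cue in the response to a sprite name."""
    for emotion, pattern in EMOTION_PATTERNS.items():
        if pattern.search(response):
            return emotion
    return default
```

Regex is crude compared to a classifier, but it is free, instant, and good enough when the character's action cues follow consistent roleplay conventions.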
NPC System
- Gemini 2.5 Flash generates NPC concepts
- DeepSeek writes NPC dialogue
- Gemini 2.0 Flash rewrites for polish
GM Console
- 45+ development tools for maintenance, analysis, character creation
- Safe delegation to Claude via file-level SFW/NSFW separation
- CLI interface with interactive and single-shot modes
Deep dive: Using Claude for NSFW Bot Development
Cost Structure
| Component | Monthly Cost | % |
|---|---|---|
| DeepSeek V3.2 (primary chat) | $15–25 | 45% |
| Claude Haiku (quality rewrites) | $5–10 | 18% |
| Gemini + GLM-5 (NPCs, scenes) | $3–5 | 8% |
| RunPod (image generation) | $5–10 | 18% |
| VPS hosting | $5 | 10% |
| Total | $33–55 | 100% |
Deep dive: Running an AI Bot on $50/month
Key Design Principles
1. Dynamic Over Static
The system prompt is never the same twice. Every data source contributes to a context that reflects the current state of the world, the relationship, and the character’s internal life.
2. Graceful Degradation
Every model call has a fallback. Every fallback has a fallback. The user never sees an error — they get a slightly different quality response.
3. Right Model for the Right Job
Don’t fight model limitations. Use uncensored models for uncensored content, quality models for quality, cheap models for bulk work.
4. Characters Are Data, Not Code
Everything about a character — personality, rules, NSFW behavior, outfits, memories — lives in files, not code. Adding a new character is a YAML edit, not a code change.
5. The Relationship Is the Product
The affection system, milestone unlocks, and behavior gating create an experience that no flat chatbot can match. Users come back because the character remembers and evolves.
Build Your Own or Try Existing Platforms
This architecture took months to build. If you want to start simpler:
- Candy AI — Closest to a “just works” all-in-one experience
- JanitorAI — Bring your own API key for model flexibility
- FantasyGF — Best for AI girlfriend with photo generation
If you want to build your own, start with our Telegram bot tutorial and layer in features from there.
This is the architecture reference for Suzune. Each subsystem has its own deep-dive article — follow the links above to explore. Follow @waifustack for new articles every week.