This is the reference article. Every other article on WaifuStack covers a specific subsystem — this one shows how they all fit together.
If you’ve read our other articles, this is the map. If this is your first article, start here and follow the links to deep-dives.
Table of contents
Open Table of contents
The System at a Glance
┌──────────────────────────────────────────────────────────┐
│ SUZUNE ARCHITECTURE │
├──────────────────────────────────────────────────────────┤
│ │
│ Telegram ──→ Message Handler │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ System Prompt Assembly │ │
│ │ ├── Character Persona (YAML) │ │
│ │ ├── Speech Rules (Markdown) │ │
│ │ ├── Lorebook (keyword scan) │ │
│ │ ├── Evaluation Scores │ │
│ │ ├── Milestone Injections │ │
│ │ ├── Chat Summary │ │
│ │ ├── Character Diary (Memo) │ │
│ │ ├── Daily Outfit │ │
│ │ ├── Current DateTime │ │
│ │ └── Tone Reminder │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ LLM Router │ │
│ │ ├── NSFW Detection │ │
│ │ ├── Pre-emptive Routing │ │
│ │ ├── DeepSeek V3.2 (primary) │ │
│ │ ├── Claude Haiku (fallback) │ │
│ │ └── Censorship Detection │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Post-Processing │ │
│ │ ├── DS3.2 Artifact Cleanup │ │
│ │ ├── Quality Rewrite (Claude) │ │
│ │ ├── Anti-Repetition │ │
│ │ └── Tone Confusion Check │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ ▼ ▼ │
│ Tool Calls Text Response │
│ ├── Selfie ├── Emotion Detection │
│ ├── Outfit ├── Message Splitting │
│ ├── Memory └── Telegram Delivery │
│ └── Eval Update │
│ │
│ Persistence: SQLite (WAL) ──→ messages, metrics, state │
│ │
└──────────────────────────────────────────────────────────┘
Let’s walk through each layer.
Layer 1: Message Ingestion
Stack: Python 3.11 + aiogram v3 (async Telegram framework)
When a user sends a message:
- Receive via Telegram webhook or long-polling
- Timestamp the message in JST: [04/02 14:30]
- Extract metadata: inner voice (心の声: ...), command flags (/nsfw, /sfw)
- Save to SQLite database
- Start typing indicator (refreshed every 4 seconds)
Every message is timestamped before storage. This gives the LLM time awareness — it knows when the last message was sent and how much time has passed.
Deep dive: Building a Telegram Roleplay Bot from Scratch
Layer 2: System Prompt Assembly
This is the brain of the system. The system prompt is rebuilt from scratch on every message from 20+ dynamic data sources.
Data Sources (in injection order)
| # | Source | Token Budget | Changes Per |
|---|---|---|---|
| 1 | Character persona (persona.md) | ~800 | Rarely |
| 2 | Behavior rules (rules.md) | ~400 | Rarely |
| 3 | Example dialogue | ~300 | Never |
| 4 | Capability instructions | ~200–500 | Per capability |
| 5 | Lorebook entries (keyword-triggered) | ~300–800 | Per message |
| 6 | Relationship stage guidance | ~200 | Per score change |
| 7 | Character diary (memo.md, last 50 lines) | ~400 | Per session |
| 8 | Shared timeline | ~200 | Per event |
| 9 | Chat summary (compressed history) | ~500 | Periodically |
| 10 | Evaluation scores + impression | ~200 | Per update |
| 11 | Image generation policy | ~50 | Per affinity change |
| 12 | Story direction hints | ~100 | Per GM update |
| 13 | GM directives | ~100 | Per operator command |
| 14 | Milestone injections | ~100 | One-time per milestone |
| 15 | Daily outfit | ~50 | Per outfit change |
| 16 | Relationship reminder (recency fix) | ~100 | Every message |
| 17 | Current datetime | ~20 | Every message |
| 18 | Promise/plan reminder | ~50 | Every message |
| 19 | Story initiative rule | ~50 | Every message |
| 20 | Tone reminder (recency fix) | ~200 | Every message |
Total system prompt: ~3,000–5,000 tokens depending on active lorebook entries and capabilities.
The tone rules and relationship reminder are injected twice — once in the persona section and once at the very end. This fights LLM recency bias in long conversations.
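The assembly itself is just ordered concatenation with the recency trick bolted on. A minimal sketch (function and parameter names are illustrative, not the actual implementation):

```python
def assemble_system_prompt(sections: list[str], tone_reminder: str) -> str:
    """Join the dynamic sections in injection order, skipping empty ones,
    then repeat the tone reminder at the very end so it sits closest to
    the conversation and survives LLM recency bias."""
    parts = [s for s in sections if s]       # drop inactive sources
    parts.append(tone_reminder)              # second injection, at the tail
    return "\n\n".join(parts)
```

The key property is that the function is pure: given the same 20 data sources, it produces the same prompt, which makes the assembly easy to test and debug.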
Deep dives:
- Prompt Engineering for Immersive Roleplay
- How to Design AI Personalities with YAML
- Building an Affection System
Layer 3: LLM Router
The router decides which model handles the request and manages fallback chains.
Routing Logic
1. Check LLM profile: runtime override → character YAML → default
2. If primary is Claude AND context is NSFW → pre-emptive route to DeepSeek
3. Call primary model
4. If rate limited → fallback
5. If refusal detected → fallback
6. If empty response → retry without tools, then fallback
7. If silent sanitization → fallback
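The routing steps above can be sketched as a single function. This is a simplified sketch of the chain, not the real router: the callables and the `looks_censored` check stand in for the actual model clients and censorship detector.

```python
def route(call_primary, call_fallback, is_nsfw: bool,
          primary_is_claude: bool, looks_censored) -> str:
    """Fallback chain: pre-emptive NSFW routing, then refusal/empty/
    sanitization checks on the primary model's output."""
    # Pre-emptive: never send NSFW context to Claude in the first place
    if primary_is_claude and is_nsfw:
        return call_fallback()
    try:
        reply = call_primary()
    except Exception:           # rate limit or transport error
        return call_fallback()
    if not reply or looks_censored(reply):
        return call_fallback()
    return reply
```

In practice each branch would also log which rung of the chain fired, so the `api_calls` metrics table can track fallback rates per model.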
Model Assignments
| Model | Role | Cost/1M tokens |
|---|---|---|
| DeepSeek V3.2 | Primary chat | $0.25 / $0.40 |
| Claude Haiku 4.5 | Quality rewrite + SFW fallback | $0.80 / $4.00 |
| Gemini 2.5 Flash | NPC direction | $0.30 / $2.50 |
| Gemini 2.0 Flash | NPC rewrite | $0.10 / $0.40 |
| GLM-5 | Scene descriptions | $0.80 / $2.56 |

Layer 4: Tool Execution
The LLM can call tools during response generation. Tool calls are executed in a loop (max 5 iterations) until no more tools are requested.
| Tool | What It Does |
|---|---|
| generate_image | Creates a character selfie/scene image via RunPod |
| change_outfit | Selects from wardrobe, updates daily outfit |
| save_memory | Writes to character diary (memo.md) |
| update_evaluation | Updates relationship scores (trust, affection, etc.) |
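The execution loop is short. A sketch of the shape (the reply format and tool registry here are illustrative, not the actual wire format):

```python
MAX_TOOL_ITERATIONS = 5

def run_tool_loop(call_llm, tools: dict) -> str:
    """Call the LLM, execute any requested tools, feed results back, and
    repeat until the model stops asking for tools or the cap is hit."""
    results = []
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_llm(results)            # results from prior iterations
        if not reply.get("tool_calls"):
            return reply["text"]
        for name, args in reply["tool_calls"]:
            results.append(tools[name](**args))
    return reply.get("text", "")             # cap reached: use what we have
```

The iteration cap matters: without it, a model that keeps emitting tool calls can burn through the budget in a single turn.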
Image Generation Pipeline
Tool call → Load daily outfit → Check makeup keywords
│ │
│ ┌─────────┴─────────┐
│ │ No makeup │ Makeup detected
│ ▼ ▼
│ base_image.png base_image_serious.png
│ │ │
└────────────────────┴────────────────────┘
│
Build SD prompt
│
Send to RunPod (SDXL img2img)
│
Return image to Telegram
Deep dive: Dynamic Character Visuals
Layer 5: Post-Processing
After the LLM generates a response, several cleanup and quality steps run:
DS3.2 Artifact Cleanup
- Fix tokenization glitches (社long→社長)
- Truncate repetition loops (same phrase 3+ times)
- Extract plain-text tool calls (DS3.2 sometimes outputs them as text)
Quality Rewrite Pipeline
DS3.2 draft → Claude Haiku rewrite → censorship check
└── If censored: discard rewrite, use original
└── If circuit breaker active: skip rewrite entirely
Anti-Repetition
- Track opening patterns of last 3 responses
- If same pattern repeats: inject variation warning
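A minimal sketch of the opening-pattern tracker (class and parameter names are ours; the real check may normalize more aggressively):

```python
from collections import deque

class RepetitionGuard:
    """Track the opening words of the last few responses; flag when the
    same opening repeats so a variation warning can be injected."""

    def __init__(self, window: int = 3, prefix_words: int = 4):
        self.recent = deque(maxlen=window)
        self.prefix_words = prefix_words

    def check(self, response: str) -> bool:
        opening = " ".join(response.split()[: self.prefix_words]).lower()
        repeated = opening in self.recent
        self.recent.append(opening)
        return repeated
```

Comparing only the first few words catches the most common failure mode: the model settling into one habitual sentence opener.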
Tone Confusion Detection
- Check if character is using another character’s speech patterns
- Especially important in multi-character systems
Layer 6: Context Management
Tiered Memory Architecture
┌─────────────────────────────────────────┐
│ Tier 1: Raw Messages (last 15) │ ~2,000 tokens
│ Full message text, verbatim │ Updates: every message
├─────────────────────────────────────────┤
│ Tier 2: Chat Summary │ ~500 tokens
│ Compressed narrative of older messages │ Updates: periodically
├─────────────────────────────────────────┤
│ Tier 3: Character Diary (memo.md) │ ~400 tokens (last 50 lines)
│ Character's own notes and memories │ Updates: per save_memory call
├─────────────────────────────────────────┤
│ Tier 4: Evaluation Scores │ ~200 tokens
│ Relationship state + impressions │ Updates: per update_evaluation
└─────────────────────────────────────────┘
Compression Pipeline
- Old messages → summarized into flowing narrative (chat_summary.md)
- Chat summary grows → compressed further
- Character diary grows → compressed, deduplicated
Result: The character can reference events from days ago without consuming the entire context window.
Layer 7: Persistence
Database: SQLite with WAL (Write-Ahead Logging) mode
| Table | Contents |
|---|---|
| messages | All conversation history (id, character, role, content, timestamp) |
| api_calls | LLM call metrics (model, tokens, cost, latency, fallback status) |
| image_calls | Image generation metrics (backend, cost) |
Why SQLite?
- Zero configuration
- WAL mode handles concurrent reads during writes
- Single file — trivial to backup
- More than fast enough for a single-server bot
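Enabling WAL is a one-line pragma at connection time. A sketch with an assumed minimal schema for the messages table (the real schema may differ):

```python
import sqlite3

def open_db(path: str = "suzune.db") -> sqlite3.Connection:
    """Open the bot database in WAL mode so readers never block the writer."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, character TEXT, role TEXT, "
        "content TEXT, timestamp TEXT)"
    )
    return conn
```

WAL is a property of the database file, not the connection, so it only needs to be set once, but re-issuing the pragma is harmless.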
Branch System
The entire conversation state can be snapshotted and restored:
/branch save "before_confession"
→ [conversation continues]
/branch restore "before_confession"
→ [back to the saved state]
This enables story branching — users can explore different dialogue paths.
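The core of the branch mechanism is just named deep copies of the conversation state. A sketch (the class name and state shape are illustrative; the real system snapshots the SQLite-backed state):

```python
import copy

class BranchStore:
    """Snapshot and restore conversation state by name (/branch save|restore)."""

    def __init__(self):
        self._branches: dict[str, dict] = {}

    def save(self, name: str, state: dict) -> None:
        # Deep-copy so later mutations don't leak into the snapshot
        self._branches[name] = copy.deepcopy(state)

    def restore(self, name: str) -> dict:
        return copy.deepcopy(self._branches[name])
```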
Layer 8: Subsystems
Lorebook Engine
- JSON files with keyword-triggered entries
- Features: sticky (persist N turns), cooldown, delay, probability, selective matching
- Affinity-gated entries (NSFW world info blocked at low trust)
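The keyword scan with sticky persistence can be sketched in a few lines (class and field names are ours; the real engine also handles cooldown, delay, and probability):

```python
class LorebookEntry:
    def __init__(self, keywords: list[str], text: str, sticky: int = 0):
        self.keywords = [k.lower() for k in keywords]
        self.text = text
        self.sticky = sticky        # extra turns the entry stays active
        self.turns_left = 0

def scan_lorebook(entries: list[LorebookEntry], message: str) -> list[str]:
    """Return text of entries whose keywords appear in the message;
    sticky entries persist for N extra turns after their last trigger."""
    msg = message.lower()
    active = []
    for e in entries:
        if any(k in msg for k in e.keywords):
            e.turns_left = e.sticky + 1     # (re)arm: this turn + N sticky
        if e.turns_left > 0:
            e.turns_left -= 1
            active.append(e.text)
    return active
```

Sticky entries keep world info in context across a short exchange without the user having to repeat the trigger word every message.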
Wardrobe System
- Outfit database with categories (tops, bottoms, accessories, makeup)
- Per-character exclusive items
- Makeup detection triggers base image switching
Emotion Detection
- Regex-based emotional state extraction from response text
- Mapped to expression sprites (happy, sad, angry, surprised, etc.)
- Sprites sent before text response
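A minimal sketch of the regex mapping (the cue words and sprite names here are illustrative, not the actual pattern set):

```python
import re

EMOTION_PATTERNS = {
    "happy": re.compile(r"\b(smiles?|laughs?|grins?)\b", re.I),
    "sad":   re.compile(r"\b(sighs?|tears?|cries)\b", re.I),
    "angry": re.compile(r"\b(glares?|frowns?|snaps)\b", re.I),
}

def detect_emotion(response: str, default: str = "neutral") -> str:
    """Map the first matching action cue in the response to a sprite name."""
    for emotion, pattern in EMOTION_PATTERNS.items():
        if pattern.search(response):
            return emotion
    return default
```

Regex is crude compared to a classifier, but it is free, instant, and good enough when the character's action cues follow consistent roleplay conventions.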
NPC System
- Gemini 2.5 Flash generates NPC concepts
- DeepSeek writes NPC dialogue
- Gemini 2.0 Flash rewrites for polish
GM Console
- 45+ development tools for maintenance, analysis, character creation
- Safe delegation to Claude via file-level SFW/NSFW separation
- CLI interface with interactive and single-shot modes
Deep dive: Using Claude for NSFW Bot Development
Cost Structure
| Component | Monthly Cost | % |
|---|---|---|
| DeepSeek V3.2 (primary chat) | $15–25 | 45% |
| Claude Haiku (quality rewrites) | $5–10 | 18% |
| Gemini + GLM-5 (NPCs, scenes) | $3–5 | 8% |
| RunPod (image generation) | $5–10 | 18% |
| VPS hosting | $5 | 10% |
| Total | $33–55 | 100% |
Deep dive: Running an AI Bot on $50/month
Key Design Principles
1. Dynamic Over Static
The system prompt is never the same twice. Every data source contributes to a context that reflects the current state of the world, the relationship, and the character’s internal life.
2. Graceful Degradation
Every model call has a fallback. Every fallback has a fallback. The user never sees an error — they get a slightly different quality response.
3. Right Model for the Right Job
Don’t fight model limitations. Use uncensored models for uncensored content, quality models for quality, cheap models for bulk work.
4. Characters Are Data, Not Code
Everything about a character — personality, rules, NSFW behavior, outfits, memories — lives in files, not code. Adding a new character is a YAML edit, not a code change.
5. The Relationship Is the Product
The affection system, milestone unlocks, and behavior gating create an experience that no flat chatbot can match. Users come back because the character remembers and evolves.
Build Your Own or Try Existing Platforms
This architecture took months to build. If you want to start simpler:
- Candy AI — Closest to a “just works” all-in-one experience
- JanitorAI — Bring your own API key for model flexibility
- FantasyGF — Best for AI girlfriend with photo generation
If you want to build your own, start with our Telegram bot tutorial and layer in features from there.
This is the architecture reference for Suzune. Each subsystem has its own deep-dive article — follow the links above to explore. Follow @waifustack for new articles every week.