
Architecture of a Production NSFW RP Bot: The Complete System Map

This is the reference article. Every other article on WaifuStack covers a specific subsystem — this one shows how they all fit together.

If you’ve read our other articles, this is the map. If this is your first article, start here and follow the links to deep-dives.


The System at a Glance

┌──────────────────────────────────────────────────────────┐
│                    SUZUNE ARCHITECTURE                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Telegram ──→ Message Handler                            │
│                    │                                     │
│                    ▼                                     │
│  ┌─────────────────────────────────┐                     │
│  │   System Prompt Assembly        │                     │
│  │   ├── Character Persona (YAML)  │                     │
│  │   ├── Speech Rules (Markdown)   │                     │
│  │   ├── Lorebook (keyword scan)   │                     │
│  │   ├── Evaluation Scores         │                     │
│  │   ├── Milestone Injections      │                     │
│  │   ├── Chat Summary              │                     │
│  │   ├── Character Diary (Memo)    │                     │
│  │   ├── Daily Outfit              │                     │
│  │   ├── Current DateTime          │                     │
│  │   └── Tone Reminder             │                     │
│  └─────────────────────────────────┘                     │
│                    │                                     │
│                    ▼                                     │
│  ┌─────────────────────────────────┐                     │
│  │   LLM Router                    │                     │
│  │   ├── NSFW Detection            │                     │
│  │   ├── Pre-emptive Routing       │                     │
│  │   ├── DeepSeek V3.2 (primary)   │                     │
│  │   ├── Claude Haiku (fallback)   │                     │
│  │   └── Censorship Detection      │                     │
│  └─────────────────────────────────┘                     │
│                    │                                     │
│                    ▼                                     │
│  ┌─────────────────────────────────┐                     │
│  │   Post-Processing               │                     │
│  │   ├── DS3.2 Artifact Cleanup    │                     │
│  │   ├── Quality Rewrite (Claude)  │                     │
│  │   ├── Anti-Repetition           │                     │
│  │   └── Tone Confusion Check      │                     │
│  └─────────────────────────────────┘                     │
│                    │                                     │
│          ┌────────┴────────┐                             │
│          ▼                 ▼                             │
│     Tool Calls        Text Response                      │
│     ├── Selfie         ├── Emotion Detection             │
│     ├── Outfit         ├── Message Splitting             │
│     ├── Memory         └── Telegram Delivery             │
│     └── Eval Update                                      │
│                                                          │
│  Persistence: SQLite (WAL) ──→ messages, metrics, state  │
│                                                          │
└──────────────────────────────────────────────────────────┘

Let’s walk through each layer.


Layer 1: Message Ingestion

Stack: Python 3.11 + aiogram v3 (async Telegram framework)

When a user sends a message:

  1. Receive via Telegram webhook or long-polling
  2. Timestamp the message in JST: [04/02 14:30]
  3. Extract metadata: inner voice (心の声: ...), command flags (/nsfw, /sfw)
  4. Save to SQLite database
  5. Start typing indicator (refreshed every 4 seconds)

Every message is timestamped before storage. This gives the LLM time awareness — it knows when the last message was sent and how much time has passed.

Deep dive: Building a Telegram Roleplay Bot from Scratch


Layer 2: System Prompt Assembly

This is the brain of the system. The system prompt is rebuilt from scratch on every message from 20+ dynamic data sources.

Data Sources (in injection order)

| # | Source | Token Budget | Changes Per |
|---|--------|--------------|-------------|
| 1 | Character persona (persona.md) | ~800 | Rarely |
| 2 | Behavior rules (rules.md) | ~400 | Rarely |
| 3 | Example dialogue | ~300 | Never |
| 4 | Capability instructions | ~200–500 | Per capability |
| 5 | Lorebook entries (keyword-triggered) | ~300–800 | Per message |
| 6 | Relationship stage guidance | ~200 | Per score change |
| 7 | Character diary (memo.md, last 50 lines) | ~400 | Per session |
| 8 | Shared timeline | ~200 | Per event |
| 9 | Chat summary (compressed history) | ~500 | Periodically |
| 10 | Evaluation scores + impression | ~200 | Per update |
| 11 | Image generation policy | ~50 | Per affinity change |
| 12 | Story direction hints | ~100 | Per GM update |
| 13 | GM directives | ~100 | Per operator command |
| 14 | Milestone injections | ~100 | One-time per milestone |
| 15 | Daily outfit | ~50 | Per outfit change |
| 16 | Relationship reminder (recency fix) | ~100 | Every message |
| 17 | Current datetime | ~20 | Every message |
| 18 | Promise/plan reminder | ~50 | Every message |
| 19 | Story initiative rule | ~50 | Every message |
| 20 | Tone reminder (recency fix) | ~200 | Every message |

Total system prompt: ~3,000–5,000 tokens depending on active lorebook entries and capabilities.

The tone rules and relationship reminder are injected twice — once in the persona section and once at the very end. LLMs weight the most recent tokens most heavily, so rules buried at the top of a long prompt lose influence; repeating them as the final block keeps them in force deep into a conversation.
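A minimal sketch of the assembly step, including the doubled tone injection. The section keys are illustrative, not the real source names:

```python
def assemble_prompt(sections: dict[str, str]) -> str:
    """Concatenate active sources in injection order, repeating the
    tone block once near the top and once at the very end."""
    order = ["persona", "rules", "example_dialogue", "lorebook",
             "summary", "scores", "outfit", "datetime"]
    parts = []
    for key in order:
        if key in sections:
            parts.append(sections[key])
        if key == "persona" and "tone" in sections:
            parts.append(sections["tone"])   # first injection, in the persona region
    if "tone" in sections:
        parts.append(sections["tone"])       # second injection, at the very end
    return "\n\n".join(parts)
```

Because the prompt is rebuilt from these sources on every message, any change to a file or score shows up in the very next turn.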

Deep dives:


Layer 3: LLM Router

The router decides which model handles the request and manages fallback chains.

Routing Logic

1. Check LLM profile: runtime override → character YAML → default
2. If primary is Claude AND context is NSFW → pre-emptive route to DeepSeek
3. Call primary model
4. If rate limited → fallback
5. If refusal detected → fallback
6. If empty response → retry without tools, then fallback
7. If silent sanitization → fallback
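The decision chain above can be condensed into one function. The callables and refusal markers here are illustrative stand-ins for the real API calls and detection heuristics:

```python
REFUSAL_MARKERS = ("i can't help", "i cannot assist", "as an ai")

def looks_like_refusal(text: str) -> bool:
    """Toy refusal detector; the real check is more elaborate."""
    lower = text.lower()
    return any(marker in lower for marker in REFUSAL_MARKERS)

def route(call_primary, call_fallback, is_nsfw: bool,
          primary_is_claude: bool = False) -> str:
    """Sketch of steps 2-7: pre-emptive NSFW routing, then fall back
    on errors, refusals, and empty responses."""
    if primary_is_claude and is_nsfw:
        return call_fallback()        # pre-emptive route to DeepSeek
    try:
        reply = call_primary()
    except Exception:                 # rate limit / transport error
        return call_fallback()
    if not reply or looks_like_refusal(reply):
        return call_fallback()
    return reply
```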

Model Assignments

| Model | Role | Cost / 1M tokens (in / out) |
|-------|------|------------------------------|
| DeepSeek V3.2 | Primary chat | $0.25 / $0.40 |
| Claude Haiku 4.5 | Quality rewrite + SFW fallback | $0.80 / $4.00 |
| Gemini 2.5 Flash | NPC direction | $0.30 / $2.50 |
| Gemini 2.0 Flash | NPC rewrite | $0.10 / $0.40 |
| GLM-5 | Scene descriptions | $0.80 / $2.56 |

Deep dives:


Layer 4: Tool Execution

The LLM can call tools during response generation. Tool calls are executed in a loop (max 5 iterations) until no more tools are requested.

| Tool | What It Does |
|------|--------------|
| generate_image | Creates a character selfie/scene image via RunPod |
| change_outfit | Selects from wardrobe, updates daily outfit |
| save_memory | Writes to character diary (memo.md) |
| update_evaluation | Updates relationship scores (trust, affection, etc.) |
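The tool loop described above can be sketched as follows; `call_llm` and the callables in `tools` are illustrative stand-ins for the real model and tool interfaces:

```python
MAX_TOOL_ITERATIONS = 5

def run_with_tools(call_llm, tools: dict) -> str:
    """Re-invoke the model with accumulated tool results until it stops
    requesting tools or the iteration cap is hit. `call_llm(results)`
    returns (text, tool_requests) where each request is (name, kwargs)."""
    results = []
    text = ""
    for _ in range(MAX_TOOL_ITERATIONS):
        text, requests = call_llm(results)
        if not requests:
            return text               # model is done with tools
        for name, kwargs in requests:
            results.append((name, tools[name](**kwargs)))
    return text                       # cap reached; use the last draft
```

The hard iteration cap matters: without it, a model that keeps requesting tools would loop (and bill) forever.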

Image Generation Pipeline

Tool call → Load daily outfit → Check makeup keywords
    │                              │
    │                    ┌─────────┴─────────┐
    │                    │ No makeup          │ Makeup detected
    │                    ▼                    ▼
    │              base_image.png      base_image_serious.png
    │                    │                    │
    └────────────────────┴────────────────────┘

                    Build SD prompt

                    Send to RunPod (SDXL img2img)

                    Return image to Telegram
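The makeup branch in the pipeline above is a simple keyword check. The keyword list and filenames here follow the diagram but are otherwise illustrative:

```python
# Illustrative keyword list; the real one is larger and lives in config.
MAKEUP_KEYWORDS = ("makeup", "lipstick", "メイク", "口紅")

def pick_base_image(scene_prompt: str) -> str:
    """Choose the img2img base: the made-up variant when a makeup
    keyword appears in the requested scene, the default otherwise."""
    lower = scene_prompt.lower()
    if any(keyword in lower for keyword in MAKEUP_KEYWORDS):
        return "base_image_serious.png"
    return "base_image.png"
```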

Deep dive: Dynamic Character Visuals


Layer 5: Post-Processing

After the LLM generates a response, several cleanup and quality steps run:

DS3.2 Artifact Cleanup

Quality Rewrite Pipeline

DS3.2 draft → Claude Haiku rewrite → censorship check
    └── If censored: discard rewrite, use original
    └── If circuit breaker active: skip rewrite entirely
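The rewrite pipeline reduces to one guard-heavy function. The callables are illustrative stand-ins for the Claude call and the censorship detector:

```python
def quality_rewrite(draft: str, rewrite_fn, is_censored_fn,
                    circuit_breaker_open: bool = False) -> str:
    """Polish the DeepSeek draft with a rewrite pass, but never let a
    sanitized rewrite replace the original."""
    if circuit_breaker_open:
        return draft                  # skip the rewrite entirely
    rewritten = rewrite_fn(draft)
    if is_censored_fn(rewritten):
        return draft                  # discard the sanitized rewrite
    return rewritten
```

The key invariant: the draft is the floor. A failed or censored rewrite never makes the response worse than what DeepSeek produced.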

Anti-Repetition

Tone Confusion Detection


Layer 6: Context Management

Tiered Memory Architecture

┌─────────────────────────────────────────┐
│ Tier 1: Raw Messages (last 15)          │  ~2,000 tokens
│ Full message text, verbatim             │  Updates: every message
├─────────────────────────────────────────┤
│ Tier 2: Chat Summary                    │  ~500 tokens
│ Compressed narrative of older messages  │  Updates: periodically
├─────────────────────────────────────────┤
│ Tier 3: Character Diary (memo.md)       │  ~400 tokens (last 50 lines)
│ Character's own notes and memories      │  Updates: per save_memory call
├─────────────────────────────────────────┤
│ Tier 4: Evaluation Scores               │  ~200 tokens
│ Relationship state + impressions        │  Updates: per update_evaluation
└─────────────────────────────────────────┘
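Flattening the four tiers into prompt sections is mechanical; a sketch, with the caps taken from the figures above and the stable tiers placed before the verbatim recent turns:

```python
def build_context(raw_messages: list[str], summary: str,
                  diary_lines: list[str], scores: str) -> list[str]:
    """Assemble the tiered context: diary and scores (slow-changing),
    then the compressed summary, then the last 15 verbatim messages."""
    return [
        "\n".join(diary_lines[-50:]),    # Tier 3: character diary, last 50 lines
        scores,                          # Tier 4: evaluation scores
        summary,                         # Tier 2: compressed history
        "\n".join(raw_messages[-15:]),   # Tier 1: verbatim recent turns
    ]
```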

Compression Pipeline

Result: The character can reference events from days ago without consuming the entire context window.


Layer 7: Persistence

Database: SQLite with WAL (Write-Ahead Logging) mode

| Table | Contents |
|-------|----------|
| messages | All conversation history (id, character, role, content, timestamp) |
| api_calls | LLM call metrics (model, tokens, cost, latency, fallback status) |
| image_calls | Image generation metrics (backend, cost) |
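Setting this up is a few lines of stdlib `sqlite3`; the column lists below are abridged from the table above, and the filename is illustrative:

```python
import sqlite3

def open_db(path: str = "suzune.db") -> sqlite3.Connection:
    """Open the bot database in WAL mode so readers never block the
    single writer (important with async handlers sharing one file)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS messages (
        id INTEGER PRIMARY KEY, character TEXT, role TEXT,
        content TEXT, timestamp TEXT)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS api_calls (
        id INTEGER PRIMARY KEY, model TEXT, tokens INTEGER,
        cost REAL, latency_ms REAL, fallback INTEGER)""")
    return conn
```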

Why SQLite?

Branch System

The entire conversation state can be snapshotted and restored:

/branch save "before_confession"
→ [conversation continues]
/branch restore "before_confession"
→ [back to the saved state]

This enables story branching — users can explore different dialogue paths.
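The branch semantics are snapshot-and-restore over conversation state. A minimal in-memory sketch (the real implementation copies SQLite rows, not Python objects):

```python
import copy

class BranchStore:
    """Sketch of /branch save and /branch restore semantics."""

    def __init__(self):
        self.state: list[dict] = []               # current conversation
        self._branches: dict[str, list[dict]] = {}

    def save(self, name: str) -> None:
        # Deep copy so later edits to the live state don't mutate the snapshot.
        self._branches[name] = copy.deepcopy(self.state)

    def restore(self, name: str) -> None:
        self.state = copy.deepcopy(self._branches[name])
```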


Layer 8: Subsystems

Lorebook Engine

Wardrobe System

Emotion Detection

NPC System

GM Console

Deep dive: Using Claude for NSFW Bot Development


Cost Structure

| Component | Monthly Cost | % |
|-----------|--------------|---|
| DeepSeek V3.2 (primary chat) | $15–25 | 45% |
| Claude Haiku (quality rewrites) | $5–10 | 18% |
| Gemini + GLM-5 (NPCs, scenes) | $3–5 | 8% |
| RunPod (image generation) | $5–10 | 18% |
| VPS hosting | $5 | 10% |
| **Total** | **$33–55** | |

Deep dive: Running an AI Bot on $50/month


Key Design Principles

1. Dynamic Over Static

The system prompt is never the same twice. Every data source contributes to a context that reflects the current state of the world, the relationship, and the character’s internal life.

2. Graceful Degradation

Every model call has a fallback. Every fallback has a fallback. The user never sees an error — they get a slightly different quality response.

3. Right Model for the Right Job

Don’t fight model limitations. Use uncensored models for uncensored content, quality models for quality, cheap models for bulk work.

4. Characters Are Data, Not Code

Everything about a character — personality, rules, NSFW behavior, outfits, memories — lives in files, not code. Adding a new character is a YAML edit, not a code change.

5. The Relationship Is the Product

The affection system, milestone unlocks, and behavior gating create an experience that no flat chatbot can match. Users come back because the character remembers and evolves.


Build Your Own or Try Existing Platforms

This architecture took months to build. If you want to start simpler:

If you want to build your own, start with our Telegram bot tutorial and layer in features from there.


This is the architecture reference for Suzune. Each subsystem has its own deep-dive article — follow the links above to explore. Follow @waifustack for new articles every week.

