“Remember when you said you’d try that café?”
This is the moment that separates a chatbot from a character. Most AI systems forget everything the moment you close the window. Suzune’s characters remember — not just the words, but the emotional significance of what happened.
Here’s how we built it.
The Problem: Context Windows Are Small
Even large context windows (128K tokens) aren’t enough for a character that’s been chatting for weeks. At roughly 1.3 tokens per English word, 128K tokens is about 100,000 words — and a roleplay chat, with its long system prompt and verbose in-character replies, can burn through that in a matter of days.
Beyond that, you have two choices:
- Truncate: Delete old messages. The character forgets.
- Compress: Summarize old messages. The character remembers the important parts.
We chose compression. Here’s the architecture.
The Three-Tier Memory System
┌─────────────────────────────────────────────────┐
│ TIER 1: Raw Messages │
│ Last 15 messages, verbatim │
│ ~2,000 tokens | Full detail, exact words │
├─────────────────────────────────────────────────┤
│ TIER 2: Chat Summary │
│ Older messages compressed to narrative │
│ ~500 tokens | Key events, emotional beats │
├─────────────────────────────────────────────────┤
│ TIER 3: Character Diary (Memo) │
│ Character's own notes about the relationship │
│ ~400 tokens (last 50 lines) | Long-term memory │
└─────────────────────────────────────────────────┘
Each tier trades detail for longevity. Recent messages are verbatim; older history is progressively compressed.
Tier 1: Raw Messages
The last 15 messages are kept in full — exact text, exact formatting. This covers the immediate conversational context.
SUMMARY_RECENT = 15  # always send this many messages verbatim

# db is an open async connection (e.g. aiosqlite); execute() is used
# as an async context manager so the cursor is closed for us.
async def get_recent_messages(character_id, limit=SUMMARY_RECENT):
    async with db.execute(
        "SELECT role, content FROM messages "
        "WHERE character_id = ? ORDER BY id DESC LIMIT ?",
        (character_id, limit),
    ) as cursor:
        rows = await cursor.fetchall()
    # The query returns newest-first; reverse to chronological order.
    return list(reversed(rows))
Why 15?
It’s a balance between context quality and token budget. At ~130 tokens per message, 15 messages consume ~2,000 tokens. Combined with a ~3,000 token system prompt, we stay well within budget for the next generation call.
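A quick back-of-the-envelope check of that budget (the figures are the article's approximations, and the function name is ours):

```python
# Rough prompt-budget arithmetic using the figures above:
# ~130 tokens per message, ~3,000-token system prompt.
TOKENS_PER_MESSAGE = 130
SYSTEM_PROMPT_TOKENS = 3000

def prompt_tokens(recent_messages: int = 15) -> int:
    """Approximate prompt size before the new user turn is added."""
    return SYSTEM_PROMPT_TOKENS + recent_messages * TOKENS_PER_MESSAGE
```

Fifteen verbatim messages keep the base prompt near ~5,000 tokens, leaving headroom for the summary, the memo, and the model's reply.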
Timestamps
Every message is timestamped when stored:
[04/01 22:15] User: I found that café you mentioned
[04/01 22:15] Sakura: *eyes widen* Wait, really? The one near the station?
[04/01 22:16] User: Yeah, the matcha latte was actually good
[04/01 22:16] Sakura: *small smile* ...Told you.
Timestamps let the character perceive time passing. “You mentioned that café three days ago” is only possible if the model can see dates in the message history.
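Producing those prefixed lines at prompt-build time can be as simple as the following sketch (the function name is illustrative; only the `[MM/DD HH:MM]` shape comes from the examples above):

```python
from datetime import datetime

def format_history_line(role: str, character_name: str,
                        content: str, created_at: datetime) -> str:
    """Render one stored message the way the model sees it,
    e.g. '[04/01 22:15] User: I found that café you mentioned'."""
    speaker = "User" if role == "user" else character_name
    return f"[{created_at.strftime('%m/%d %H:%M')}] {speaker}: {content}"
```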
Tier 2: Chat Summary
When more than 20 messages have accumulated, older messages (beyond the recent 15) are compressed into a flowing narrative:
## Recent Conversation Summary
Yesterday evening, the conversation started playfully — he
teased her about her new glasses, and she pretended to be
annoyed but was clearly pleased. The mood shifted when he
mentioned a work trip next week. She tried to seem indifferent
but asked twice about the dates. Before signing off, she
mentioned wanting to try cooking something new this weekend —
possibly hinting at wanting to cook together.
The Compression Prompt
Summarize the following conversation between {character_name}
and the user. Focus on:
1. Key events and decisions
2. Emotional moments and shifts in mood
3. Promises, plans, or commitments made
4. Relationship dynamics (who initiated what)
Write in third person, past tense, as a flowing narrative.
Preserve emotional nuance — don't just list what happened,
capture HOW it felt.
The instruction to “capture HOW it felt” is crucial. A summary that says “they discussed dinner plans” loses the emotional context. “She hesitantly suggested dinner, and his enthusiastic response made her visibly relieved” preserves the character dynamics.
When Does Compression Run?
When message count exceeds SUMMARY_THRESHOLD = 20. The oldest messages (beyond the recent 15) are compressed and the summary replaces them in the context.
The summary file (chat_summary.md) is overwritten each time — it always represents the full compressed history up to the current recent window.
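In code, the trigger looks roughly like this — a sketch, with `summarize` standing in for the LLM call that applies the compression prompt (passed in as a parameter rather than hard-coded):

```python
from pathlib import Path

SUMMARY_RECENT = 15      # messages always sent verbatim
SUMMARY_THRESHOLD = 20   # compress once history exceeds this

async def maybe_compress(messages: list, summary_path: Path, summarize) -> bool:
    """Re-summarize everything beyond the recent window once the
    threshold is crossed; overwrite chat_summary.md each time."""
    if len(messages) <= SUMMARY_THRESHOLD:
        return False
    older = messages[:-SUMMARY_RECENT]    # all messages beyond the window
    narrative = await summarize(older)    # LLM call with the compression prompt
    summary_path.write_text(narrative)    # overwrite, never append
    return True
```

Because the summary is regenerated from the full older history each time, it always reflects everything up to the current recent window.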
Tier 3: Character Diary (Memo)
The most distinctive layer. The character writes its own diary using the save_memory tool:
## 04/01 — memo entry
He remembered my birthday without being told. I didn't
expect that. It's getting harder to pretend I don't look
forward to talking to him.
## 04/02 — memo entry
He asked about my novel draft. Nobody asks about that.
I almost showed him the first chapter but chickened out.
Maybe next time.
How It Works
The character has access to a save_memory tool. When something significant happens — an emotional moment, a promise made, a secret shared — the character calls the tool to save a note.
from datetime import datetime

def _tool_save_memory(self, content: str):
    """Save a diary entry to the character's memo file."""
    timestamp = datetime.now().strftime("%m/%d")
    entry = f"\n## {timestamp} — memo entry\n{content}\n"
    memo_path = self.character.character_dir / "memo.md"
    # Append-only: the diary accumulates until compression rewrites it.
    with open(memo_path, "a") as f:
        f.write(entry)
The memo is then injected into the system prompt (last 50 lines) on every subsequent message. This means the character’s own reflections influence its future behavior.
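Reading that 50-line tail might look like the following sketch (`MEMO_INJECT_LINES` and the function name are ours):

```python
from pathlib import Path

MEMO_INJECT_LINES = 50  # how much of the diary reaches the prompt

def read_memo_tail(memo_path: Path) -> str:
    """Return the last N lines of memo.md for system-prompt injection."""
    try:
        lines = memo_path.read_text().splitlines()
    except FileNotFoundError:
        return ""  # no diary yet
    return "\n".join(lines[-MEMO_INJECT_LINES:])
```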
Why Character-Written?
We tried externally generated summaries. They were accurate but sterile — they captured facts but not the character’s perspective.
When the character writes its own diary, the entries reflect the character’s personality:
- A tsundere writes: “He said something nice. Whatever. It’s not like I care.”
- An earnest character writes: “Today was special. I think we’re becoming real friends.”
This self-authored memory creates a feedback loop where the character’s past reflections shape its present personality.
Compression
When the memo grows past the 50-line injection window, it gets compressed:
Compress the following diary entries. Merge related entries,
remove redundant information, but preserve:
1. Key relationship milestones
2. Emotional turning points
3. Promises and commitments
4. The character's evolving feelings
Compression reduces the memo to its essential emotional beats while keeping the character’s voice.
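The trigger can mirror the chat-summary path — a sketch, with `compress` standing in for the LLM call that applies the diary-compression prompt:

```python
from pathlib import Path

MEMO_COMPRESS_AT = 50  # compress once the injected window overflows

async def maybe_compress_memo(memo_path: Path, compress) -> bool:
    """Rewrite the diary in place once it outgrows the injection window."""
    text = memo_path.read_text()
    if len(text.splitlines()) <= MEMO_COMPRESS_AT:
        return False
    memo_path.write_text(await compress(text))  # LLM call, diary prompt
    return True
```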
How It All Connects
On every message, the system prompt includes:
[System prompt sections 1-5: persona, rules, lorebook, etc.]
## Diary
[Last 50 lines of memo.md]
## Recent Conversation Summary
[Compressed narrative of older messages]
[System prompt sections 10+: evaluation, outfit, datetime, etc.]
--- Raw message history (last 15 messages) ---
The character sees:
- Its own diary entries (long-term emotional memory)
- A narrative summary of recent conversation (medium-term context)
- The last 15 messages verbatim (short-term detail)
Together, these create the illusion of continuous memory — the character “remembers” things from days ago even though the prompt carries only a few thousand tokens of actual history.
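Assembled naively, the layering looks like this (section headers are illustrative; the real prompt has more sections, as the outline above shows):

```python
def build_context(persona: str, memo_tail: str, summary: str,
                  recent: list) -> str:
    """Stack the three tiers: most-compressed first, verbatim last."""
    parts = [persona]
    if memo_tail.strip():
        parts.append("## Diary\n" + memo_tail)              # Tier 3
    if summary.strip():
        parts.append("## Recent Conversation Summary\n" + summary)  # Tier 2
    parts.append("--- Raw message history ---\n" + "\n".join(recent))  # Tier 1
    return "\n\n".join(parts)
```

Placing the verbatim messages last keeps the freshest detail closest to the model's next token, which is where recency matters most.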
Practical Impact
Without Memory System
User: Remember that thing you said about the café?
Bot: I'm not sure what you're referring to. Could you remind me?
With Memory System
User: Remember that thing you said about the café?
Sakura: *perks up* Tsuki no Shizuku? The one with the matcha
latte? Yeah, I said you should try it. *narrows eyes* ...Wait,
did you actually go? Without me?
The character pulls from:
- Lorebook entry (café name: “Tsuki no Shizuku”)
- Chat summary (previous conversation about the café)
- Character diary (she wanted to go together)
All three tiers contribute to one coherent response.
Database Schema
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    character_id TEXT NOT NULL,
    role TEXT NOT NULL,       -- 'user', 'assistant', 'system'
    content TEXT NOT NULL,
    tool_calls TEXT,          -- JSON array if present
    tool_call_id TEXT,        -- for tool result messages
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- WAL mode for concurrent read/write
PRAGMA journal_mode=WAL;
Messages are never deleted — only compressed into summaries. The raw history stays in the database as an audit trail and for the branch/restore system.
Branch System: Story Branching
Because all state is persisted, we can snapshot and restore the entire conversation state:
/branch save "before_confession"
→ [Database snapshot saved]
... conversation continues, things go badly ...
/branch restore "before_confession"
→ [State restored to the saved point]
This lets users explore different narrative paths — a confession that goes wrong can be rewound and tried differently.
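Because everything lives in a single SQLite file, a snapshot can be as simple as SQLite's online backup API — a sketch; the real /branch command may capture more than the database (e.g. memo.md and chat_summary.md):

```python
import sqlite3
from pathlib import Path

def branch_save(db_path: str, name: str, branch_dir: str = "branches") -> Path:
    """Snapshot the conversation DB using SQLite's online backup API
    (safe to run alongside WAL-mode readers)."""
    dest = Path(branch_dir) / f"{name}.db"
    dest.parent.mkdir(parents=True, exist_ok=True)
    src, dst = sqlite3.connect(db_path), sqlite3.connect(dest)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()
    return dest

def branch_restore(db_path: str, name: str, branch_dir: str = "branches") -> None:
    """Copy a saved snapshot back over the live database."""
    snap = Path(branch_dir) / f"{name}.db"
    src, dst = sqlite3.connect(snap), sqlite3.connect(db_path)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()
```

The backup API is preferable to copying the file directly because it produces a consistent snapshot even while other connections are writing.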
Lessons Learned
1. Emotional Context > Factual Context
“They discussed the movie” is less useful than “She got unexpectedly emotional talking about the movie’s ending.” Compression prompts should prioritize emotional significance.
2. Character-Authored Memory Is Worth the Complexity
External summaries are easier to implement, but character diary entries create a feedback loop that makes the character feel genuinely self-aware.
3. 50 Lines Is the Sweet Spot for Memo Injection
Too few (20) and the character loses long-term context. Too many (100+) and the memo dominates the system prompt, pushing out other important context.
4. Timestamps Are Non-Negotiable
Without timestamps, the character can’t distinguish “this morning” from “three days ago.” Always timestamp messages and diary entries.
Try It Without Building
If you want AI chat with memory but don’t want to build a system:
- Candy AI — Has basic relationship memory
- Kupid AI — Characters with personality development
- Kindroid — Notable for strong memory features
For the full platform comparison: Best NSFW AI Chatbot Platforms 2026
For the overall architecture, see Architecture of a Production NSFW RP Bot. For prompt engineering, see Prompt Engineering for Immersive Roleplay.