Virtual Game Master (VGM) postmortem

VGM Logo

I've been running a VGM (Virtual Game Master) service for a while, and I want to share some insights and reflections on the experience.

A year ago, a friend and I were thinking about a 'text-based DnD-like experience' for those who have either too little time to attend IRL TTRPG events, or no friends (>__>) to play with.

I said:

Hold my San Miguel! (beer)

It's April 2025 and the AI Era is upon us - it'll work beautifully.

Spoiler: it kinda did.

But first

We explored the other options available, because, obviously, we were not the first ones who decided to make an AI Dungeon Master.

All of them sucked, in one way or another. But ultimately, it came down to Overly Aggressive Monetization + Cheapest Possible LLM (paired with lousy system instructions).

Some results were particularly hilarious, like the one where an overly trusting AI DM merely acted surprised when, in a Harry Potter-ish quest, I first put on my Emerald Jetpack, then gave some NPC a million dollars straight from my pocket, and finally flew out of the window to Narnia.

my article about VGM on Medium

So I started building my own game master, hoping it would...

Suck less

PWA Screenshot

What makes a Table-top Role Playing Game an enjoyable experience:

A captivating story and the chance to be someone you cannot be IRL.
Other players to interact with, scheme against, or save from their own bad ideas.
Written rules making fantasy adventures in unknown lands of magic more stable and predictable than the actual world outside your window.
A Dungeon Master's experience and charisma on top of those rules — to say "yes, but" instead of just "no".

Am I making a system that could cause massive Dungeon Master layoffs in board gaming clubs around the world, rendering the human component obsolete?

No. So TTRPG Purists and AI haters should STFU, because the result was a totally different experience from regular games.

A few observed cases:

Regular normal everyday solo adventure to do instead of doomscrolling.
Bizarre experimental chaotic neutral multiplayer run that no human DM could endure and remain sane.
Inspiration well to draw from and use in regular human-only adventures.
Second life to return to for many days — see below, this one deserves its own section.

The architecture (delta edition)

Club Deviant

I won't repeat what I covered in the previous article. The stack has moved a fair bit since (Gemini 2.5, Imagen4 alongside FLUX.1-schnell, vector embeddings switched to gemini-embedding-001 at 3072 dimensions, ASR via Groq-hosted Whisper3 Turbo), but the concepts are more interesting than the exact bill of materials.

A few things worth flagging that arrived after the first article:

HyDE for retrieval. Naïve vector search worked, but predictably picked the most generic neighbours from the references DB. Switching to Hypothetical Document Embeddings — generate a fake plausible passage in the target style, then search by its embedding — produced way more characterful NPC, item and location pulls. The system goes "I think the answer would look like this", embeds this, and lets the vector DB fight back with the closest real entries.
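A minimal sketch of the HyDE idea, not the actual VGM code. The `embed` function is a toy bag-of-words stand-in for a real embedding model, `fake_llm` stands in for the LLM call that writes the hypothetical passage, and `REFERENCES` is a made-up references DB; all names here are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_text: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Naive retrieval: rank docs by similarity to the query text itself."""
    q_vec = embed(query_text)
    return sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:top_k]


def hyde_search(query: str, docs: list[str],
                generate: Callable[[str], str], top_k: int = 1) -> list[str]:
    """HyDE: generate a plausible fake passage first, then search by ITS embedding."""
    hypothetical = generate(query)
    return search(hypothetical, docs, top_k)


def fake_llm(query: str) -> str:
    """Hypothetical LLM stub: returns a passage in the target style."""
    return "a one-eyed smuggler who sells stolen star charts in the harbor tavern"


REFERENCES = [
    "a friendly innkeeper who serves ale",
    "a one-eyed smuggler trading stolen maps at the harbor",
    "a village guard who patrols the gate",
]

query = "an interesting NPC for a port town"
naive_hit = search(query, REFERENCES)
hyde_hit = hyde_search(query, REFERENCES, fake_llm)
```

With this toy data the plain query lands on the generic innkeeper, while the hypothetical passage pulls the smuggler. A real setup would swap `embed` for the embedding model and `search` for the vector DB query.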

A "preface" to preheat the world. Before the player ever types anything, the Story AI writes a short setting preface that the rest of the pipeline reads. This sounds tiny but it dramatically reduced the GM's tendency to drift towards generic Tolkien-default-fantasy in turn one.

Thought signatures and tool calls in the DB. Long sessions used to lose continuity once the history rolled over. Persisting the GM's reasoning traces and tool calls (not just user/assistant text) gave the next-turn agent something better to reattach to than a stack of dialogue lines.
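One way this kind of persistence could look, sketched with the standard-library `sqlite3` module. The table name, the `kind` values and the payload shapes are assumptions for illustration, not the real VGM schema, and the thought signature is treated as an opaque string.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create one append-only table for every kind of turn event."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turn_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"      # e.g. 'user', 'assistant', 'thought', 'tool_call'
        " payload TEXT NOT NULL)"   # JSON-encoded event body
    )


def save_event(conn: sqlite3.Connection, session_id: str, kind: str, payload: dict) -> None:
    """Store one event; reasoning traces and tool calls go in next to plain text."""
    conn.execute(
        "INSERT INTO turn_events (session_id, kind, payload) VALUES (?, ?, ?)",
        (session_id, kind, json.dumps(payload)),
    )


def load_events(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, dict]]:
    """Return the session's events in the order they were written."""
    rows = conn.execute(
        "SELECT kind, payload FROM turn_events WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [(kind, json.loads(payload)) for kind, payload in rows]


conn = sqlite3.connect(":memory:")
init_db(conn)
save_event(conn, "s1", "user", {"text": "I search the chest"})
save_event(conn, "s1", "thought", {"signature": "opaque-signature-string"})
save_event(conn, "s1", "tool_call", {"name": "loot_tool", "args": {"setting": "fantasy"}})
save_event(conn, "s1", "assistant", {"text": "Inside you find a silvered shortsword."})
events = load_events(conn, "s1")
```

The point of the single ordered table is that the next turn can be rebuilt from exactly what happened, including the parts the player never saw.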

A multiplayer agent on the bigger model. Co-op was the single biggest jump in complexity — the GM has to address the party, juggle interleaved actions, keep four characters in scope. I bumped multiplayer specifically to gemini-2.5-pro and ate the cost. Obvious quality improvement, obvious budget hit, both expected.

A Loot tool. Items used to be hand-waved by the narrator — and per the design pillars (rules system fixed, randomness externalized), that's a sin. So loot became a tool call: input the situation, output a structured item, narrator describes it. Same shape as dice rolls. Artifacts got their own variant, with one given to the PC at session start so the inventory has something interesting from minute one.
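A rough sketch of the "loot as a tool call" shape, assuming a simple weighted table. The table contents, rarity weights and the `LootItem` fields are invented for illustration; the real tool presumably takes a richer description of the situation than a single setting key.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LootItem:
    """Structured result the narrator receives instead of inventing an item."""
    name: str
    rarity: str
    value: int


# Hypothetical tables; a real version could be backed by the vector DB.
RARITY_WEIGHTS = {"common": 70, "uncommon": 25, "rare": 5}
BASE_VALUE = {"common": 5, "uncommon": 50, "rare": 500}
LOOT_TABLE = {
    "fantasy": {
        "common": ["rusty dagger", "coil of rope"],
        "uncommon": ["silvered shortsword", "smoke bomb"],
        "rare": ["ring of whispers"],
    },
    "cyberpunk": {
        "common": ["cracked datapad", "stim patch"],
        "uncommon": ["monofilament wire", "forged access card"],
        "rare": ["prototype neural implant"],
    },
}


def loot_tool(setting: str, rng: random.Random) -> LootItem:
    """Roll an item outside the model: the randomness lives here, not in the prose."""
    rarities = list(RARITY_WEIGHTS)
    weights = [RARITY_WEIGHTS[r] for r in rarities]
    rarity = rng.choices(rarities, weights=weights, k=1)[0]
    name = rng.choice(LOOT_TABLE[setting][rarity])
    value = BASE_VALUE[rarity] + rng.randint(0, BASE_VALUE[rarity])
    return LootItem(name=name, rarity=rarity, value=value)


def narrate(item: LootItem) -> str:
    """Stand-in for the narrator, which only describes what the tool returned."""
    return f"Among the debris you find a {item.name} ({item.rarity}, worth about {item.value} coins)."


item = loot_tool("fantasy", random.Random(7))
line = narrate(item)
```

Passing the `random.Random` instance in explicitly keeps rolls reproducible for a given seed, which also makes the tool easy to test.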

A 300-message history window. Today I learned that a 1400-message session tops out the 1M-token input limit. So now history gets summarized aggressively. The AI may forget that you got blackout-drunk in the tavern two in-game weeks ago, but a human DM would too. Probably.
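The windowing logic can be sketched roughly like this. Only the 300-message figure comes from the project; the function names and the `stub_summarizer` (a placeholder for the Summarizer agent) are assumptions.

```python
from typing import Callable

HISTORY_WINDOW = 300  # messages kept verbatim


def compact_history(
    messages: list[str],
    summary: str,
    summarize: Callable[[str, list[str]], str],
    window: int = HISTORY_WINDOW,
) -> tuple[str, list[str]]:
    """Keep the last `window` messages verbatim and fold older ones into the summary."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(messages) <= window:
        return summary, messages
    overflow, recent = messages[:-window], messages[-window:]
    return summarize(summary, overflow), recent


def stub_summarizer(previous: str, old_messages: list[str]) -> str:
    """Placeholder for an LLM call that merges old messages into the running summary."""
    return f"{previous}[{len(old_messages)} older messages condensed]"


session = [f"message {i}" for i in range(1400)]
summary, recent = compact_history(session, "", stub_summarizer)
```

For a 1400-message session this leaves 300 verbatim messages plus one summary string standing in for the other 1100, which is where details like the tavern blackout get lost.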

The agent cast itself is the same as before — Adventure Architect, Character Concept Assistant, GM/Narrator, Visual Describer, Translator, Summarizer. Each has a different objective function (consistency, creativity, fairness, faithfulness), which is the whole reason they're separate. Trying to make a single prompt good at all of those at once is how you end up with the previous generation of AI DMs that get talked into giving you an Emerald Jetpack.

Total random vs contextual integrity

Randomized Adventure

This was the hardest balance.

The whole point of VGM is to surprise the player. Genre is rolled, prefixes are rolled (Grimdark Sexy Post-Cyberpunk is a real label that came out of the box one evening), the reference work is sampled from a vector hit list, and NPC names are pulled from a setting-aligned shortlist. Vector search populates the game world with semantically aligned NPCs, items and locations, so in a High Fantasy world laser rifles and neural implants are less likely, but never impossible.
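One possible way to get "less likely, but never impossible" is to sample with weights taken from similarity scores plus a small floor. This is a sketch under that assumption, not a description of how VGM actually weights its picks; the genre and prefix lists, the scores and the `floor` value are all made up.

```python
import random


def roll_label(genres: list[str], prefixes: list[str],
               rng: random.Random, n_prefixes: int = 2) -> str:
    """Roll a setting label: a few distinct prefixes followed by one genre."""
    picked = rng.sample(prefixes, n_prefixes)
    return " ".join(picked + [rng.choice(genres)])


def sample_entities(similarity: dict[str, float], k: int,
                    rng: random.Random, floor: float = 0.05) -> list[str]:
    """Sample k entities (with replacement), weighted by similarity to the setting.

    The floor keeps off-genre entries possible even when their similarity is ~0.
    """
    names = list(similarity)
    weights = [max(similarity[n], 0.0) + floor for n in names]
    return rng.choices(names, weights=weights, k=k)


GENRES = ["High Fantasy", "Post-Cyberpunk", "Space Western"]
PREFIXES = ["Grimdark", "Sexy", "Cozy", "Absurdist"]

# Hypothetical similarity scores against a High Fantasy setting.
ITEM_SIMILARITY = {
    "elven longbow": 0.9,
    "healing potion": 0.8,
    "laser rifle": 0.0,
    "neural implant": -0.1,
}

rng = random.Random(42)
label = roll_label(GENRES, PREFIXES, rng)
items = sample_entities(ITEM_SIMILARITY, k=5, rng=rng)
```

With these numbers the laser rifle keeps a weight of 0.05 against 0.95 for the longbow, so it stays rare rather than forbidden.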

That part works. The harder part is that the underlying model has opinions.

Gemini, it turns out, has favourite words and favourite names. "Smell of ozone" gets described in approximately every cyberpunk scene. "Oakhaven Village" was every second rural setting until I forced name lookups through the vector DB. And then there is Elara.

Elara is Gemini's favourite name. She showed up in roughly every second adventure as either a quest-giver, a love interest, an antagonist, or all three at once. I kept fighting it. v0.3.1 was literally titled "Elara's retirement" — I added a name_search_tool and forced the GM to consult the vector DB instead of inventing names from the void.

She came back anyway.

Elara

At some point I gave up and made her the project mascot. Generated a bunch of Stable Diffusion portraits of "Not An Actual Elara", attached them to changelog posts. If the model insists, you might as well lean in.

Sometimes the contamination runs the other way — the model picks up the player's vocabulary and refuses to drop it. A friend played a session committed to doing everything cunningly. By turn ten the GM was producing lines like:

"...you cunningly pick the cunning lock and cunningly open the cunning door, only to find a cunning ogre behind it, cunningly waving a huge cunning wooden club."

Contagious adjectives. The model caught his vibe and would not let go.

The guardrails problem

A more uncomfortable version of "the model has opinions" is "the model has guardrails", and sometimes they fire mid-game.

Players occasionally tried things the safety layer was not going to allow — most often: "I shoot myself". The AI refused. A couple of players were genuinely frustrated; one or two felt it broke immersion.

I'm on the AI's side here, and I think most experienced human DMs would be too. I ran DnD 4e and Pathfinder tables for years before this project, and "no, your character does not in fact pull the trigger on themselves to skip the boss fight" is a perfectly reasonable thing for a DM to say. The fiction is the game; ending it abruptly because the rules technically allow it is not a feature.

The frustrating part is that I cannot always tell, from the logs, whether a refusal was the model genuinely doing the right thing, or the model being prudish about something it didn't need to be. That is a tuning problem I never fully solved.

The 1400-turn player

This is the part of the postmortem I keep coming back to.

One player ran a session past 1400 messages. The actual quest finished a long time before that. They then chose to just keep living a normal life inside the game world — wife, kids, day-to-day stuff, the whole sitcom. The PC's wife, by the way, was named Elara. Of course she was.

The system was never designed for this. There is no "domestic life" mode; the GM was just continuing because the player kept writing. Summarization eventually swallowed most of the original quest details — at some point the AI definitely could not have told you what the opening hook was. The player did not seem to mind.

I don't know what to do with this story. It is the most surprising emergent thing the project produced, and possibly the saddest. For one person, somewhere, that second life mattered enough to come back to for many days. The project closing means that save file is gone too.

Why it's closed

I quietly turned the Telegram bot off in December 2025 and stopped shipping releases. A few reasons compounded.

Access. Most of my friends are still in Russia. The censorship pressure there has been climbing steadily; Telegram is now fully inaccessible without a VPN, and the project's web app at vgm.lol is likely block-listed too — probably just in case, since "AI" + "foreign-hosted" + "user-generated content" ticks every box on the "forbid this" form. So the people I most wanted to play it had to hop a VPN to even reach the front page.

The social aspect. I tried showing it to my newer friends in the Philippines. The response was uniformly "wow, cool project, I'll check it out", followed by never actually checking it out. Posts on social media went unnoticed; mentioning AI in the description seemed to reduce visibility rather than help — like the algorithm has been trained to suppress that exact word in self-published indie posts.

Expenses. Inference + infra was about twenty bucks a month. Not a lot.

Twenty bucks is twenty bucks.

Especially when you're the only person playing, and I never seriously thought about how to make the project profitable in the first place.

Burnout. The previous three points converge here. I was the last remaining active user. Maintaining a multi-agent pipeline plus a Flutter app, a FastAPI backend, and a vector DB stack for an audience of one is, eventually, just unpaid work.

What I actually got out of it

A list, in no particular order:

The most advanced Flutter app I have ever built. (I am, at heart, a backend person; this project dragged me through enough mobile UI to actually be useful at it.)
A working multi-agent design that survived being re-implemented twice (n8n → FastAPI → Pillbug, see below).
Concrete confirmation that the role split is the real load-bearing idea. Every time the system got worse, it was because I had asked one agent to do two things.
Evidence that vector search + HyDE meaningfully beats naïve retrieval for flavor tasks, not just factual ones.
A pile of player logs that I will probably keep on a hard drive forever.

And the lessons:

Social connections are the limiting reagent for indie project success. You can build the thing, but if your network won't actually try it, it doesn't matter how good it is.

You cannot control everything. Forces of nature, idiotic government censorship, and the social media algorithm's allergy to the word "AI" are all outside your repo.

Pillbug, or: an undead VGM

I have been working on Pillbug — an async AI agents framework, my own take on the problem, written for projects shaped exactly like VGM. As an experiment, I recently re-implemented the core VGM loop on top of it.

It works.

It needs a lot of polishing — the framework still has rough edges where I haven't decided which abstraction wins. But the fact that the entire VGM design ports cleanly to a different runtime is, in retrospect, the strongest validation of the original architecture I could have asked for. The role split, the data contracts between agents, the externalized randomness — they all survived the move.

So VGM is closed, but not necessarily dead. Probably. We'll see.

In the meantime: thank you to the handful of people who actually played it. Especially the one with the 1400-turn save.

I hope your in-game family is doing well.

Atlas helping