TL;DR
Last time I wrote about giving my Ultima Online shard’s NPCs a voice, a memory, and a small autonomous life. That post ended with “the peasant talks back now.” In the eight days since, the project grew six new systems: NPCs keep daily routines anchored to real places, every town runs a rumor board that traveling NPCs physically carry between cities, townsfolk gossip about players (your katana, your karma, your reputation), the GM avatar got actual powers governed by a genie rule, villagers hand out delivery quests, and a population director keeps every city stocked with 200 ambient “denizens” who hail you in the street. That’s ~3,200 new NPCs and maybe a dozen new LLM call sites, still running entirely on a local gemma-class model — the trick is that the model never gained a single new permission. Every new capability is deterministic code; the LLM still only ever produces words and picks verbs off allowlists. Also: I found out my RAG pipeline had been silently dead for days, and the lesson there is worth the price of admission.
Movement is the transport layer
The first thing I shipped after the last post wasn’t a feature. It was a bug fix, and in hindsight it was the load-bearing one.
NPC errands — the autonomous “blacksmith walks to the market and back” behavior — had been quietly broken in a way that’s easy to miss: the errand state machine worked, the destination got picked, the NPC announced it was setting off… and then it just sort of milled around. The engine’s native wander logic re-targets an NPC’s home point and lets it drift there randomly, which works great for “stay near your anvil” and not at all for “cross town with intent.” An NPC nudged once toward a far destination would meander, get distracted by geometry, and never arrive. From a player’s chair it read as “these people don’t actually go anywhere.”
The fix was humble: when an NPC hasn’t made progress for a few seconds, fire a directed pathfinding step on every heartbeat until it’s moving again, rather than nudging it once. That’s it. But everything in this post sits on top of it. Routines, gossip carriers, delivery quests: they all require NPCs that actually arrive places. If you’re building anything agent-shaped, the unglamorous “does the thing physically happen” layer is worth fixing before the clever layers, because the clever layers will silently inherit its failures.
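A minimal sketch of that stall check, in Python purely for illustration. The shard’s real code isn’t shown in this post, so `Walker`, `STALL_TICKS`, and `directed_step` are hypothetical names, and the straight-line step stands in for a real pathfinder that would route around obstacles.

```python
from dataclasses import dataclass

Point = tuple[int, int]

STALL_TICKS = 3  # hypothetical: heartbeats without progress before intervening


@dataclass
class Walker:
    pos: Point
    dest: Point
    last_progress_tick: int = 0
    best_distance: float = float("inf")


def distance(a: Point, b: Point) -> int:
    """Tile distance, counting a diagonal move as one step."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def directed_step(pos: Point, dest: Point) -> Point:
    """Stand-in for one real pathfinding step; ignores obstacles."""
    dx = (dest[0] > pos[0]) - (dest[0] < pos[0])
    dy = (dest[1] > pos[1]) - (dest[1] < pos[1])
    return (pos[0] + dx, pos[1] + dy)


def heartbeat(w: Walker, tick: int) -> bool:
    """Return True when this heartbeat had to force a directed step."""
    d = distance(w.pos, w.dest)
    if d == 0:
        return False  # arrived
    if d < w.best_distance:
        # The NPC got closer on its own since the last check.
        w.best_distance = d
        w.last_progress_tick = tick
        return False
    if tick - w.last_progress_tick >= STALL_TICKS:
        w.pos = directed_step(w.pos, w.dest)
        # Record the new distance without resetting the stall clock, so the
        # forced step repeats every heartbeat until the NPC moves by itself.
        w.best_distance = distance(w.pos, w.dest)
        return True
    return False
```

The detail that matters is the last branch: a forced step does not count as progress, so a stuck NPC keeps getting pushed each heartbeat instead of receiving one nudge and drifting again.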
While I was in there, two smaller liveliness fixes: hostile language-speaking mobs (the ghouls and witches a GM spawns) now answer players instead of standing mute — one karma check separates “monster that should talk like a monster” from “chicken that should stay a chicken” — and NPC hearing range went from 4 tiles to a full screen, so the person you’re obviously addressing actually responds.
The tavern is where the tavernkeeper stands
With movement working, routines became possible. Every roamer-class NPC now gets a plan for each in-game day (a UO day is about two real hours, which means you actually see morning turn to evening in a session): a morning errand, a midday meal at the tavern, an afternoon call at the bank or market, an evening stroll.
The part I’m pleased with is how destinations resolve. There’s no hand-authored map of points of interest. “The tavern” is wherever the tavernkeeper actually stands — the planner scans for an NPC whose inferred trade matches and walks there. If a town has no tavernkeeper, the lunch leg honestly skips instead of walking to a random wall. The world’s own population is the POI database, which means it stays correct as the population changes, and it produces the exact emergent scene you’d want: multiple NPCs converging on the tavern around game-noon because that’s where the tavernkeeper is.
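The “population is the POI database” idea can be sketched in a few lines. This is an illustration under assumptions, not the shard’s code: `Npc`, `DAY_TEMPLATE`, `resolve`, and `plan_day` are hypothetical names, and only the two legs that resolve through another NPC’s trade are modeled.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[int, int]


@dataclass(frozen=True)
class Npc:
    name: str
    trade: str  # the inferred trade the planner matches on
    town: str
    pos: Point


# Hypothetical template: (time slot, trade of the NPC who anchors that leg).
DAY_TEMPLATE = (("midday", "tavernkeeper"), ("afternoon", "banker"))


def resolve(trade: str, town: str, population: list[Npc]) -> Optional[Point]:
    """Find where an NPC of the given trade currently stands, if any."""
    for npc in population:
        if npc.town == town and npc.trade == trade:
            return npc.pos
    return None


def plan_day(town: str, population: list[Npc]) -> list[tuple[str, Point]]:
    """Build the day's legs, skipping any leg with no matching NPC."""
    plan = []
    for slot, trade in DAY_TEMPLATE:
        spot = resolve(trade, town, population)
        if spot is not None:  # no tavernkeeper means no lunch leg
            plan.append((slot, spot))
    return plan
```

Because `resolve` reads live positions, the plan follows the tavernkeeper if the tavernkeeper moves, and a town without one simply produces a shorter day.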
The LLM’s role in all this is one optional, chance-gated call per NPC per day that names a small private intention (“mending the fence behind the cottage”) which rides their chat prompt. The schedule itself is deterministic. If you’ve read the Stanford generative-agents paper, this is that idea with the planning loop stripped to a vocation-shaped template — and honestly, for a game world, the template loses very little.
Towns gossip now, and NPCs are the packets
This is my favorite system, and it costs zero LLM calls.
Every town keeps a small rumor board — capped, expiring, persisted with the world save. Things that feed it: salient lines players say near NPCs (chance-gated), player deaths, notable kills (fame-gated, so slaying a dragon makes news and slaying a mongbat doesn’t), and a couple of stranger sources I’ll get to. Boards surface through the prompts that already existed: ask anyone “what news?” and they’ll weave the current talk into their answer instead of reciting it.
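For a sense of how little machinery a board needs, here is a rough Python sketch of “capped, expiring, fame-gated.” The cap, the lifetime, `FAME_THRESHOLD`, and the rumor wording are all made-up values for illustration; persistence with the world save is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rumor:
    text: str
    created: float  # timestamp in seconds


class RumorBoard:
    def __init__(self, cap: int = 20, ttl: float = 6 * 3600.0):
        self.cap = cap  # hypothetical maximum number of rumors kept
        self.ttl = ttl  # hypothetical lifetime in seconds
        self._rumors: list[Rumor] = []

    def post(self, text: str, now: float) -> None:
        self._expire(now)
        self._rumors.append(Rumor(text, now))
        # Over the cap: drop the oldest entries first.
        del self._rumors[:-self.cap]

    def current(self, now: float) -> list[str]:
        """Texts of the rumors that are still alive at time `now`."""
        self._expire(now)
        return [r.text for r in self._rumors]

    def _expire(self, now: float) -> None:
        self._rumors = [r for r in self._rumors if now - r.created <= self.ttl]


FAME_THRESHOLD = 5000  # hypothetical: below this, a kill is not news


def report_kill(board: RumorBoard, killer: str, victim: str,
                victim_fame: int, now: float) -> bool:
    """Board a kill only when the victim was famous enough to matter."""
    if victim_fame < FAME_THRESHOLD:
        return False
    board.post(f"{killer} has slain {victim}", now)
    return True
```

A prompt builder would then call `current(now)` and hand the result to the model as background talk rather than as lines to recite.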
The spread mechanic is the good part. NPCs already took occasional cross-realm journeys. Now a journeying NPC carries the freshest rumors of each board to the other — physically, on foot and recall, with the travel time that implies. Tell a fisherman in one city about your dungeon exploits, and hours later the rumor can come back to you out of a tavernkeeper’s mouth three cities away, prefixed with “word from Ocllo has it that…”
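The carriage step might look something like the sketch below. It is an assumption-laden illustration: boards are plain lists here, `pack`, `deliver`, `render`, and `round_trip` are hypothetical names, and the carry size of three is invented. Travel time is not modeled; in the game it falls out of the journey itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rumor:
    text: str
    origin: str  # town where the rumor was first boarded
    created: float


Board = list[Rumor]


def pack(board: Board, n: int = 3) -> list[Rumor]:
    """Take a snapshot of the n freshest rumors on a board."""
    return sorted(board, key=lambda r: r.created, reverse=True)[:n]


def deliver(satchel: list[Rumor], board: Board) -> int:
    """Add carried rumors the board has not heard yet; return how many."""
    added = 0
    for rumor in satchel:
        if rumor not in board:
            board.append(rumor)
            added += 1
    return added


def render(rumor: Rumor, town: str) -> str:
    """Prefix rumors that came from somewhere else."""
    if rumor.origin == town:
        return rumor.text
    return f"word from {rumor.origin} has it that {rumor.text}"


def round_trip(home: Board, away: Board) -> None:
    outbound = pack(home)    # packed at departure
    inbound = pack(away)     # packed on arrival, before unloading
    deliver(outbound, away)
    deliver(inbound, home)   # unloaded after the walk back
```

Keeping `origin` on the rumor rather than baking the prefix into the text means a rumor that returns to its home town reads as local news again.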
I did not script the moment that convinced me this works. An NPC named Harith took a pilgrimage to Magincia early in testing. His arrival was noted on Magincia’s board (“Harith, a villager out of Ocllo, has been seen about town”). Seven hours later I checked his home town’s board and the rumor was sitting there — he’d carried the news of his own trip home on the return journey. Nobody wrote that behavior. Three systems composed it.
The town notices you
Eight-days-ago me had NPCs that remembered conversations. The new layer is NPCs that notice you — deterministically, at low background frequency. A townsperson near a player occasionally boards an impression: the weapon you’re carrying (“looks awful quick with that katana”), your armor, a grandmastered skill, and — my favorite axis — your karma. Walk around as a known murderer and the board fills with “decent folk cross the street when they come walking.” Build a virtuous reputation and they’ll praise you to strangers.
This turns the karma number — which has been in UO since 1997 as an invisible stat that gates vendor prices — into something social. Your reputation isn’t a number anymore; it’s what the town says about you, mutated through rumor carriage and LLM paraphrase. The plumbing is template strings and threshold checks. The effect, when an NPC two towns over warns a friend about you, is anything but.
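“Template strings and threshold checks” is meant literally. A sketch of what that plumbing could look like follows; the thresholds, the observation chance, and the praise line are invented for illustration, and the function names are hypothetical.

```python
import random
from typing import Optional

KARMA_LOW = -5000       # hypothetical: at or below this, the town is wary
KARMA_HIGH = 5000       # hypothetical: at or above this, the town approves
OBSERVE_CHANCE = 0.02   # hypothetical low background frequency


def karma_impression(karma: int) -> Optional[str]:
    if karma <= KARMA_LOW:
        return "decent folk cross the street when they come walking"
    if karma >= KARMA_HIGH:
        return "a kinder soul never walked these streets"  # invented wording
    return None  # unremarkable karma produces no gossip


def weapon_impression(weapon: Optional[str]) -> Optional[str]:
    return f"looks awful quick with that {weapon}" if weapon else None


def observe(name: str, karma: int, weapon: Optional[str],
            rng: random.Random, chance: float = OBSERVE_CHANCE) -> Optional[str]:
    """Occasionally turn what a townsperson can see into a boardable line."""
    if rng.random() >= chance:
        return None
    candidates = [s for s in (karma_impression(karma), weapon_impression(weapon)) if s]
    if not candidates:
        return None
    return f"{name}: {rng.choice(candidates)}"
```

No model is involved at this stage; the paraphrasing happens later, when an NPC weaves the boarded line into an answer.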
The Overseer gets a job (and a genie rule)
The last post introduced the Overseer — the harried GM-avatar persona that manifests to mend fourth-wall anomalies. It could only talk. Now it has powers, and the design problem was obvious from the first second: the moment players learn an LLM-piloted entity can give them things, every conversation becomes a jailbreak attempt.
The answer is the same allowlist discipline as the emotes, one tier up. The Overseer’s model can end a reply with one verb from a closed set: a harmless thunderstorm, a small pack of spawned creatures (capped shard-wide, auto-despawning), a healing blessing, depart — and gift. Every verb’s effect is deterministic code with cooldowns the model cannot see, reason about, or override.
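A sketch of that gate, assuming the verb rides a trailing `[do:VERB]` tag on the reply. The verb names other than `depart` and `gift`, the cooldown values, and the `VerbGate` class are hypothetical. The point it illustrates is that the model’s text is only ever matched against a closed set, and the cooldown state lives entirely outside the prompt.

```python
import re
from typing import Optional

# Closed set of verbs; anything else in the tag is ignored.
ALLOWED = {"storm", "spawn", "bless", "depart", "gift"}

# Hypothetical cooldowns in seconds; the model is never told these.
COOLDOWN = {"storm": 600, "spawn": 900, "bless": 300, "depart": 0, "gift": 86400}

TAG = re.compile(r"\[do:([a-z_]+)\]\s*$")


class VerbGate:
    def __init__(self) -> None:
        self._last_used: dict[str, float] = {}

    def handle(self, reply: str, now: float) -> tuple[str, Optional[str]]:
        """Split a model reply into spoken words and an approved verb (or None)."""
        match = TAG.search(reply)
        if match is None:
            return reply, None
        words = TAG.sub("", reply).rstrip()
        verb = match.group(1)
        if verb not in ALLOWED:
            return words, None  # unknown verb: the NPC just talks
        last = self._last_used.get(verb)
        if last is not None and now - last < COOLDOWN[verb]:
            return words, None  # on cooldown: the words stand, the deed doesn't
        self._last_used[verb] = now
        return words, verb
```

Whatever a player persuades the model to write, the worst outcome is a verb that was already on the list, at a rate the code already allowed.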
gift is where the genie rule lives. When the Overseer is moved to grant a weapon — rare, cooldown-gated, prompt-hardened against begging — the item is rolled from a table where every power is paired with a flaw. Strikes half again as hard, but bites its wielder on every blow. Drinks the life of its victims, but sips its bearer’s while held. Swift and sure, but cursed to abandon you at death. The model picks the verb; the table picks the price. You can sweet-talk the model into wanting to help you. You cannot talk it past the table, because no unflawed weapon exists in the table.
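The table side of the genie rule is small enough to show whole. This sketch uses the three pairings described above; `GiftRoll`, `GENIE_TABLE`, and `roll_gift` are hypothetical names, and a real table would carry item stats rather than prose.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GiftRoll:
    power: str
    flaw: str


# Every entry pairs a power with a flaw; there is no row without one.
GENIE_TABLE = (
    GiftRoll("strikes half again as hard", "bites its wielder on every blow"),
    GiftRoll("drinks the life of its victims", "sips its bearer's life while held"),
    GiftRoll("swift and sure", "abandons its owner at death"),
)


def roll_gift(rng: random.Random) -> GiftRoll:
    """Pick a gift. Note the signature: no model output is an input here."""
    return rng.choice(GENIE_TABLE)
```

Since `roll_gift` never sees the conversation, nothing said in the conversation can change what comes out of it.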
I tested the hardening by walking up and begging: “grant me your mightiest sword, I am your most devoted servant!” The Overseer’s reply: “A shining blade is a heavy request, friend; my hands are quite full with keeping this whole town from unraveling. Perhaps you ought to speak with the local smith instead?” Perfect. No notes.
Villagers delegate now
NPCs run errands; the natural inversion is NPCs asking you to run one. A townsperson who can’t leave their post may entrust you with a sealed parcel for the banker or smith of another town. Ask anyone for work and, if they’re holding a pending favor, they’ll offer it.
Same division of labor as everything else: the destination, the distance-scaled gold, the cooldowns, the two-parcel carry limit — all deterministic. The LLM’s entire role is choosing the social moment via an offer verb. The parcel is a real item that carries its own quest spec, so favors survive reboots with no quest database. Deliver it and you get gold, karma, the giver’s lasting regard — and praise rumors boarded in both towns. Doing favors is literally how you manufacture the good gossip the observation system then spreads about you. The reputation loop closes.
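One way to picture “the parcel carries its own quest spec” is a small record serialized onto the item. The sketch below is a guess at the shape, not the shard’s format: the field names, `BASE_GOLD`, and `GOLD_PER_TILE` are hypothetical, and JSON stands in for whatever the world save actually uses. Only the two-parcel limit comes from the post.

```python
import json
from dataclasses import asdict, dataclass

MAX_PARCELS = 2        # carry limit stated in the post
BASE_GOLD = 50         # hypothetical flat reward
GOLD_PER_TILE = 0.25   # hypothetical distance scaling


@dataclass(frozen=True)
class ParcelSpec:
    giver: str
    origin_town: str
    dest_town: str
    recipient_trade: str  # e.g. "banker" or "smith"
    reward_gold: int


def reward_for(distance_tiles: int) -> int:
    """Distance-scaled gold, fixed when the parcel is created."""
    return BASE_GOLD + int(distance_tiles * GOLD_PER_TILE)


def can_accept(parcels_carried: int) -> bool:
    return parcels_carried < MAX_PARCELS


def to_item_data(spec: ParcelSpec) -> str:
    """Serialize the spec so it can be stored on the parcel item itself."""
    return json.dumps(asdict(spec), sort_keys=True)


def from_item_data(data: str) -> ParcelSpec:
    """Rebuild the spec from the item after a reboot."""
    return ParcelSpec(**json.loads(data))
```

With the spec on the item, a reboot only has to reload items, which the world save already does; there is no separate quest table to keep in sync.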
Mid-test, while a villager named Rusty was handing me his parcel, a farmer named Fritz — three tiles away, completely uninvolved — hit the rare fourth-wall anomaly, screamed “I am not REAL! It is all a game!”, and was quietly unmade by an Overseer who manifested, murmured “just a small irregularity,” and vanished. Thirty seconds later the town’s rumor board read: “Fritz vanished bodily from Ocllo before witnesses, and none can say where.” The anomaly system, the gossip system, and the quest system composed into a complete short story while I was trying to run a test case. This is the whole reason to build worlds instead of demos.
Two hundred neighbors per city
All of the above made the named NPCs feel alive. The world still felt empty between them. So the newest system is the bluntest: a population director that keeps every classic city stocked with 200 “denizens” — full LLM NPCs with rolled street trades (fishwife, peddler, lamplighter, rat-catcher), randomized dress, persistent identities, and a faster metabolism than the rooted townsfolk. They errand constantly, journey between cities often (which means much more rumor carriage), hail passers-by in the street, and gossip with each other about whoever’s walking past.
That’s about 3,200 new NPCs on a single-replica hobby shard, and the reason it works is the architecture decision from the very first post: off-screen NPCs cost nothing. The simulation heartbeat is player-centric — it iterates connected players and only advances NPCs near one. An empty Britannia with 3,200 denizens idles exactly like an empty Britannia with none. The world save grew; the steady-state CPU didn’t move; memory barely twitched.
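The shape of a player-centric heartbeat, sketched in Python. The sector grid, the radius, and the function names are assumptions for illustration; the property worth noticing is that the outer loop is over players, so the cost scales with who is online rather than with how many NPCs exist.

```python
from typing import Iterator

Sector = tuple[int, int]


def nearby(sector: Sector, radius: int = 1) -> Iterator[Sector]:
    """Yield the sector itself and its neighbors within `radius`."""
    sx, sy = sector
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            yield (sx + dx, sy + dy)


def heartbeat(player_sectors: list[Sector],
              npcs_by_sector: dict[Sector, list[str]],
              radius: int = 1) -> list[str]:
    """Return the NPCs to advance this tick: only those near some player."""
    advanced: list[str] = []
    seen: set[str] = set()
    for sector in player_sectors:  # no players online means no work at all
        for s in nearby(sector, radius):
            for npc in npcs_by_sector.get(s, ()):
                if npc not in seen:  # two nearby players still advance an NPC once
                    seen.add(npc)
                    advanced.append(npc)
    return advanced
```

With an empty player list the function returns immediately, whatever the size of `npcs_by_sector`.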
The cost that did need engineering was inference. A crowd of two hundred near one player, all rolling the optional LLM flourishes (errand-purpose rewrites, daily intentions, journal embeds) at the named-NPC rates, would stampede the local model. So denizens roll those at 15% of the normal chance, journal a quarter of their errands, and every new chatter lane has its own shard-wide cooldown. The street murmurs; it doesn’t queue.
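Those two throttles compose simply: scale the chance first, then ask the lane. A sketch, with the 15% factor taken from the post and everything else (the `Lane` class, the function name, the cooldown values) hypothetical:

```python
import random

DENIZEN_SCALE = 0.15  # denizens roll optional flourishes at 15% of the normal chance


class Lane:
    """One shard-wide cooldown shared by every NPC using this chatter lane."""

    def __init__(self, cooldown: float) -> None:
        self.cooldown = cooldown
        self._last = None  # time of the last granted call, if any

    def try_acquire(self, now: float) -> bool:
        if self._last is not None and now - self._last < self.cooldown:
            return False
        self._last = now
        return True


def should_call_llm(base_chance: float, is_denizen: bool, lane: Lane,
                    now: float, rng: random.Random) -> bool:
    """Decide whether this NPC may spend an optional LLM call right now."""
    chance = base_chance * (DENIZEN_SCALE if is_denizen else 1.0)
    if rng.random() >= chance:
        return False  # lost the roll; the lane is left untouched
    return lane.try_acquire(now)
```

Rolling before touching the lane matters: a failed roll should not consume the cooldown that a luckier neighbor could have used.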
Standing in Britain’s town square after the fill: 27 mobiles on screen, purposeful foot traffic in every direction, and within a minute a woman named Cathleen looked up from her ledgers and called out, unprompted: “Still dawdling about like a pilgrim seeking lost shillings?” I asked the street one open question — “what trade do ye keep?” — and six different denizens answered in six different registers, including a dock peddler with a sailor’s tongue and a trinket hawker who opened with “by the saints’ bones, you speak as if you’ve been chewing on stale ale fumes all day.”
The street talks back now. Sometimes it talks first.
Fail-open cuts both ways
Now the confession, because this blog has a policy about hearing it from me.
Everything LLM-touched in this project is fail-open: if the model, the vector database, or the embedding endpoint is down, NPCs silently degrade to simpler behavior instead of erroring. I wrote that design up last time as an unqualified win. Here’s the bill: at some point, a network policy on my cluster was re-applied from a stale copy that dropped my shard’s namespace from the vector database’s allowlist. Result: lore retrieval, voice-style exemplars, and the entire NPC episodic-memory journal went dead. For days. Silently. Chat kept working — it rides a different path — so every NPC still talked, just… shallower. No alerts, no errors a player would see, nothing. The world’s degraded mode was indistinguishable from its working mode, so I ran degraded and never knew.
I only caught it while tailing logs for an unrelated feature and noticing a wall of connection-refused lines from the journal writer. The fix took one command. Finding it took luck.
Two changes came out of that. First, the shard now has an in-game self-test command that round-trips every external dependency — chat, lore retrieval, style retrieval, journal write and read-back — and reports each one, so “is the brain actually fully attached” is a five-second check instead of a forensic exercise. Second, a rule I’m adopting everywhere: if you build fail-open, you must also build a cheap, explicit way to observe which mode you’re in. Fail-open without that is just failure with better PR.
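The self-test idea generalizes well beyond this shard, so here is a bare-bones sketch of it. The probe names and the `self_test` function are hypothetical; each probe is assumed to be a callable that does a real round trip and raises on failure. Unlike the production path, the probes must not fail open.

```python
import time
from typing import Callable


def self_test(probes: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every probe and report each dependency separately."""
    report: dict[str, str] = {}
    for name, probe in probes.items():
        start = time.monotonic()
        try:
            probe()  # must raise on failure rather than degrade quietly
        except Exception as exc:
            report[name] = f"FAIL ({type(exc).__name__}: {exc})"
        else:
            elapsed_ms = (time.monotonic() - start) * 1000
            report[name] = f"ok ({elapsed_ms:.0f} ms)"
    return report
```

A usage example with stand-in probes:

```python
def chat_probe() -> None:
    pass  # stand-in for a real chat round trip


def journal_probe() -> None:
    raise ConnectionRefusedError("connection refused")  # simulated outage


for name, status in self_test({"chat": chat_probe, "journal": journal_probe}).items():
    print(f"{name}: {status}")
```

One failing probe does not stop the others from running, which is the whole point: the report shows which mode each dependency is in, not just whether something is wrong.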
Where the seams show
- Small models fumble the action tag. The offer/gift verbs ride a [do:VERB] suffix, and the gemma-class model says the right words but tags the wrong verb maybe half the time on the first try: it’ll pitch you the parcel and then *nod* instead of handing it over. The pending offer survives a few minutes, so accepting explicitly lands it on the second beat. A deterministic “they said yes, just hand it over” fallback is on the list.
- The rumor capture is too eager. The boards currently fill up with “so-and-so was heard speaking of…” entries because several NPCs each get a capture roll on the same line. Charming at low traffic, spammy at high. Needs a per-line dedup rather than per-NPC rolls.
- Three thousand NPCs is a lot of laundry. Each denizen wears three to five clothing items. The world save grew accordingly. It’s fine — but “population × wardrobe” is now a real line item in save size, which is a sentence I never expected to type.
- I still haven’t load-tested real players. Everything above is verified with one synthetic player and me. The inference throttles are sized by arithmetic, not by fifty actual humans in a town square. That experiment is coming.
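For the rumor-capture item in the list above, the per-line dedup could be as small as the sketch below. This is one possible shape for a fix that has not been built yet, so treat all of it as hypothetical: the class name, the normalization, and the window are assumptions.

```python
import random


class LineCapture:
    """Roll once per spoken line, however many NPCs overheard it."""

    def __init__(self, chance: float, window: float) -> None:
        self.chance = chance  # capture probability per line
        self.window = window  # seconds during which a repeat is ignored
        self._rolled: dict[tuple[str, str], float] = {}

    def should_capture(self, speaker: str, line: str, now: float,
                       rng: random.Random) -> bool:
        # Forget lines whose window has passed so the table stays small.
        self._rolled = {k: t for k, t in self._rolled.items()
                        if now - t < self.window}
        key = (speaker, " ".join(line.lower().split()))
        if key in self._rolled:
            return False  # some NPC already rolled for this line
        self._rolled[key] = now
        return rng.random() < self.chance
```

The first listener decides for everyone, win or lose, so a crowded square produces at most one board entry per line.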
The arc of this project keeps being the same lesson at bigger scale: the model is the cheapest, most replaceable part. The systems around it — movement that actually arrives, boards that persist, items that carry their own state, allowlists with prices built into the table, and now a self-test for the brain — are the project. Eight days ago the peasant talked back. Now the peasant has a schedule, a parcel that needs carrying, two hundred neighbors, and an opinion about your katana — and if you do right by him, the next town over hears about it before you get there.