The Bigger Better Chronicles NPC Behavior System
I wrote about Chronicles` action system ~sigh~ over three years ago.
With this system, every ability in the game could be defined in-asset and then generically compiled into an easily-reproducible set of game state changes which made simulation and preview trivial.
I then spent months figuring out how to get my NPCs to use them smartly and eventually emerged with a functional behavior planner that has over the intervening years grown into a true cornerstone of the game I'm building.
Through this system, NPC's can plan inventive turn decisions completely dynamically with whatever stats and abilities they have access to. As it has matured, it's become clear that this is a major pull for the game because the enemy units are consistently killingsurprising me with their actions.
It's also been a bugbear and constant pain to keep running and scaling as the game's content has grown and so I blacked out on nyquil for two weeks and rewrote it. Let's talk about it.
Part 1. What Are We Doing Again?
The ultimate end goal for the enemies in Chronicles is for them to kill you very effectively. This is of course their goal, but also, like, my goal for them ♥
A difficulty I faced had to do with all of the abilities in the game consisting of lists of arbitrary actions and target requests such that you cannot, programmatically, understand what an ability “does” or what it's “for.”
So we want our NPCs to take their health and stamina and equipment that they have, analyze their place in the world with what abilities they have available to them, and decide what they should do on their turn that maximizes their ability to kill you very effectively.
But since they can't know from staring at them what any of their abilities actually doooo, the only way to know if a wait is better than moving left or better than throwing the barrel on their right into the doorway behind the player is to fully simulate those actions in a virtual copy of the game state and then analyze the result state to see if they did good.
How the Old System Worked
I'm sometimes asked if my game's behavior is “expert systems” or “goap” or whatever other trendy behavior model people have heard of and I rarely know a succinct concise way to respond. At the risk of being overly reductive, I could say
I fully simulate every possible combination of decisions across 3 turns, score the resulting game states, and pick the best path.
If that sounds brute force and simple, it kind of is! The fact that I can arbitrarily copy my game state thousands of times performantly allows me to just bulldoze into the best solutions. It's heavily optimized and uses the magic 'fast-for-free' feature called C++ so even the Emscripten web build has little trouble discovering the right answers.
It works exceptionally well because of how varied the game gets. Hundreds or thousands of unique equipment and abilities still just plug into this system because they're just action lists at the end of the day and the resulting game state is what is being scored.
What game states are “best” is determined by a series of weights for different things I want them to pursue or avoid. Damaging your primary target is worth points, damaging health is worth more than damaging stamina, using a cooldown costs points, being far away from your target costs points, etc. I can tune these weights with overrides for different actors to make archers maintain distance or for certain fun enemies to ignore friendly fire, etc.
The simulation branches on every turn into a shared leaf-node pool and just continues to pop the next highest node and simulate the next turn off of it to build more nodes. Shockingly complicated actors can just evaluate 1-2 thousand end-nodes, often-times exhausting the graph entirely, and give us the best answer. If I wanted to stop iterating early, most likely due to a performance budget (more on that later) then this becomes I guess you might call a “Best-First Search of Simulated End-States” where hopefully we've explored a good answer by the time we bail out.
The Problems
In general it's worked really well and when I add a new enemy with a new ability and it just starts using it how I imagine they should it all feels really worth it. I'm sometimes really surprised by their tactics and find it challenging and fun to deal with.
Most of the difficulty of using this system over the last couple years has been a death by a thousand cuts of tweaking weights. You see a skeleton doing something silly, you dig into how they came up with that, make a change to the scoring to fix it, and then 2 months later you find out an encounter somewhere else you haven't tried in a while started acting silly as a result.
It's a bit of a gordian knot of tugging and pulling and never feeling very confident that you didn't just break something else. It's at best annoying and at worst depressing. I often doomspiral over the possibility that the whole thing is eventually going to collapse under its own weight as I continue to add content and features to the game.
Performance
More recently, some new enemies I created that I've had designed for a while started really bringing the behavior calc to its knees, pausing and stuttering the framerate every turn struggling to get through all the simulations in time even in Release. I mentioned in the last section that the algorithm is capable of bailing out early with the best-scored node found up to that point but prior to this year I hadn't had a reason to ever cap it.
Consider some value I that represents how many nodes I can process in a turn before it starts affecting the framerate. A sufficiently high I would mean I could always exhaust the full 3-turn graph for any combination of states and still be performant on mediocre hardware. The faster I process nodes, the higher it goes, so I can increase I by optimizing the hell out of the iteration code to push it higher.
After a month or two of increasing I by some 40x, I came to a new roadblock existential to the implementation. I had split the node processing to run over multiple frames because no processing is taking place while the animations for the turn are happening. Unfortunately, some single nodes would suddenly spike the frame time all on their own. This is because the act of just simulating all the possible result states for a particularly complex actor was producing thousands of possible states for just one turn. The combinatorial explosion of considering all their abilities and all their multiple choices was taking too long. Since I couldn't break it up any more granularly than that I began to admit that my plan wasn't going to work long term.
Even for the case of setting an iteration cap and relying on best-first, if a single turn calculation in this system spikes hard like that the cap is worthless for making the calculation performant. Trying to chase I is useless when one big node is able to reduce it to 1.
Part 2. The Multi-Armed Bandit
I accidentally did the thing where I wrote way too much for too long but if you're still here following along, welcome! We're going to talk about some machine-learning data science problems because it's relevant to the new system but let me preface that no, the game's not using LLM's or Generative AI.
Hovering quietly just outside the all-encompassing sphere of LLM's and technofascism is the great data science work being done with reinforcement learning. It's an option to explore with your data if you're interested in solving real problems and generating useful insights that are real and align with reality. Neat! I am but a tiny fish in the sea of this discipline but what's relevant to us here is the Multi-Armed Bandit Problem.
MAB is this idea that you are standing in front of a bunch of slot machines at a casino and each one has some different unknown probability for payout. You pull an arm and it hits 7's and gives you money. For all you know that machine has a 100% reward rate. You pull a different machine and get nothing. 0%.
Over time, you can start to get better guesses at which arms have better probability by trying them over and over and recording the outcomes. If you could infinitely pull every slot arm you could eventually know the probability for the best slots within an acceptable error.
But the unfortunate problem here is that you have to put money in the machine to pull it. You have a finite amount of money which creates a literal budget for how many machine arms you can pull.
So what arms should you pull in what order to maximize the total rewards before you go broke? Do you try them all an even amount (exploration)? Or do you mostly focus on the ones that you feel are rewarding more often (exploitation)? Some mixture of the two? As you pull arms, you're updating your own model for what your expected reward of each arm is, and that's reinforcement learning!
Chess
This doesn't only apply to greasy gambling. Take a game of chess. There are 20 legal first moves for white to start the game. If you have no idea how to play chess, each of those 20 moves has an equal chance of moving you toward an eventual win. Being as your opponent has autonomy, the outcome of your decision can be effectively pseudorandom. As you play more games and get better at it, you are building a model in your head for not just what moves are more reliable at the start but what moves are better in what situations.
A tricky bit with chess is that determining for sure which of the 20 moves is most often the best requires playing out the whole game to see who won. After black moves you're now in one of 400 possible games, and the next time it's white's play you're in one of 200,000 possible games. Chess is extremely large so building out an exhaustive tree of possible game states can become computationally expensive.
Monte Carlo Tree Search
So let's pretend that the NPC actors in Chronicles are all pieces in a very strange game of chess. On their turn they have a set of possible moves, they want to pick the best one in their goal of effectively killing the player, but they also want to explore future turns such that they may be able to identify and execute a longer strategy against the player.
The Monte Carlo Tree Search algorithm is designed to tackle this sort of problem. It aims to smartly build the possibility tree by selectively expanding leaf nodes. Rather than brute-forcing the entire tree and walking it, it tries to decide which branches to explore further based on this overall multi-armed-bandit idea of weighing exploration against exploitation.
The basic flow of the algorithm is 4 steps:
- Selection: From our given node, select a suitable child node and advance into it
- Expansion: If we have unexplored potential children, expand one and advance into it
- Simulation: Simulate the game forward from here for a score
- Backpropagation: For every node in the chain leading from root to here, increment the fact that we visited them along with how well the simulation scored
Every time this function runs, it expands the graph a little bit and also modifies rolling averages on the predecessor nodes with how well their subtrees are doing. Every time a node is visited, it is able to expand its possible children, which may hurt or hinder the parent's mean score, which in-turn affects whether it's selected to be visited in future iterations.
The subtree branches that consistently produce garbage are quickly ignored, and the ones that have occasionally produced promising results are given more priority, and this is infinitely recursive all the way down to the final decisions of the final turn.
Given a set budget for number of searches, we can rely on this algorithm to do as little simulation as possible while still finding a really good answer.
True Monte Carlo does simulation using rollouts of often random possible outcomes. Chronicles is fully deterministic so our simulation is as well, but an RPG with a lot of RNG could use this same system.
The Chuckster
Let's go over the Chronicles enemy that was causing so much trouble with the old system: The Chuckster. The chuckster has 3 abilities:
Wait: Earn extra stamina and skip the turnMove/Attack: Move into an cardinally-adjacent tile, attacking if occupied. 4 possible target options: up, down, left, rightChuck: Throw a nearby actor to a target position, damaging the area around them. Two-Part targeting:- Select an actor cardinally or diagonally adjacent: 0-8 possible choices
- Select an empty tile between 2 and 9 tiles away via Manhattan distance (a diamond): ~80-120 choices
Theoretically, the total possible permutations for 3 turns of this character quickly reaches into the hundreds of millions. What we would like to do is be able search this tree a fixed number of times, expanding it as we go, such that by the end of the search budget we have done the fewest necessary expansions and found the best answer.
- Start by creating a root node for the state of the game right before it's chuckster's turn to act
- Root has 3 possible nodes to advance to as defined by the 3 abilities available
- Each ability node can expand into the possible choices that can be made.
Waithas 0,Move/Attackhas 0-4,Chuckhas 0-8 - For cases where an ability has multiple target requests their next potential child set is the next set of possible decisions, some 80-120 in this case
- Once a node results in a completed valid turn, it gets 1 child node for the start of the next turn from that state and begins all over again with 3 abilities available to expand into on turn 2
Monte Carlo's first 10-20 searches will quickly expand all 3 abilities, and quickly visit all 4 Move/Attack children, and at least a few of the leaf nodes for Chuck but we are headed toward a problem.
Maybe the best answer for our Chuckster is to throw the nearby boulder at the player. Well, we immediately run into an issue because we have to consider that the searches are selectively expanding child nodes based on how well the average subtree is behaving. If throwing the boulder at the player is the only good use of Chuck out of the hundreds of possible options, what if our first 5,10,20 visits to Chuck don't include the player's position in it's second target request? In this case, Chuck can whither on the vine because it would be too expensive to exhaust all possible options and Move/Attack is providing great stable improvement on strategic positioning and eventually damage inflicted turn after turn and that's where the algorithm will spend its visits.
For help with this we're going to visit the selection and expansion portions of our Monte Carlo.
UCT
I mentioned that Monte Carlo has a “Selection” phase where it chooses a suitable child node, but what does that mean? Is it just the child with the highest average score? Here's where we have an opportunity to implement the MAB strategy of mixing exploration and exploitation.
There's a ton of complicated math around this and it all goes completely over my head, but one of the accepted functions for selecting children in an optimized way is UCT: Upper Confidence Bounds applied to Trees. Simply put, this function mixes the number of parent visits against the node's visits with the actual mean subtree score and applies an 'exploration constant' (C) which leans the selection more toward either exploring new nodes or exploiting good nodes.
It's got a lot of logarithms and square roots and such but it plugs into an implementation of MCTS pretty simply. After experimenting with different C values, it does a great job of giving nodes a chance to prove themselves. Actors in Chronicles can easily be given modifiers to their exploration constant to modify how much of their thinking budget they spend exploring less-obvious paths.
UCT makes it possible for our Chuckster to get a lot of visits to the Chuck ability node even if it's not immediately providing value. In a sense, we're hoping that it is visited enough to find the one good throw that then moves a huge majority of future visits to exploring it.
Progressive Widening
Meanwhile, in the expansion phase, the simplest implementation will always expand one child every time the parent is visited. For cases with very large possibility sets however, this can start to swamp the children set. Progressive widening is the idea of using some constant to limit how often a child is expanded from the possible set.
It's based on visit counts, such that 0-10 visits may produce the first 10 children, but it will start to take more visits, 20-30, to get the next 10 children.
This kind of works against our chuckster because it's now a little harder to mine for the good throws, but! In the case that the good throw is found early, this will prevent a lot of wasted children to be produced as we start visiting the good throw hundreds of times to look for good 2nd turns!
Now if only we could
Try to Find the Good Throw Earlier
Given gestures at everything in the enormous blog post we can't really know at an action level if one decision is better or worse than another. If I have a tile-pick target request with 150 possible 2D positions to choose from, I can't reliably say 'try these first` with any certainty because, well, that's the point of all of this.
But we could maybe do some zhuzhing to try and help it out. Simply put, the parent node has a single list of 2D coordinates and when it expands a child it grabs the next one and makes a child node. So what we could do is order the possibility list when the parent gets created and try help it hit better hits earlier in expansion.
First things first is just taking the location of our primary target and popping it right to the top, or the closest possible tile to our target's position. Let's face it, 99% of the work we're trying to is to do what? effectively kill the player and it's a safe bet that if our first child is aimed straight at our kill target we're going to generate some value.
But that's not always the case, there's support abilities, healing, escape mechanics, distancing, all sorts of reasons not to try and aim at the player, and that's fine, we're probably going to be getting at least a few early visits and that's just the first, after that we want to order the remaining positions in somewhat of an evenly-distributed order.
It's not really helpful to grab 10 children from a set of 200 if those 10 are all clumped in the upper left because it's some positionally-ordered list, we're not exploring at least 75% of the space and we're likely to drop the whole node in favor of things offering value. If the list was organized such that the first 10 items were evenly spaced throughout the total space, and then next 10 were in a slightly denser formation, etc, then every visit would be checking a more representative view of the overall area available.
We can accomplish this in one of two ways:
- Random! A simple fisher-yates shuffle on the set would reduce to an effectively even distribution more often than you'd think!
- Furthest-Point Insertion: This N^2 function simply inserts the next point whose closest distance to any already-inserted points is the largest among potential candidates. First point is the primary target, next point is the furthest possible from the target, 3rd point is the furthest possible from the other 2, so on and so forth, this creates a nice even distribution that might help maximize coverage of space explored by early visits.
You could take this one step further and actually branch this decision node a second time. The idea being that given a decision at point X, it is assumed that 0-10 possible nearby choices are similar to the result of picking X and may be worth refining. It might be something I try!
Part 3. Does It Work
Yes!!! All of my test combat scenarios that I've built over the years have passed with flying colors.
- Bomberman Enemy who puts down a bomb in your direction and then kicks it at you
- Swapper who swaps you into a group of enemies
- Swapper swapping you into deep water to drown you
- Chuckster throwing a boulder not just to hurt you but to seal the escape route behind you
- Archers using escape teleports to get into better range to shoot you
And the stress tests are now just trivial. Actors have a maximum search budget of 1000 but they rarely reach it. There's a futility check early in to see if they are struggling to find anything better than waiting to early out, and there's an improvement gate to ensure that their best possible turn is slowly growing as they iterate.
I talked before about increasing I by 40x and that is really paying dividends here because the number of game state copies and turn simulations has dropped precipitously within this system. I can even let some actors plan more turns ahead and never come close to spiking the frame time.
The biggest advantage to this new system is feeling like I have far greater control over it. I already feel like my behavior weights are more directly driving the action, I can modify things like the exploration constant and turn lookahead and widening factors to make actors “dumber” or “smarter” and chiefly of all, I don't have to worry about any combination of abilities tanking the performance.
The worst case scenario of too large of a problem is just that the actor won't be able to find the most effective uses within their budget, never that the frame time spikes.
In Conclusion
I suppose I can't call this true Monte Carlo because the simulation is fully deterministic, so maybe you could call the new system UCT-guided simulation tree search or some nonsense but there you have it!
Two weeks of reinforcement learning....learning to arrive at a system that I'm really proud of. Who knows, maybe someone someday can ask another would-be gameplay programmer if their NPCs use “Chronicles planning”.
If you managed to get through this whole thing, please don't hesitate to drop a comment on Bluesky or Mastodon! I'd love to hear your feedback!
Wishlist Chronicles IV: Ebonheim! 🌐
🙋

