How to run MCTS on a highly non-deterministic system?

Question

I'm trying to implement an MCTS algorithm for the AI of a small game, an RPG simulation. The AI should decide which moves to play in battle. Battles are turn-based (FF6/FF7 style), and there is no movement involved.

I won't go into details, but we can safely assume that we know with certainty which move the player will choose in any given situation when it is their turn to play.

A game ends when one party has no unit alive (4v4). It can take any number of turns (and may never end). There are a lot of RNG elements in damage computation and skill processing: attacks can hit or miss, crit or not, lots of effects can "proc" or not, buffs can have a percentage chance to apply, etc. Each unit has around 6 skills, to give an idea of the branching factor.

I've built a preliminary version of the MCTS, but it gives poor results for now. I'm having trouble with a few things:

One of my main issues is how to handle the non-deterministic outcomes of my moves. I've read a few papers about this, but I'm still in the dark.

Some suggest determinizing the game information, running an MCTS on that, and repeating the process N times to cover a broad range of possible game states, then using all of that information to make the final decision. In the end, this multiplies our computing time by a huge factor, since we have to build N MCTS trees instead of one. I cannot rely on that: over the course of a fight I have thousands of RNG elements, and computing 2^1000 MCTS trees when I already struggle with one is not an option :)
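For reference, the determinization approach described above usually looks like the following sketch: each of N determinizations gets its own fixed RNG seed, an independent search is run on each, and the root visit counts are summed before choosing a move. In the usual formulation N is a number of sampled seeds chosen by the caller, so the cost grows linearly with N. The `search` callable and `toy_search` are hypothetical stand-ins for a real MCTS, not part of the original question.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable

# Hypothetical signature for one search: (state, rng) -> {move: root visit count}
Search = Callable[[object, random.Random], Dict[Hashable, int]]


def determinized_decision(state, search: Search, n: int, base_seed: int = 0):
    """Run `search` once per determinization and vote with summed visit counts."""
    if n < 1:
        raise ValueError("need at least one determinization")
    totals: Counter = Counter()
    for i in range(n):
        # Each determinization has its own fixed RNG stream, so every
        # hit/crit/proc roll inside that search is reproducible.
        rng = random.Random(base_seed + i)
        totals.update(search(state, rng))
    move, _ = totals.most_common(1)[0]
    return move, dict(totals)


def toy_search(state, rng: random.Random) -> Dict[str, int]:
    """Toy stand-in: 'fire' deals 10 but hits 70% of the time, 'slash' always deals 5."""
    visits = {"fire": 0, "slash": 0}
    for _ in range(100):
        fire_value = 10 if rng.random() < 0.7 else 0
        visits["fire" if fire_value > 5 else "slash"] += 1
    return visits


move, totals = determinized_decision(None, toy_search, n=5)
```

Doubling `n` doubles the number of searches, which is exactly the cost concern raised in the question.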

I had the idea of adding X children for the same move, but that does not seem to lead to a good answer either. It smooths the RNG curve a bit, but it can shift it in the opposite direction if X is too big or too small compared to the probability of a particular RNG event. And since I have multiple RNG rolls per move (hit chance, crit chance, chance to proc something, etc.), I cannot find a value of X that works for every case. It's more of a band-aid than anything else.
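The distortion described above can be made concrete: with X equally weighted children per move, an RNG event can only be represented with a frequency that is a multiple of 1/X. This sketch (the function name is hypothetical) computes the closest representable frequency for a given probability, which shows why no single X suits every roll.

```python
import math


def closest_representable(p: float, x: int) -> float:
    """Closest frequency to probability p that x equally weighted children can express."""
    if not 0.0 <= p <= 1.0 or x < 1:
        raise ValueError("need 0 <= p <= 1 and x >= 1")
    k = math.floor(p * x + 0.5)  # number of children in which the event happens
    return k / x


for p in (0.05, 0.3, 0.5, 0.9):
    for x in (4, 20):
        q = closest_representable(p, x)
        print(f"p={p:.2f} X={x:2d} -> {q:.2f} (error {q - p:+.2f})")
```

With X = 4, a 5% proc disappears entirely (0 children), while a 30% hit chance becomes 25%; a larger X reduces the error but multiplies the node count.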

Likewise, adding one node per RNG tuple {hit or miss, crit or not, proc1 or not, proc2 or not, etc.} for each move should cover every possible situation, but it has heavy drawbacks: with only 5 RNG mechanisms, that means 2^5 nodes to consider for each move, which is far too much to compute. If we managed to create them all, we could assign each one a probability (derived from the probabilities of the RNG elements in the node's tuple) and use that probability during the selection phase. This should work overall but would be really hard on the CPU :/
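A minimal sketch of the per-tuple outcomes described above, assuming each RNG mechanism is an independent yes/no roll with a known probability (the mechanism names and numbers are illustrative only). Under that independence assumption each tuple's probability is the product of its rolls, and the probabilities of all 2^n tuples sum to 1, which is what allows the selection phase to weight children by probability.

```python
from itertools import product
from typing import Dict, List, Tuple

Outcome = Tuple[bool, ...]


def enumerate_outcomes(probs: Dict[str, float]) -> List[Tuple[Outcome, float]]:
    """Return every yes/no combination of independent RNG mechanisms with its probability."""
    names = list(probs)
    outcomes = []
    for flags in product((True, False), repeat=len(names)):
        p = 1.0
        for name, happened in zip(names, flags):
            p *= probs[name] if happened else 1.0 - probs[name]
        outcomes.append((flags, p))
    return outcomes


# Illustrative probabilities, not taken from any real game data.
rolls = {"hit": 0.9, "crit": 0.15, "proc1": 0.3, "proc2": 0.1, "block": 0.2}
table = enumerate_outcomes(rolls)
print(len(table))                           # 32 outcomes for 5 binary mechanisms
print(round(sum(p for _, p in table), 10))  # 1.0
```

Sorting `table` by probability shows how much of the total mass the few likeliest tuples carry.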

I also cannot "merge" them into a single node, since I have no way of accurately averaging the player/monster stat values across two different game states. Averaging a move's result during the move processing itself is doable, but it requires a lot of simplifications that are a pain to code and would hurt our accuracy very quickly anyway.

Do you have any ideas on how to approach this problem?

Some other aspects of the algorithm are eluding me:

I cannot do a full playout until an end state because A) it would take a lot of my computing time and B) some battles may never end (by design). I have 2 solutions (which I can mix):

- Do a random playout for X turns
- Use an evaluation function to try to score the situation
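The two options above combine naturally: play at most X turns with a rollout policy, stop early if one side is wiped out, then score whatever state was reached with an evaluation function. A sketch, assuming hypothetical `step`, `is_terminal` and `evaluate` callables supplied by the game code; the toy (ai_hp, player_hp) game is only there to exercise it.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def truncated_rollout(
    state: S,
    step: Callable[[S, random.Random], S],  # hypothetical: plays one turn, rolling RNG internally
    is_terminal: Callable[[S], bool],
    evaluate: Callable[[S], float],          # hypothetical: scores a state in [0, 1] for the AI
    max_turns: int,
    rng: random.Random,
) -> float:
    """Simulate at most max_turns turns, then fall back on the evaluation function."""
    for _ in range(max_turns):
        if is_terminal(state):
            break
        state = step(state, rng)
    return evaluate(state)


# Toy usage: state is (ai_hp, player_hp); each turn both sides lose 0-3 HP.
def toy_step(s, rng):
    ai_hp, pl_hp = s
    return max(ai_hp - rng.randint(0, 3), 0), max(pl_hp - rng.randint(0, 3), 0)


def toy_done(s):
    return min(s) == 0


def toy_eval(s):
    ai_hp, pl_hp = s
    return ai_hp / (ai_hp + pl_hp) if ai_hp + pl_hp else 0.5


value = truncated_rollout((20, 20), toy_step, toy_done, toy_eval, max_turns=5, rng=random.Random(1))
```

Setting `max_turns=0` reduces this to a pure evaluation-function leaf, so the same code covers both of the solutions listed above.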

Even if I consider only hit points, I'm failing to find a good evaluation function that returns a reliable value for a given situation (1-4 units for the player and the same for the monsters; I know their current/max HP). What bothers me is that fights vary greatly in length and in power disparity. That means that sometimes a 0.01% change in HP matters (in a long fight against a boss, for example) and sometimes it is insignificant (when the player farms a zone far below their level).
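One way to make an HP-only evaluation independent of fight scale is to compare HP fractions rather than raw HP, so the score stays in [0, 1] whether units have 100 or 100,000 max HP. This is a sketch under that assumption, not a tuned function: it does not by itself make a 0.01% HP change matter more against a boss, but it keeps the reward range fixed, which also matters for the UCB exploration constant. The (current_hp, max_hp) representation is hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical representation: one (current_hp, max_hp) pair per unit.
Unit = Tuple[int, int]


def team_hp_fraction(units: Sequence[Unit]) -> float:
    """Fraction of the team's total max HP still remaining (0.0 for an empty team)."""
    total_max = sum(max_hp for _, max_hp in units)
    if total_max <= 0:
        return 0.0
    remaining = sum(min(max(cur, 0), max_hp) for cur, max_hp in units)
    return remaining / total_max


def evaluate_hp(ai_units: Sequence[Unit], player_units: Sequence[Unit]) -> float:
    """Score in [0, 1] from the AI's side: 1.0 = AI untouched and player wiped,
    0.0 = the reverse, 0.5 = both sides at the same HP fraction."""
    diff = team_hp_fraction(ai_units) - team_hp_fraction(player_units)
    return 0.5 + 0.5 * diff


# Same score whether max HP is 100 or 100,000: only the fractions matter.
print(evaluate_hp([(50, 100)], [(25, 100)]))                  # 0.625
print(evaluate_hp([(50_000, 100_000)], [(25_000, 100_000)]))  # 0.625
```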

The disparity of power and HP variance between fights means that my bias (exploration) parameter in the UCB selection process is hard to set. I'm currently using something very low, like 0.03. With anything > 0.1, the exploration factor is so high that my tree gets built level by level :/
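For context, the standard UCB1 score used for MCTS selection (UCT) is the child's mean reward plus c * sqrt(ln N / n), where N is the parent's visit count and n the child's. The exploration term has no units, so c only has a stable meaning when rewards live in a fixed range such as [0, 1]; for that range c = sqrt(2) is the textbook starting point, usually tuned from there. A sketch with a hypothetical minimal node type:

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal MCTS node; total_value is a sum of rewards in [0, 1]."""
    move: str
    visits: int = 0
    total_value: float = 0.0
    children: List["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_value / child.visits  # exploitation term, in [0, 1]
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))


parent = Node("root", visits=20, children=[Node("a", 10, 9.0), Node("b", 10, 1.0)])
chosen = select_child(parent)
```

If raw HP differences are used as rewards instead, their scale changes from fight to fight, so any fixed c will over-explore in some fights and under-explore in others.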

For now I'm also using a biased way to choose moves during the simulation phase: it selects the move the player would choose in that situation and random moves for the AI, which biases the simulation in the player's favor. I've tried using pure random moves for both sides, but that seems to give worse results. Do you think a biased simulation phase works against the purpose of the algorithm? I'm inclined to think it would just give the AI a pessimistic view and would not impact the end result too much. Maybe I'm wrong, though.
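The biased simulation described above amounts to a rollout policy that plays the player's known scripted choice on the player's turns and a uniformly random legal move on the AI's turns. A sketch, where `player_script`, `legal_moves` and `is_player_turn` are hypothetical hooks into the game code; passing a random chooser as `player_script` gives the pure-random variant to compare against.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")
Move = str


def make_rollout_policy(
    legal_moves: Callable[[S], List[Move]],
    is_player_turn: Callable[[S], bool],
    player_script: Callable[[S], Move],
    rng: random.Random,
) -> Callable[[S], Move]:
    """Build a rollout policy: scripted moves for the player, uniform random moves for the AI."""

    def policy(state: S) -> Move:
        if is_player_turn(state):
            return player_script(state)  # exploit the known player behaviour
        return rng.choice(legal_moves(state))  # AI side picks uniformly at random

    return policy


# Toy usage with dict states; the move names are illustrative only.
moves = ["attack", "heal", "guard"]
policy = make_rollout_policy(
    legal_moves=lambda s: moves,
    is_player_turn=lambda s: s["turn"] == "player",
    player_script=lambda s: "attack",
    rng=random.Random(0),
)
```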

Any help is welcome :)

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

I think this question is way too broad for StackOverflow, but I'll give you some thoughts:

Tree search over stochastic (probabilistic) outcomes is usually called expectimax search. You can find a good summary and pseudo-code for Expectimax Approximation with Monte-Carlo Tree Search in chapter 4, but I would recommend using a normal minimax tree search with the expectimax extension. There are a few variants, like Star1, Star2 and Star2.5, that improve the runtime (similar to alpha-beta pruning).

It boils down to having not only decision nodes but also chance nodes. The probability of each possible outcome must be known; the value of each outcome is multiplied by its probability, and the sum of these products is the chance node's expected value.
2^5 nodes per move is high, but not impossibly high, especially with a low number of moves and a shallow search. Even a depth 1-3 search should give you some results. In my Tetris AI, there are ~30 different possible moves to consider, and I calculate the result of the three following pieces (for each possible move) to select my move. This is done in 2 seconds. I'm sure you have much more time for calculation since you're waiting for user input.
If the player's move is obvious to you, shouldn't it also be obvious to your AI?
You don't need to consider a single value (HP); you can combine several factors, weighted differently, to calculate the expected value. To come back to my Tetris AI, there are 7 factors (bumpiness, highest piece, number of holes, ...) that are calculated, weighted and added together. To get the weights, you could use different methods; I used a genetic algorithm to find the combination of weights that resulted in the most lines cleared.
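The decision/chance node structure and the weighted evaluation described in this answer can be sketched as a small recursive expectimax: decision nodes take the max (AI) or min (player) over moves, chance nodes take the probability-weighted sum of their outcomes, and leaves are scored by a weighted sum of features. The tree classes, feature names and weights are hypothetical, and none of the Star1/Star2 pruning is included.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    features: Dict[str, float]  # e.g. {"damage": 10.0}; feature names are illustrative


@dataclass
class Chance:
    outcomes: List[Tuple[float, "Tree"]]  # (probability, subtree); probabilities sum to 1


@dataclass
class Decision:
    maximizing: bool  # True on the AI's turn, False on the player's
    moves: Dict[str, "Tree"]


Tree = Union[Leaf, Chance, Decision]


def evaluate(leaf: Leaf, weights: Dict[str, float]) -> float:
    """Weighted sum of features; the weights are what a tuner (e.g. a GA) would search for."""
    return sum(weights.get(name, 0.0) * value for name, value in leaf.features.items())


def expectimax(node: Tree, weights: Dict[str, float]) -> float:
    if isinstance(node, Leaf):
        return evaluate(node, weights)
    if isinstance(node, Chance):
        # Expected value: each outcome's value weighted by its probability.
        return sum(p * expectimax(child, weights) for p, child in node.outcomes)
    values = [expectimax(child, weights) for child in node.moves.values()]
    return max(values) if node.maximizing else min(values)


root = Decision(maximizing=True, moves={
    "fire": Chance([(0.7, Leaf({"damage": 10.0})), (0.3, Leaf({"damage": 0.0}))]),
    "slash": Chance([(1.0, Leaf({"damage": 5.0}))]),
})
weights = {"damage": 1.0}
best = max(root.moves, key=lambda m: expectimax(root.moves[m], weights))  # "fire" (7.0 vs 5.0)
```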

Thanks for the answer! 1) I've tried using expectimax logic, but the number of RNG elements is too big for that. I have at least a dozen different RNG mechanisms, and some have a large range of values. I will also be adding more as the game progresses, so the final number of RNG elements will push my number of chance nodes out of control. 2) I've got 1.5 seconds to compute the AI. — mydi, Oct 20 '15 at 12:07
3) The AI knows the player's move because the player sets up their own AI logic (a small number of if/else statements). I can exploit that to reduce computation, but that's it. 4) I've tried a huge evaluation function and it was too complex to manage. For now I'd like to find something decent with only HP and improve it later if necessary. — mydi, Oct 20 '15 at 12:07
Can you elaborate a bit on what RNG mechanisms you use? What does "have a large range of values" mean? For example, do you mean the number of HP an attack deals as damage? In that case, you could probably calculate an average instead of branching the tree for every possible number. — Sven, Oct 20 '15 at 12:11
Base weapon damage (not final damage) is between X and Y. I could average it, but I'd lose accuracy since my AI would be blind to the worst/best case. Other RNG factors include: hit chance (y/n), crit chance (y/n), proc chance (stun, silence, confusion...) (y/n), block chance (y/n), block crit (y/n), chance to play twice this turn (y/n), chance to dodge (y/n), chance to counter-attack on hit (y/n), chance to reflect damage (y/n), chance to reflect a spell (y/n)... and a few others. I'll add more as I go. Even if I consider only the binary ones (y/n), that is still a ***load of chance nodes (2^X) to create :/ — mydi, Oct 20 '15 at 12:19
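Sven's suggestion to average instead of branching can be done in closed form when the rolls are independent. Assuming (hypothetically) uniform integer base damage in [X, Y], a hit chance, a crit chance that only applies on a hit, a crit multiplier m, and 0 damage on a miss, the expected damage is p_hit * (X + Y) / 2 * (1 + p_crit * (m - 1)). The sketch also returns the worst and best case, so averaging does not hide the extremes mydi is worried about; it is not the game's actual damage formula.

```python
from typing import Tuple


def damage_summary(
    lo: int, hi: int, p_hit: float, p_crit: float, crit_mult: float
) -> Tuple[float, float, float]:
    """Return (expected, worst, best) damage for a single attack.

    Illustrative assumptions, not the game's real formula: base damage is uniform
    over the integers lo..hi, a miss deals 0, a crit can only happen on a hit,
    and the rolls are independent.
    """
    if lo > hi:
        raise ValueError("lo must be <= hi")
    mean_base = (lo + hi) / 2  # mean of a uniform integer range
    expected = p_hit * mean_base * (1 + p_crit * (crit_mult - 1))
    # Keep the extremes so the averaged value does not hide them entirely.
    worst = 0.0 if p_hit < 1 else lo * (crit_mult if p_crit >= 1 else 1.0)
    best = 0.0 if p_hit == 0 else hi * (crit_mult if p_crit > 0 else 1.0)
    return expected, worst, best


print(damage_summary(10, 20, p_hit=0.9, p_crit=0.2, crit_mult=2.0))
```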
