I've created a small AI program that can play Othello. The algorithm I use (MCTS with UCT) has a parameter that lets me tune the exploration vs. exploitation ratio. This is a single float value ranging from 0 to 10 (anything up to infinity is possible, but high values don't make a lot of sense).
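For context, here is a minimal sketch of where that parameter enters the selection step, assuming one common form of the UCT rule (average reward plus an exploration bonus). The function and argument names are only illustrative and are not taken from my actual program; `c` is the parameter I want to tune.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c):
    """Score of a child node during selection; c is the exploration parameter."""
    if child_visits == 0:
        # Assumption: unvisited children are always tried first.
        return math.inf
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

With `c = 0` the search only follows the best average result so far; larger values push it toward rarely visited moves.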
I can easily let the algorithm play versus itself with different values of this parameter. This would give me an idea which of the two values is better.
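The comparison I have in mind looks roughly like the sketch below. `play_game` is a hypothetical stand-in for my engine: I assume it takes the parameter value for black and for white and returns 1.0 if black wins, 0.5 for a draw, and 0.0 if white wins. Colors are swapped every game so neither value benefits from moving first.

```python
def compare(c_a, c_b, play_game, n_games=100):
    """Return the average score (0.0 to 1.0) of c_a against c_b over n_games."""
    score = 0.0
    for i in range(n_games):
        if i % 2 == 0:
            # c_a plays black, so the result counts directly.
            score += play_game(c_a, c_b)
        else:
            # c_a plays white, so the result is inverted.
            score += 1.0 - play_game(c_b, c_a)
    return score / n_games
```

A result above 0.5 suggests `c_a` is the stronger value, although with few games the outcome is noisy.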
What is a good algorithm to optimize this parameter?
(I prefer an algorithm that is backed by research or publications that explain in depth why or when it works best.)