In Thompson sampling, we maintain Beta parameters (alpha_k, beta_k) for each arm, draw a sample from each arm's Beta distribution, and pick the arm with the largest sample.
Why can't we just maintain the mean of the Beta distribution for each arm (alpha_k/(alpha_k+beta_k)) and pick each arm with probability proportional to this mean?
For example, suppose we have 3 arms.
Arm 1 (alpha = 1, beta = 1), i.e. mean = 0.5
Arm 2 (alpha = 2, beta = 1), i.e. mean = 0.67
Arm 3 (alpha = 1, beta = 1), i.e. mean = 0.5
Why can't we pick them in proportion to their means, as follows?
Arm 1 with probability 0.5/(0.5+0.67+0.5) = 0.3
Arm 2 with probability 0.67/(0.5+0.67+0.5) = 0.4
Arm 3 with probability 0.5/(0.5+0.67+0.5) = 0.3
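
To make the arithmetic above concrete, here is a minimal Python sketch that recomputes the means and the normalised selection probabilities for the three arms. The names `params`, `means`, and `probs` are just illustrative; exact fractions are used so that rounding does not hide anything.

```python
from fractions import Fraction

# (alpha, beta) for each arm, taken from the example above
params = [(1, 1), (2, 1), (1, 1)]

# The mean of Beta(alpha, beta) is alpha / (alpha + beta)
means = [Fraction(a, a + b) for a, b in params]

# Proposed rule: normalise the means so they sum to 1
total = sum(means)
probs = [m / total for m in means]

for k, (m, p) in enumerate(zip(means, probs), start=1):
    print(f"Arm {k}: mean = {float(m):.2f}, pick probability = {float(p):.2f}")
```

This prints pick probabilities of 0.30, 0.40 and 0.30, matching the numbers above.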
Would this not converge to the best arm?
I understand the Beta-distribution-based analysis, but I am not able to see what the problem is with choosing each arm based on its likelihood.
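
One way to probe the convergence question empirically is a small simulation that runs both rules on the same Bernoulli bandit and counts how often each arm is pulled. This is only a rough sketch under my own assumptions: the true success rates in `true_rates` are made up for illustration, every arm starts at alpha = 1, beta = 1, and the function name `run` and the policy labels are hypothetical.

```python
import random

def run(policy, true_rates, steps, seed=0):
    """Simulate a Bernoulli bandit and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # assumed starting parameters: Beta(1, 1) for every arm
    beta = [1] * k
    pulls = [0] * k
    for _ in range(steps):
        if policy == "thompson":
            # Thompson sampling: draw one sample per arm, play the largest
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
        else:
            # Proposed rule: play arm i with probability proportional to its mean
            means = [alpha[i] / (alpha[i] + beta[i]) for i in range(k)]
            arm = rng.choices(list(range(k)), weights=means)[0]
        reward = 1 if rng.random() < true_rates[arm] else 0
        alpha[arm] += reward       # a success increments alpha
        beta[arm] += 1 - reward    # a failure increments beta
        pulls[arm] += 1
    return pulls

if __name__ == "__main__":
    true_rates = [0.3, 0.7, 0.3]  # hypothetical values, not from the question
    for policy in ("thompson", "mean_proportional"):
        print(policy, run(policy, true_rates, steps=5000))
```

Comparing the share of pulls that goes to the best arm under each rule, and how that share changes as `steps` grows, should indicate whether a rule concentrates on the best arm or keeps spreading pulls across all arms.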