Bernoulli Bandit - Thompson sampling alternate idea

Asked Feb 07 '23 at 02:35

Active Feb 07 '23 at 02:35


In Thompson sampling, we maintain Beta parameters (alpha_k, beta_k) for each arm, draw one sample from each arm's Beta distribution, and pull the arm whose sample is largest.
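To make the setup concrete, here is a minimal sketch of that selection step as I understand it. The function names `thompson_pick` and `update` are just illustrative, and the update assumes Bernoulli rewards (0 or 1) with a Beta prior on each arm.

```python
import random

def thompson_pick(params, rng=random):
    """Draw one sample per arm from Beta(alpha, beta); return the index of the largest sample."""
    samples = [rng.betavariate(a, b) for a, b in params]
    return max(range(len(samples)), key=samples.__getitem__)

def update(params, arm, reward):
    """Posterior update for a Bernoulli reward (assumed to be 0 or 1)."""
    a, b = params[arm]
    params[arm] = (a + reward, b + (1 - reward))

# Hypothetical usage with the three arms from the example in this question.
params = [(1, 1), (2, 1), (1, 1)]
arm = thompson_pick(params)
update(params, arm, reward=1)  # pretend the pulled arm paid out
```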

Why can't we just maintain the mean of the Beta distribution for each arm, alpha_k/(alpha_k+beta_k), and pick each arm with probability proportional to that mean?

For example, suppose we have 3 arms.

Arm 1 (alpha = 1, beta = 1), i.e. mean = 0.5
Arm 2 (alpha = 2, beta = 1), i.e. mean = 0.67
Arm 3 (alpha = 1, beta = 1), i.e. mean = 0.5

Why can't we pick them in proportion to their means, i.e.

Arm 1 with probability 0.5/(0.5+0.67+0.5) = 0.3
Arm 2 with probability 0.67/(0.5+0.67+0.5) = 0.4
Arm 3 with probability 0.5/(0.5+0.67+0.5) = 0.3
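The arithmetic above can be sketched in code as follows. This is only an illustration of the alternative rule I am asking about, not an established algorithm; `mean_proportional_probs` and `mean_proportional_pick` are hypothetical names.

```python
import random

def mean_proportional_probs(params):
    """Normalize the Beta means alpha/(alpha+beta) so they sum to 1."""
    means = [a / (a + b) for a, b in params]
    total = sum(means)
    return [m / total for m in means]

def mean_proportional_pick(params, rng=random):
    """Pick an arm at random with probability proportional to its Beta mean."""
    probs = mean_proportional_probs(params)
    return rng.choices(range(len(params)), weights=probs, k=1)[0]

params = [(1, 1), (2, 1), (1, 1)]
probs = mean_proportional_probs(params)
print([round(p, 2) for p in probs])  # [0.3, 0.4, 0.3]
```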

Would it not converge?

I understand the analysis based on the Beta distribution, but I am not able to see what the problem is with choosing each arm based on its likelihood.

asked Feb 07 '23 at 02:35

Coder1111

0 Answers