Multi-armed bandits with Rcpp

Question

I am translating the epsilon-greedy algorithm for multiarmed bandits from here. This is a rather nice demonstration of the power and elegance of Rcpp. However, the results from this version do not tally with the one that is mentioned in the link above. I am aware that this is probably a very niche question but have no other venue to post this on!

A summary of the code is as follows. Basically, we have a set of arms, each of which pays out a reward with a pre-defined probability and our job is to show that by drawing at random from the arms while drawing the arm with the best reward intermittently eventually allows us to converge on to the best arm. A nice explanation of this algorithm is provided by John Myles White.

Now, to the code:

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]

struct EpsilonGreedy {
  double epsilon;
  std::vector<int> counts;
  std::vector<double> values;
};

int index_max(std::vector<double>& v) {
  return std::distance(v.begin(), std::max_element(v.begin(), v.end()));
}

int index_rand(std::vector<double>& v) {
  return R::runif(0, v.size()-1);
}

int select_arm(EpsilonGreedy& algo) {
  if (R::runif(0, 1) > algo.epsilon) {
    return index_max(algo.values);
  } else {
    return index_rand(algo.values);
  }
}

void update(EpsilonGreedy& algo, int chosen_arm, double reward) {
  algo.counts[chosen_arm] += 1;

  int n = algo.counts[chosen_arm];
  double value = algo.values[chosen_arm];

  algo.values[chosen_arm] = ((n-1)/n) * value + (1/n) * reward;
}

struct BernoulliArm {
  double p;
};

int draw(BernoulliArm arm) {
  if (R::runif(0, 1) > arm.p) {
    return 0;
  } else {
    return 1;
  }
}

// [[Rcpp::export]]
DataFrame test_algorithm(double epsilon, std::vector<double>& means, int n_sims, int horizon) {

  std::vector<BernoulliArm> arms;

  for (auto& mu : means) {
    BernoulliArm b = {mu};
    arms.push_back(b);
  }

  std::vector<int> sim_num, time, chosen_arms;
  std::vector<double> rewards;

  for (int sim = 1; sim <= n_sims; ++sim) {

    std::vector<int> counts(means.size(), 0);
    std::vector<double> values(means.size(), 0.0); 

    EpsilonGreedy algo = {epsilon, counts, values};

    for (int t = 1; t <= horizon; ++t) {
      int chosen_arm = select_arm(algo);
      double reward = draw(arms[chosen_arm]);
      update(algo, chosen_arm, reward);

      sim_num.push_back(sim);
      time.push_back(t);
      chosen_arms.push_back(chosen_arm);
      rewards.push_back(reward);
    }
  }

  DataFrame results = DataFrame::create(Named("sim_num") = sim_num,
                                        Named("time") = time,
                                        Named("chosen_arm") = chosen_arms,
                                        Named("reward") = rewards);

  return results;
}

/***R

means <- c(0.1, 0.1, 0.1, 0.1, 0.9)
results <- test_algorithm(0.1, means, 5000, 250) 

p2 <- ggplot(results) + geom_bar(aes(x = chosen_arm)) + theme_bw()
*/

The plot of the arm chosen during simulations (i.e., plot p2 above) is as follows:

Obviously, the first arm is being chosen disproportionately despite having a low reward! What is going on?

score 1 · Accepted Answer · answered Apr 03 '18 at 12:18

1

I don't know how the bandits are supposed to work, but a little standard debugging (ie: look at the values generated) revealed that you generated lots of zeros.

After fixing some elementary errors (make your C/C++ loops for (i=0; i<N; ++) ie start at zero and compare with less-than) we are left with other less subtle errors such as runif(1,N) cast to int not giving you a equal range over N values (hint: add 0.5 and round and cast, or sample one integer from the set of 1..N integers).

But the main culprit seems to be your first argument epsilon. Simply setting that to 0.9 gets me a chart like the following where you still the issue with the last 'half' unit missing.

answered Apr 03 '18 at 12:18

Dirk Eddelbuettel

360,940
56
644
725

I did not understand why `R::runif(0, N)` is not giving me a uniform distribution between 0 and N. I exported my `index_rand()` function and ran `qplot(replicate(1e6, index_rand(1:10)), geom = "histogram")`. Sure enough lots of values are missing and only few of the numbers between 0 - 9 are getting sampled. `R::runif(0, v.size() - 1)` is a nice solution to generate a random index from a `std::vector`, I hate to let it go. – buzaku Apr 03 '18 at 12:35
Try this: `Rcpp::cppFunction("int foo(int a, int b) { return round(R::runif(a, b)); } ")` followed by `table(replicate(10000, foo(1,10)))`. You get rounding/truncation errors as you would expect. There are other subtle issues here you glossed over when going from R/Julia to C++. Just debug it. – Dirk Eddelbuettel Apr 03 '18 at 12:39
Argh! this is beyond my current skill level! I spent several iterations changing specific aspects but cant solve this.... – buzaku Apr 04 '18 at 06:22
Dirk - I was able to solve this with the help of @hong-ooi in the post [here](https://stackoverflow.com/questions/49727727/bandits-with-rcpp?noredirect=1#comment86473909_49727727). Is it okay if I prepare this as an example for Rcpp gallery? – buzaku Apr 09 '18 at 10:08
1) I don't know why this needed a new question. 2) A _lot_ depends on how good your writeup is. Not every solved SO question needs a Gallery post. In principle, we could. But it depends ... – Dirk Eddelbuettel Apr 09 '18 at 11:45
1) New question because I used arma instead of std here, and also numerous small changes. We could argue on this, but well the new perspective helped. 2) Understood. – buzaku Apr 09 '18 at 12:04
StackOverflow has an edit function. _Evolving_ questions as understanding grows is coming, and a good thing. There may be exceptions, but _ex ante_ you should have edited. What use does the old question now have? – Dirk Eddelbuettel Apr 09 '18 at 12:05
Point noted. Thank you for your suggestions; helped me gather thoughts and think through clearly. – buzaku Apr 09 '18 at 12:16
Do work on the Rcpp Gallery idea. I like the idea of comparing to a clean and simple Julia solution. Using the JuliaCall package we may even be able to benchmark these (though net speed here may well be a function of RNG speed). It has to be more than a story of "oh, look how I learned to program with C++ integers" ;-) – Dirk Eddelbuettel Apr 09 '18 at 12:19
And please consider just closing or deleting this question. – Dirk Eddelbuettel Apr 09 '18 at 12:20
:D A part of me just wants this comment to stay on: *oh, look how I learned to program with C++ integers" ;-) * Classic! So I will not delete this question. – buzaku Apr 09 '18 at 12:28

Multi-armed bandits with Rcpp

1 Answers1

Linked