1

In converting the example below to a gfor loop. I encountered an error of the type "Invalid dimension for argument 0", the full error message below. However, the error occurs, then the function runs, then the same error. This pattern repeats. I am confused and am wondering if this error is in someway system dependent.

Full error message:

 Error in random_shuffle(theta, 5, 1) : 
 ArrayFire Exception (Invalid input size:203):
In function af_err af_assign_seq(af_array *, const af_array, const unsigned int, const af_seq *, const af_array)
In file src/api/c/assign.cpp:168
Invalid dimension for argument 0
Expected: (outDims.ndims() >= inDims.ndims())

A second problem, the seed fails to change with the input parameter, when using the gfor loop.

#include "RcppArrayFire.h"

using namespace Rcpp;
using namespace RcppArrayFire;

// [[Rcpp::export]]
af::array random_shuffle(const RcppArrayFire::typed_array<f64> theta, int counts, int seed){

  const int theta_size = theta.dims()[0];
  af::array out(counts, theta_size, f64);
  af::array seed_seq = af::seq(seed, seed+counts);

// for(int f = 0; f < counts; f++){
gfor ( af::seq f, counts-1 ){

  af::randomEngine engine;
  engine.setSeed(af::sum<double>(seed_seq(f)));

  af::array index_shuffle(1, u16);

  af::array temp_rand(1, f64);
  af::array temp_end(1, f64);

  af::array shuffled = theta;

// implementation of the Knuth-Fisher-Yates shuffle algo
  for(int i = theta_size-1; i > 1; i --){

    index_shuffle = af::round(af::randu(1, u16, engine)/(65536/(i+1)));
    temp_rand = shuffled(index_shuffle);
    temp_end = shuffled(i);
    shuffled(index_shuffle) = temp_end;
    shuffled(i) = temp_rand;
    }

  out(f, af::span) = shuffled;
  }
  return out;
}


/*** R
theta <- 10:20
random_shuffle(theta, 5, 1)
random_shuffle(theta, 5, 2)
*/

Updated with Ralf Stunber's solution, but 'shuffled' samples in Column space.

    // [[Rcpp::export]]
af::array random_shuffle2(const RcppArrayFire::typed_array<f64> theta, int counts, int seed) {
  int len = theta.dims(0);
  af::setSeed(seed);
  af::array tmp = af::randu(len, counts, 1);
  af::array val, idx;
  af::sort(val, idx, tmp, 1);
  af::array shuffled = theta(idx);
  return af::moddims(shuffled, len, counts);
}
/*** R
random_shuffle2(theta, 5, 1)
*/

Here is a picture of output, sampling with replacement:enter image description here

In the second part, of 50 repetitions, the samples moves towards anergodic outcome.

skatz
  • 115
  • 7

1 Answers1

2

Why do you want to use multiple RNG engines in parallel? There is really no need for this. In general, it should be sufficient to use only the global RNG engine. It should also be sufficient to set the seed of this engine only once. You can do this from R with RcppArrayFire::arrayfire_set_seed. Besides, random number generation within a gfor loop does not work as one might expect, c.f. http://arrayfire.org/docs/page_gfor.htm.

Anyway, I am not an expert in writing efficient GPU algorithms, which is why I like using the methods implemented in libraries like ArrayFire. Unfortunately ArrayFire does not have a shuffle algorithm, but the corresponding issue has a nice implementation, which can be generalized to your case with multiple shuffles:

// [[Rcpp::depends(RcppArrayFire)]]
#include "RcppArrayFire.h"

// [[Rcpp::export]]
af::array random_shuffle(const RcppArrayFire::typed_array<f64> theta, int counts, int seed) {
    int len = theta.dims(0);
    af::setSeed(seed);
    af::array tmp = af::randu(counts, len, 1);
    af::array val, idx;
    af::sort(val, idx, tmp, 1);
    af::array shuffled = theta(idx);
    return af::moddims(shuffled, counts, len);
}

BTW, depending on the later usage it might make more sense to arrange the different samples in columns instead of rows, since both R and AF use column major layout.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • I played around with, [link](https://github.com/arrayfire/arrayfire/issues/1072#issuecomment-222724823), but never got it to work using a loop. As I am bootstrapping I will definitely fill the residuals in columns, but I am storing my coefficients, the kept result, across rows, one per shuffle. As to converting parallel armadillo to a GPU focused algo, I am undecided on form. Currently each iteration randomizes residuals, solves by a nested optimization algo - simplex for the non-linear variables and an analytical solution for the linear variables. – skatz Dec 24 '19 at 01:23
  • I am thinking to use your solution save all the residuals, something like 5 or 10k columns, and then run the optimizations. – skatz Dec 24 '19 at 01:25
  • @skatz That sounds reasonable if you have enough memory. What problems did you have with the shuffling code inside a loop? – Ralf Stubner Dec 25 '19 at 13:36
  • Did some calculations and I should have enough memory - using daily data for maybe 10 year it works out to 10*252*(10k reps)*(5 vectors) at double precision - about 1.21 GB; which is well below the 4GB on the iMac I am using. This leave plenty of room for the uncounted intermediate steps and output. – skatz Dec 25 '19 at 20:12
  • the problems with shuffling inside a 'gfor' loop would be the post above. – skatz Dec 25 '19 at 20:13
  • @skatz Ok, then my explanation above applies. `gfor` and RNG are incompatible. – Ralf Stubner Dec 25 '19 at 22:01
  • After some more work, I think this project will be put on hold, unless I could use 'Parallel Workers', similar to RcppParallel, or maybe something I am unaware of. I would want to keep shuffle without replacement, so the Knuth-Fisher-Yates method,which is simply too slow to justify running on a singular GPU core. Your method above samples with replacement - and works well - but I typically don't have data sets large enough to give ergodic results. – skatz Dec 28 '19 at 22:48
  • @skatz I am confused. The posted solution should give you sampling w/o replacement. Can you provide an example where this is not the case? – Ralf Stubner Dec 29 '19 at 06:18
  • Ralf, thanks for your help, I edited the question with your solution and an example of sampling (with replacement). And a picture of my output. – skatz Dec 29 '19 at 17:46
  • @skat Now I understand. When you want to go from row to column ordering, it is not enough to change only the dimensions of the arrays. You also have to sort the random array `tmp` column wise instead of row wise. For this, you have to change the last argument of sort from `0` to `1`. – Ralf Stubner Dec 29 '19 at 18:33