2

I have a problem in understanding the offset parameter of curand_init.

The cuRAND guide says:

The offset parameter is used to skip ahead in the sequence. If offset = 100, the first random number generated will be the 100th in the sequence. This allows multiple runs of the same program to continue generating results from the same sequence without overlap.

This seems to be the skipahead concept illustrated in

T. Bradley, J. du Toit, R. Tong, M. Giles, P. Woodhams, "Parallelization Techniques for Random Number Generators", GPU Computing Gems, Emerald Edition.

Consider the following code:

#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>

#define BLOCKSIZE 256

/**********/
/* iDivUp */
/**********/
int iDivUp(int a, int b){ return ((a % b) != 0) ? (a / b + 1) : (a / b); }

/***********************/
/* CUDA ERROR CHECKING */
/***********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code != cudaSuccess) 
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line); 
        if (abort) exit(code);
    }
}

/********************************************************/
/* KERNEL FUNCTION FOR TESTING RANDOM NUMBER GENERATION */
/********************************************************/
__global__ void testrand1(unsigned long seed, float *a, int N){
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    curandState state;
    if (idx < N) {
        curand_init(seed, idx, 0, &state);
        a[idx] = curand_uniform(&state);
    }
}

/********/
/* MAIN */
/********/
int main() {

    const int N = 10;

    float *h_a  = (float*)malloc(N*sizeof(float));
    float *d_a; gpuErrchk(cudaMalloc((void**)&d_a, N*sizeof(float)));

    testrand1<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(1234, d_a, N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());

    gpuErrchk(cudaMemcpy(h_a, d_a, N*sizeof(float), cudaMemcpyDeviceToHost));

    for (int i=0; i<N; i++) printf("%i %f\n", i, h_a[i]);

    getchar();
}

A run to this code generates:

0 0.145468
1 0.820181
2 0.550399
3 0.294830
4 0.914733
5 0.868979
6 0.321921
7 0.782857
8 0.011302
9 0.285450

Now, if I use

curand_init(seed, idx+2, 0, &state);

I receive:

0 0.550399
1 0.294830
2 0.914733
3 0.868979
4 0.321921
5 0.782857
6 0.011302
7 0.285450
8 0.781606
9 0.233840

which means that the generated sequence starts from the third element of the Random Number Generator (RNG) sequence associated to seed = 1234. According to the definition in the cuRAND guide, I would then say that

curand_init(seed, idx+2, 0, &state);

should be equivalent to

curand_init(seed, idx, 2, &state);

Unfortunately, if I use the above line, I receive:

0 0.870710
1 0.511765
2 0.782640
3 0.620706
4 0.554513
5 0.214082
6 0.118647
7 0.993959
8 0.104572
9 0.231619

different from the above one.

On page 11 of the same guide, when discussing curand_init(), it is written:

The state set up will be the state after 2^67 # sequence + offset calls to curand() from the seed state.

which I interpret as offset offsets within one of 2^67 available sequences associated to a specific seed.

Anyone helps me understanding the usage of the offset parameter in curand_init?

Vitality
  • 20,705
  • 4
  • 108
  • 146

1 Answers1

7
curand_init(seed, idx+2, 0, &state);

should not be equivalent to:

curand_init(seed, idx, 2, &state);

because sequences generated with the same seed and different sequence numbers will not have statistically correlated values. Note that there is no additional qualification of this statement based on the offset parameter.

The overall sequence generated has a period greater than 2^190. Within this overall sequence there are subsequences that are identified by the 2nd parameter of the curand_init call. These subsequences are independent from each other (and not statistically correlated). These subsequences are approximately 2^67 numbers apart in the overall sequence. The offset parameter (3rd parameter) selects the starting position within this subsequence. Since the offset parameter cannot be larger than 2^67, it's not possible to use the offset parameter by itself to cause generated numbers to overlap.

However there is a skipahead_sequence function provided which can allow us to do this, if we choose. Consider the following modification to your code and sample run:

$ cat t557.cu
#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>

#define BLOCKSIZE 256

/**********/
/* iDivUp */
/**********/
int iDivUp(int a, int b){ return ((a % b) != 0) ? (a / b + 1) : (a / b); }

/***********************/
/* CUDA ERROR CHECKING */
/***********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

/********************************************************/
/* KERNEL FUNCTION FOR TESTING RANDOM NUMBER GENERATION */
/********************************************************/
__global__ void testrand1(unsigned long seed, float *a, int N){
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    curandState state;
    if (idx < N) {
        curand_init(seed, idx, 0, &state);
        a[(idx*2)] = curand_uniform(&state);
        if(idx%2)
          skipahead_sequence(1, &state);
        a[(idx*2)+1] = curand_uniform(&state);

    }
}

/********/
/* MAIN */
/********/
int main() {

    const int N = 10;

    float *h_a  = (float*)malloc(2*N*sizeof(float));
    float *d_a; gpuErrchk(cudaMalloc((void**)&d_a, 2*N*sizeof(float)));

    testrand1<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(1234, d_a, N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());

    gpuErrchk(cudaMemcpy(h_a, d_a, 2*N*sizeof(float), cudaMemcpyDeviceToHost));

    for (int i=0; i<2*N; i++) printf("%i %f\n", i, h_a[i]);

}
$ nvcc -arch=sm_20 -o t557 t557.cu
$ ./t557
0 0.145468
1 0.434899
2 0.820181
3 0.811845
4 0.550399
5 0.811845
6 0.294830
7 0.557235
8 0.914733
9 0.557235
10 0.868979
11 0.206681
12 0.321921
13 0.206681
14 0.782857
15 0.539587
16 0.011302
17 0.539587
18 0.285450
19 0.739071
$

The kernel now causes each thread to generate 2 numbers in their sequence, but alternate threads will skipahead_sequence in between the generation of the 2 numbers. Since I have chosen 1 as the first parameter to skipahead_sequence, the effect of the function call is as if I had asked for 1*2^67 numbers, i.e. as if I had done a call to curand_uniform 2^67 times. This means that pairs of threads will have their 2nd number generated be identical, and this is exactly what we see at indices 3 and 5, 7 and 9, 11 and 13, etc. in the output.

Here's some ASCII art:

Main Sequence (length 2^190) (determined by seed parameter):

|0     ...   2^67-1 2^67 ...     2^68-1 2^68 ... ... ... 2^190|0 ... 
                                                              (main seq repeats)

Sequences (each have length 2^67) (determined by sequence parameter):

|seq0  ...   2^67-1|seq1   ...   2^68-1|seq2   ... 
                           ^
                   |offset |                     (determined by offset parameter)
                           |
                           RNG begins here for given seed, sequence(=seq1), offset
Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
  • +1 and accepted. Excellent answer, as always. Just to complete the picture to myself, which kind of RNG is invoked with a call to `curand_uniform`? Are the sequences `seq0, seq1, ...`, pseudo-random or quasi-random? – Vitality Sep 14 '14 at 20:26
  • 2
    [the default generator type is pseudorandom numbers from XORWOW](http://docs.nvidia.com/cuda/curand/device-api-overview.html#quasirandom-sequences), and my description above (as well as the code you provided) has that generator in view. The sequence mechanics can vary depending on the underlying generator, for example [mersenne twister](http://docs.nvidia.com/cuda/curand/device-api-overview.html#bit-generation-2) is quite a bit different. – Robert Crovella Sep 14 '14 at 20:30
  • I read your answer but i become a bit confused when i consider your answer and the result that @JackOLantern had seen. you said that "Within this overall sequence there are subsequences that are identified by the 2nd parameter of the curand_init call. These subsequences are independent from each other (and not statistically correlated)." so i expect that "curand_init(seed, idx, 2, &state);" be started from the third element of the Random Number Generator (RNG) sequence associated to seed = 1234 NOT "curand_init(seed, idx+2, 0, &state);" where am i wrong? – pouyan Jan 08 '17 at 17:03