
For the purpose of stochastic simulation, would the following algorithm suffice to produce 1 million pseudorandom decimal numbers of the same quality as a simple rand() command that you'd find in most computer languages? The premise of the algorithm is to use 10 quality decimal pseudorandoms and expand them into 1 million quality decimal pseudorands.

Please note the following is just an algorithm and not real code.

double rands[10] = {rand()}; /// initialize a vector of 10 quality pseudorands [0,1]
double expandedRands[1000000] = {0}; /// initialize a vector of size 1 million

for(int i = 0; i < 10; i++)
{
  for(double j = 0; j < 100000; j++) /// j goes from zero to one hundred thousand
   {
   expandedRands[(100000 * i) + j] = rands[i] * abs((j - 0.5)/ 1000000);
   }
}

EDIT: I realize that a human being could clearly look at numbers generated from this algorithm and know that they follow a pattern, but the real question is would a stochastic simulation work the same way if fed these numbers rather than 1 million rand() numbers.

Jordan
  • `float rands[10] = {rand()};` is not valid C. – Jonathon Reinhart Jun 19 '14 at 01:26
  • Your division is not very effective. `(j / 100000)` – this Jun 19 '14 at 01:31
  • `rand()`, at least in C, hardly counts as a 'quality' pRNG. `j/1000000` is always 0 with integer division, so this should result in a big array of zeros. Even if that were floating point math you're still not getting a good, uniform distribution. – bames53 Jun 19 '14 at 01:31
  • If you're using C++ you should look into the standard `<random>` library. – bames53 Jun 19 '14 at 01:32
  • @bames53 He tagged it C, not C++. – Carey Gregory Jun 19 '14 at 01:35
  • What is your actual objective? Why would you simply not call `rand()` 1 million times? – Carey Gregory Jun 19 '14 at 01:37
  • @CareyGregory that tag was edited after I read the question. previously it was 'math'. – bames53 Jun 19 '14 at 01:38
  • @bames53 Ah, so I see, and it wasn't the OP who made the edit. – Carey Gregory Jun 19 '14 at 01:39
  • Hey guys, thanks for the responses. I realize my question was lacking some info so I updated it accordingly, thanks. The context is pretty complicated and involves CUDA kernels in GPU programming. – Jordan Jun 19 '14 at 01:39
  • @bames53 tagging this to math makes sense, I feel like this is some "intro to stats" homework question – WSBT Jun 19 '14 at 01:51
  • @Jordan I'm no mathematician but I don't think it's possible to extend 10 random numbers into 1 million random numbers by any simple arithmetic manipulation. Once the algorithm is known, correctly predicting the next number in your sequence is trivial. And even without knowing the algorithm, such an approach would fail to meet even the weakest criteria for a random number. Repeating patterns would emerge and would be obvious. – Carey Gregory Jun 19 '14 at 01:52
  • Hi Carey. I realize that a human being could clearly look at numbers generated from this algorithm and know that they follow a pattern, but the real question is would a stochastic simulation work the same way if fed these numbers rather than 1 million rand() numbers. – Jordan Jun 19 '14 at 01:58
  • @Jordan No, your algorithm produces a skewed distribution. The `abs((j - 0.5)/ 1000000);` change does not fix it. – bames53 Jun 19 '14 at 02:23

3 Answers


Your algorithm does not generate a uniform distribution.

expandedRands[(100000 * i) + j] = rands[i] * (j / 100000);

First, for each initial random value rands[i] you generate 100,000 values in the range [0, rands[i]). This clearly skews the distribution toward lower values.

Furthermore, every value in the final data is generated from just one of the initial 10 values, and those are all evenly spaced. This leaks quite a bit of information to observers and means they'll be able to guess more values in the final array with a pretty good likelihood of making correct guesses.
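A quick way to convince yourself of the skew (my own sketch, not part of the original question: the names and the fixed seed are mine, and the 10 "quality" inputs come from std::uniform_real_distribution rather than rand()):

#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 eng(12345);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    // 10 "quality" uniform values in [0,1)
    double rands[10];
    for (double& r : rands) r = dist(eng);

    // the question's expansion: 100,000 points ramping from 0 up to rands[i]
    std::vector<double> expanded(1000000);
    double sum = 0.0;
    for (int i = 0; i < 10; ++i)
        for (int j = 0; j < 100000; ++j) {
            expanded[100000 * i + j] = rands[i] * (j / 100000.0);
            sum += expanded[100000 * i + j];
        }

    // a truly uniform [0,1) sample has mean ~0.5; this expansion averages to
    // about half the mean of the 10 inputs, i.e. roughly 0.25
    std::printf("mean = %f\n", sum / expanded.size());
}
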

Presumably you need to stretch 10 calls to rand() into 1,000,000 quality random numbers because rand() is very slow (and hopefully generates very good random data in return). What I would do under these circumstances is use the results of rand() as nothing more than a seed for a good, deterministic pRNG.

Some code, using standard C++ facilities, to implement this idea:

#include <cstdlib>   // rand
#include <iterator>  // std::begin, std::end
#include <random>    // std::seed_seq, std::mt19937, std::uniform_real_distribution
#include <vector>

// initialize a vector of 10 quality pseudorands [0,RAND_MAX]
int rands[10];
for(int i = 0; i < 10; ++i) { rands[i] = rand(); }

std::seed_seq seeds(std::begin(rands), std::end(rands));
// seed_seq is from C++ and performs a standard RNG 'warm-up' sequence
// In other languages you'll simply implement a warm-up sequence yourself.

std::mt19937 eng(seeds);
// mt19937 is an implementation of a standard RNG.
// the seed_seq ensures a good initial state for producing random bits
// You can use whatever standard pRNG algorithm meets your quality/performance/size needs
// For example, if you need something faster and with a smaller state you could use a linear congruential engine such as minstd_rand0

std::uniform_real_distribution<double> dist(0.0, 1.0);
// a C++ object which takes random bits and produces random values with a good distribution.
// there are many different algorithms for doing this

std::vector<double> expandedRands(1000000);
// (heap storage via std::vector; a million-element array on the stack could overflow it)

for(int i = 0; i < 1000000; ++i) {
  expandedRands[i] = dist(eng);
}

expandedRands now contains one million values uniformly distributed in the range [0.0, 1.0). Given the same initial 10 random values you will get the same million output values, and any difference in the input should produce quite different output.


If you're stretching rand()'s results because you need something that is more parallelizable than serialized calls to rand(), then what you can do is use the ten rand() calls to generate a seed sequence, and then use that to seed several independent pRNG engines that could be run on different cores or in independent instances of a GPGPU kernel (if you can implement the pRNG and distribution in CUDA or whatever).

int rands[10];
for (int i = 0; i < 10; ++i) { rands[i] = rand(); }

std::mt19937 eng[10];
for (int i = 0; i < 10; ++i) {
  // give each engine its own seed sequence with the engine index mixed in;
  // reusing one seed_seq for every engine would leave all ten in the same state,
  // since a seed_seq generates the same values each time it is used
  std::seed_seq seeds{rands[0], rands[1], rands[2], rands[3], rands[4],
                      rands[5], rands[6], rands[7], rands[8], rands[9], i};
  eng[i].seed(seeds);
}

// now the engines can be used on independent threads.
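For instance, here is a minimal self-contained sketch (my illustration, not part of the original answer) of ten such engines filling the million-element array from ten threads, each writing its own 100,000-element slice; the placeholder seed values 1..10 stand in for the ten rand() results:

#include <random>
#include <thread>
#include <vector>

int main() {
    std::vector<double> expandedRands(1000000);
    std::vector<std::thread> workers;

    for (int i = 0; i < 10; ++i) {
        workers.emplace_back([&expandedRands, i] {
            // placeholder seed values stand in for the ten rand() results;
            // the engine index i is mixed in so each thread's stream is distinct
            std::seed_seq seeds{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, i};
            std::mt19937 eng(seeds);
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            for (int j = 0; j < 100000; ++j)
                expandedRands[100000 * i + j] = dist(eng);  // disjoint slice per thread
        });
    }
    for (auto& t : workers) t.join();
}
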

P.S. I know your code is only pseudo-code, but I've seen a certain mistake in C a fair bit, so just in case you wrote your code this way due to the same misconception about C:

double rands[10] = {rand()};

The initializer in C does not execute that expression 10 times and initialize each element with a different value. What happens in C is that, when there are fewer initializers than elements in the array, the initializers that are there get assigned to their corresponding elements (first initializer to the first element, second initializer to the second element, etc.) and then the rest of the elements are zero initialized. So for example:

int x[10] = {0};

will initialize the whole array to zeros, but:

int x[10] = {1};

will initialize the first element to one and then the rest to zero.
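If the intent was for every element to get its own rand() call, the plain way (valid in both C and C++) is an explicit loop; a minimal sketch, scaling to [0,1] as the question's comment intends:

double rands[10];
for (int i = 0; i < 10; ++i) {
    // one rand() call per element, scaled to [0,1]; needs <stdlib.h> / <cstdlib>
    rands[i] = rand() / (double)RAND_MAX;
}
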

bames53
  • WOW nice answer. Thank you for a practical solution. Just to theorize on the quality of the random numbers generated by my algorithm in the opening post- If all 1 million of the numbers were shuffled to eliminate the obvious convergence and skewing observed, would this data set then be a good random sample? if not, forgive me for my mathematical ignorance :p – Jordan Jun 19 '14 at 02:41
  • @Jordan Shuffling the order would not change the distribution. The values would still be skewed low. – bames53 Jun 19 '14 at 02:42
  • @Jordan Also, if you need to know how to implement the seed_seq warm-up algorithm or the pRNG algorithms and you can't find good generic documentation, then the ISO C++ standard has that information: http://stackoverflow.com/questions/6086180/can-i-read-the-c-2011-fdis-anywhere – bames53 Jun 19 '14 at 02:47
  • Just hypothetically.... What if there was a 50% chance that for each of the 10 initial random numbers, the expansion would occur in the opposite direction in order to get some high-skewed points as well. What do you think? – Jordan Jun 19 '14 at 03:03
  • @Jordan My intuition is that that would still result in a skew, but one toward the top and bottom, away from the middle. – bames53 Jun 19 '14 at 04:21
  • Very interesting, thanks for sharing. I would test this idea myself, but how do you think the distribution would look if the expansion took place in both directions? I.e expanding .4 will give 50000 numbers from 0 to .4 and 50000 numbers from .4 to .8. – Jordan Jun 21 '14 at 03:36
  • @Jordan what that does is effectively change the distribution of the initial 10 values, but that doesn't eliminate the effects which cause the skew. In fact, if you take a moment to think about it I think you'll see that you won't get a uniform density across the range by generating a fixed number of points for each random value unless every 'random' value is 1.0 (when generating points downward) and 0.0 (when generating points upward). This is because as the random value approaches the other boundary the generated points in between become more and more densely packed. – bames53 Jun 21 '14 at 04:28
  • You can 'fix' this by generating a number of points proportional to the area they'll fit in: http://rextester.com/VCDDX43259 Of course that means you won't get a fixed number of points generated. Keep in mind that to get that nice flat distribution the results of 100 runs of the generator had to be combined. (You can edit that code to see what a single run looks like) – bames53 Jun 21 '14 at 04:39
  • That's because with only 10 random values there's lots of variance, and the final data of ~1,000,000 points will retain that variance. With 100 runs I get 1,000 random values, and that's enough to make the distribution of the final 100,000,000 samples look uniform. – bames53 Jun 21 '14 at 04:39
  • Of course there are many other ways to measure how 'random' data is and that data will be clearly non-random according to some measures. (This data will in fact be regularly spaced samples and won't look random at all. If regularly spaced values work then there are easier ways to generate them.) Whether it matters to your simulation depends on what characteristics the simulation is sensitive to. I still would recommend using a different approach, but since I don't know why you chose this approach I can't definitively offer a better one. – bames53 Jun 21 '14 at 04:46

This is not going to generate 1,000,000 pseudo random numbers at all.

You are expanding an array of only 10 "real" pseudorandom numbers into 1 million by using addition, multiplication, and subtraction.

In the end, you still have only 10 random numbers.

Think about this: suppose the system function rand() produced only a binary value, either 1 or 0. The chance of getting rands[10] filled with all zeroes is (0.5)^10, or about 0.098%.

Now with your expandedRands[(100000 * i) + j] = rands[i] * (j / 100000);, you will populate the entire 1 million numbers with zeroes, because rands[i] is 0, so rands[i] * (j / 100000) is 0.

What's the chance of getting all numbers as zeroes, if you truly generated 1,000,000 numbers?

(0.5)^1,000,000 is effectively 0. You would have a better chance of winning a lottery you never even bought a ticket for than of this happening even once.
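To put rough numbers on that (my own back-of-the-envelope check, not part of the answer):

#include <cmath>
#include <cstdio>

int main() {
    // chance that all 10 "binary" rand() draws come up zero
    std::printf("P(10 zeros)        = %.4f%%\n", std::pow(0.5, 10) * 100.0);  // ~0.0977%

    // chance that 1,000,000 genuinely independent binary draws are all zero:
    // log10(0.5^1000000) = 1000000 * log10(0.5), about -301030, i.e. roughly 10^-301030
    std::printf("P(1,000,000 zeros) = 10^%.0f\n", 1000000.0 * std::log10(0.5));
}
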

WSBT

As j gets bigger and bigger you end up back at your quality random number rands[i] (at the top of the loop, j/100000 = 1).

Try plotting it in Excel and you'll clearly see that the values converge to your random number.
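You don't even need Excel to see it; a few printed values show each block of 100,000 ramping linearly up to its rands[i] (a sketch of mine, with 0.7 as a made-up stand-in for one of the 10 "quality" values):

#include <cstdio>

int main() {
    double r = 0.7;  // stands in for one of the 10 "quality" values
    for (int j = 0; j <= 100000; j += 20000) {
        // the question's formula for one block: r * (j / 100000)
        std::printf("j = %6d -> %f\n", j, r * (j / 100000.0));
    }
    // output climbs 0.00, 0.14, 0.28, 0.42, 0.56, 0.70: a straight line up to r
}
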

ForguesR
  • Sorry if my question was misleading, the idea is to determine if a stochastic simulation would be affected by "random" numbers generated this way. – Jordan Jun 19 '14 at 02:08