92

The code below is meant to generate a list of five pseudo-random numbers in the interval [1,100]. I seed the default_random_engine with time(0), which returns the system time as Unix time. When I compile and run this program on Windows 7 using Microsoft Visual Studio 2013, it works as expected (see below). When I do so on Arch Linux with the g++ compiler, however, it behaves strangely.

On Linux, five numbers are generated each time. The last four differ on each execution, as one would expect, but the first number stays the same.

Example output from 5 executions on Windows and Linux:

      | Windows:       | Linux:        
---------------------------------------
Run 1 | 54,01,91,73,68 | 25,38,40,42,21
Run 2 | 46,24,16,93,82 | 25,78,66,80,81
Run 3 | 86,36,33,63,05 | 25,17,93,17,40
Run 4 | 75,79,66,23,84 | 25,70,95,01,54
Run 5 | 64,36,32,44,85 | 25,09,22,38,13

Adding to the mystery, that first number periodically increments by one on Linux. After obtaining the above outputs, I waited about 30 minutes and tried again to find that the 1st number had changed and now was always being generated as a 26. It has continued to increment by 1 periodically and is now at 32. It seems to correspond with the changing value of time(0).

Why does the first number rarely change across runs, and then when it does, increment by 1?

The code. It neatly prints out the 5 numbers and the system time:

#include <cstdlib>   // system
#include <iostream>
#include <random>
#include <time.h>

using namespace std;

int main()
{
    const int upper_bound = 100;
    const int lower_bound = 1;

    time_t system_time = time(0);    

    default_random_engine e(system_time);
    uniform_int_distribution<int> u(lower_bound, upper_bound);

    cout << '#' << '\t' << "system time" << endl
         << "-------------------" << endl;

    for (int counter = 1; counter <= 5; counter++)
    {
        int secret = u(e);
        cout << secret << '\t' << system_time << endl;
    }   

    system("pause");
    return 0;
}
Tas
Amin Mesbah

3 Answers

142

Here's what's going on:

  • default_random_engine in libstdc++ (GCC's standard library) is minstd_rand0, which is a simple linear congruential engine:

    typedef linear_congruential_engine<uint_fast32_t, 16807, 0, 2147483647> minstd_rand0;
    
  • The way this engine generates random numbers is $x_{i+1} = (16807 \cdot x_i + 0) \bmod 2147483647$.

  • Therefore, if the seeds are different by 1, then most of the time the first generated number will differ by 16807.

  • The range of this generator is [1, 2147483646]. The way libstdc++'s uniform_int_distribution maps it to an integer in the range [1, 100] is essentially this: generate a number n. If the number is not greater than 2147483600, then return (n - 1) / 21474836 + 1; otherwise, try again with a new number.

    It should be easy to see that in the vast majority of cases, two values of n that differ by only 16807 will yield the same number in [1, 100] under this procedure. In fact, one would expect the generated number to increase by one about every 21474836 / 16807 ≈ 1278 seconds, or 21.3 minutes, which agrees pretty well with your observations.

MSVC's default_random_engine is mt19937, which doesn't have this problem.

T.C.
  • 37
    I wonder what possessed the developers of GCC's standard library to choose such a horrible default. – CodesInChaos Sep 23 '15 at 13:58
  • 13
    @CodesInChaos I don't know if it's related or not, but the MacOS/iOS toolchain also uses the same horrible random engine, making [`rand()` % 7 always return 0](http://stackoverflow.com/q/7866754/995714) – phuclv Sep 23 '15 at 14:48
  • 7
    @LưuVĩnhPhúc Not fixing `rand()` is somewhat understandable (it's hopeless legacy crap). Using a shit-tier PRNG for something new is inexcusable. I'd even consider this a standard violation, since the standard requires "provide at least acceptable engine behavior for relatively casual, inexpert, and/or lightweight use." which this implementation does not provide since it fails catastrophically even for trivial use cases like your `rand % 7` example. – CodesInChaos Sep 24 '15 at 08:54
  • 2
    @CodesInChaos Why is not fixing `rand()` somewhat understandable exactly? Is it only because nobody might have thought to do it? – user253751 Sep 24 '15 at 11:11
  • 2
    @immibis The API is so broken that you're better off with an independent replacement that fixes all the issues. 1) Replacing the algorithm would be a breaking change, so you'd probably need a compatibility switch for older programs. 2) The seed of `srand` is too small to easily generate unique seeds. 3) It returns an integer with an implementation defined upper bound which the caller has to somehow reduce to a number in the desired range, which when done properly is more work than writing a replacement with a sane API for `rand()` 4) It uses global mutable state – CodesInChaos Sep 24 '15 at 11:22
  • 1
    Great analysis. I wonder why `uniform_int_distribution` uses `/ 21474836` instead of `% 100`? – Mark Ransom Sep 24 '15 at 18:19
  • 1
    Since X1 is the same (e.g. 25 in the example above) for minutes at a time, why is X2, X3, etc. different in each series? Isn't X2 solely dependent on X1 with no new entropy being fed in? – Dan Sep 26 '15 at 14:28
  • 1
    @Dan The real X1 is not the same (most of the time, they differ by some small multiple of 16807). The number produced is the same because the method used to produce it happens to map the different X1's to the same value. – T.C. Sep 26 '15 at 18:01
  • @T.C. Ah yes, makes sense -- I see that now. Thanks for replying to my question. – Dan Sep 26 '15 at 23:20
  • 1
    Also not sure why they use an algorithm with super slow modulo operator. XorShift is much faster *and* higher quality. LCG is *slow*. Many people don't realize that. – usr Oct 02 '15 at 12:23
31

The std::default_random_engine is implementation-defined. Use std::mt19937 or std::mt19937_64 instead.

In addition, std::time and the ctime functions are not very accurate (their granularity is one second); use the types defined in the <chrono> header instead:

#include <chrono>
#include <cstdlib>   // std::system
#include <iostream>
#include <random>

int main()
{
    const int upper_bound = 100;
    const int lower_bound = 1;

    // Tick count of the high-resolution clock since its epoch; used as the seed.
    auto t = std::chrono::high_resolution_clock::now().time_since_epoch().count();

    std::mt19937 e;
    e.seed(static_cast<unsigned int>(t)); // Seed engine with timed value.
    std::uniform_int_distribution<int> u(lower_bound, upper_bound);

    std::cout << '#' << '\t' << "system time" << std::endl
              << "-------------------" << std::endl;

    for (int counter = 1; counter <= 5; counter++)
    {
        int secret = u(e);

        std::cout << secret << '\t' << t << std::endl;
    }

    std::system("pause"); // Windows-only; keeps the console window open.
    return 0;
}
Casey
  • 3
    Is it desirable to use a more accurate time when seeding a pseudo-random variable generator? Perhaps this is naive, but it feels like inaccuracy might almost be desirable if it introduces entropy. (Unless you mean it's less precise and thus results in materially fewer potential seeds.) – Nat Sep 23 '15 at 11:48
  • 15
    I would just suggest using `std::random_device` instead of current_time for seeding your random generator. Please check any cppreference example about Random. – Aleksander Fular Sep 23 '15 at 12:45
  • 5
    If you don't want anyone to guess your seed (and therefore reproduce your sequence) less precision is not the same as more randomness. Let's go to the extreme: Round your seed to the next day (or year?) -> guessing is easy. Use femtosecond precision -> Lots of guessing to do ... – linac Sep 23 '15 at 12:45
  • 1
    @AleksanderFular I would too, but I was keeping in line with the spirit of the question since the OP was using time as an input. – Casey Sep 23 '15 at 15:46
  • 2
    @ChemicalEngineer The granularity of `ctime` is 1 second. The granularity of the `std::chrono` clocks is up to the implementation; for `std::chrono::high_resolution_clock` (in Visual Studio it's a typedef for `std::chrono::steady_clock`) it defaults to nanoseconds, and durations can be expressed in much smaller units, hence it is much more precise. – Casey Sep 23 '15 at 21:10
  • 2
    @linac If you wanted cryptographic properties you would use appropriate prng (not one used in this answer). And of course time-based seed is also out of the question, no matter the promised precision. – Cthulhu Sep 23 '15 at 23:52
  • @linac: I think that there's accuracy-vs.-precision confusion here. I agree that greater precision is helpful, e.g. femtoseconds are better than years. Say `GetTime()` and `GetAccurateTime()` both report to femtosecond, except `GetAccurateTime()` always reports the true femtosecond while `GetTime()` might be a bit off. Would we ever use `GetAccurateTime()` over `GetTime()`? However, Casey addressed my question by noting that, in this case, `GetAccurateTime()` is both more precise and more accurate, making the question moot in this context. – Nat Sep 30 '15 at 05:29
-2

In Linux, the random function is not a random function in the probabilistic sense of the word, but a pseudo-random number generator. It is salted with a seed, and the numbers produced from that seed are pseudo-random and uniformly distributed. The Linux way has an advantage in the design of certain experiments that use information from populations: the experiment can be repeated with known tweaks to the input, and the effect can be measured. When the final program is ready for real-life testing, the salt (seed) can be created by asking the user to move the mouse, mixing the mouse movement with some keystrokes, and adding in a dash of the microsecond count since the last power-on.

Windows random number seed is obtained from the collection of mouse, keyboard, network and time of day numbers. It is not repeatable. But this salt value may be reset to a known seed, if as mentioned above, one is involved in the design of an experiment.

Oh yes, Linux has two random number generators. One, the default is modulo 32bits, and the other is modulo 64bits. Your choice depends on the accuracy needs and amount of compute time you wish to consume for your testing or actual use.

  • 5
    I'm not sure why you are talking about a seed generation algorithm. OP clearly uses system time as a seed. Also, can you add some references for `collection of mouse, keyboard, network and time of day numbers`? – default locale Sep 30 '15 at 04:00