0

I am working with C++11's random library, and I have a small program that generates a coordinate pair x, y on a circle with unit radius. Here is the simple multithreaded program

#include <iostream>
#include <fstream>
#include <random>

using namespace std;



int main()
{
    const double PI = 3.1415;



    double angle, radius, X, Y;
    int i;
    vector<double> finalPositionX, finalPositionY;

    #pragma omp parallel
    {
        vector <double> positionX, positionY;

        mt19937 engine(0);
        uniform_real_distribution<> uniform(0, 1);
        normal_distribution<double> normal(0, 1);



        #pragma omp for private(angle, radius, X, Y)
        for(i=0; i<1000000; ++i)
        {
            angle  = uniform(engine)*2.0*PI;
            radius = sqrt(uniform(engine));
            X      = radius*cos(angle);
            Y      = radius*sin(angle);

            positionX.push_back(X);
            positionY.push_back(Y);
        }
        #pragma omp barrier

        #pragma omp critical
        finalPositionX.insert(finalPositionX.end(), positionX.begin(), positionX.end());
        finalPositionY.insert(finalPositionY.end(), positionY.begin(), positionY.end());
    }


    ofstream output_data("positions.txt", ios::out);
    output_data.precision(9);
    for(unsigned long long temp_var=0; temp_var<(unsigned long long)finalPositionX.size(); temp_var++)
    {
        output_data << finalPositionX[temp_var]
                    << "\t\t\t\t"
                    << finalPositionY[temp_var]
                    << "\n";
    }
    output_data.close();
    return 0;
}

Question: Many of the x-coordinates appear twice (same with y-coordinates). I don't understand this, since the period of the mt19937 is much longer than 1.000.000. Does anyone have an idea of what is wrong here?

Note: I get the same behavior when I don't multithread the application, so the problem is not related to wrong multithreading.

EDIT As pointed out in one of the answers, I shouldn't use the same seed for both threads - but that is an error I made when formulating this question, in my real program I seem the threads differently.

BillyJean
  • 1,537
  • 1
  • 22
  • 39
  • 1
    What do you mean "many"? Have you graphed the distribution to see if there are obvious biases? The period refers to the sequence repeating, not that you won't get repeats in the list. – Collin Apr 15 '13 at 13:32
  • 2
    If you print 10^6 random values to 9 places, you'd expect about 500 dupes (since the probability of each one being a dupe starts at 0 for the first one, and increases to 1/1000 for the last one). Is that what you're seeing, or are you seeing excessive dupes? How have you quantified the degree to which what you see differs from what you should expect? The brain is *extremely bad* at guessing whether or not data is uniform-random just by staring at it. – Steve Jessop Apr 15 '13 at 13:38
  • I seem to recall 0 is a weak seed for mt19937. – brian beuning Apr 15 '13 at 13:45
  • 2
    If the same problem occurs when written without multithreading, then the sample code should not be multithreaded. Eliminate things that don't affect the problem. Don't make people wade through extraneous stuff to eliminate things that you already know aren't relevant. – Pete Becker Apr 15 '13 at 14:59
  • @SteveJessop It was the precision that was the pitfall. When I ouput with 16 digit precision, there are no dupes – BillyJean Apr 15 '13 at 15:18

3 Answers3

1

As described in this article (and a later article by a Stack Overflow contributor), true randomness doesn't distribute perfectly.

Good randomness :

enter image description here

Bad randomness :

enter image description here

I really recommend reading the article, but to summarize it: a RNG has to be unpredictable, which implies that calling it 100 times must not perfectly fill a 10x10 grid.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
Tom van der Woerdt
  • 29,532
  • 7
  • 72
  • 105
  • 1
    The lower picture is good randomness for uniform distribution. In a matter of fact, anything can happen. There is a slight chance that all points gets distributed in one point. – BЈовић Mar 14 '14 at 07:52
1

Using the core part of your code, I wrote this imperfect test but from what I can see the distribution is pretty uniform:

#include <iostream>
#include <fstream>
#include <random>
#include <map>
#include <iomanip>

using namespace std;

int main()
{
    int i;
    vector<double> finalPositionX, finalPositionY; 
    std::map<int, int> hist;


    vector <double> positionX, positionY;

    mt19937 engine(0);
    uniform_real_distribution<> uniform(0, 1);
    //normal_distribution<double> normal(0, 1);
    for(i=0; i<1000000; ++i)
    {
        double rnum = uniform(engine);            

       ++hist[std::round(1000*rnum)];

    }

    for (auto p : hist) {
        std::cout << std::fixed << std::setprecision(1) << std::setw(2)
                  << p.first << ' ' << std::string(p.second/200, '*') << '\n';
    }

    return 0;
}

and as others already said it is not unexpected to see some values repeated. For the normal distribution, I used the following modification to rnum and hist to test that and it looks good too:

double rnum = normal(engine);                  
++hist[std::round(10*rnum)];
Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
0

First of all - just because you get the same number twice doesn't mean it isn't random. If you throw a dice six times, would you expect six different results? See birthday paradox. That being said - you are right that you shouldn't see too much repetition in this particular case.

I'm not familiar with "#pragma omp parallel", but my guess is you are spawning multiple threads that all seed the mt19937 with the same seed (0). You should use different seeds for all threads - e.g. the thread id.

emilk
  • 324
  • 2
  • 5