9

I am trying to translate some python code to C++. What the code does is to run a monte carlo simulation. I thought the results from Python and C++ could be very close, but seems something funny happened.

Here is what I do in Python:

self.__length = 100
self.__monte_carlo_array=np.random.uniform(0.0, 1.0, self.__length)

Here is what I do in C++:

int length = 100;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);

for(int i = 0; i < length; i++)
{
    double d = distribution(mt);
    monte_carlo_array[i] = d;
}

I ran above random number generation 100x5 times both in Python and C++, and then do monte carlo simulation with these random numbers.

In monte carlo simulation, I set the threshold as 0.5, thus I can easily verify if the results are uniform distributed.

Here is a conceptual draft what monte carlo simulation does:

for(i = 0; i < length; i++)
{
    if(monte_carlo_array[i] > threshold)    // threshold = 0.5
        monte_carlo_output[i] = 1;
    else
        monte_carlo_output[i] = 0;
}

Since the length of the monte carlo array is 120, I expect to see 60 1s both in Python and C++. I calculate the average number of 1s and found that, although the average number in C++ and Python is around 60, but the trend are highly correlated. Moreover, the average number in Python is always higher than in C++.

distribution chart May I know if this is because I've done something wrong, or it is simply because the difference between random generation mechanisms in C++ and Python?

[edit] Please note that the RNG in Python is also the Mersenne Twister 19937.

MSalters
  • 173,980
  • 10
  • 155
  • 350
ChangeMyName
  • 7,018
  • 14
  • 56
  • 93
  • 1
    Different random number generators give different sets of random numbers. I would expect that if you ran it a few more times (like hundreds of times), you'd get a less obvious difference. – Mats Petersson Aug 12 '13 at 10:21
  • 2
    Is this really what you see with the code you show? There must be other inputs, otherwise there wouldn't be any correlation between the codes at all! I suspect the bug is elsewhere... – Andrew Jaffe Aug 12 '13 at 10:38
  • Those results seams manipulated... – LS_ᴅᴇᴠ Aug 12 '13 at 10:53
  • "Since the length of the monte carlo array is 120" - in code sample length is 100, are you sure, you don't forget change size in all places? – user1837009 Aug 12 '13 at 10:57
  • 2
    There is not enough data to conclude anything. You have just 5 runs, all results lie well within the one standard deviation interval (`[52,68]` assuming Poissonian uncertainties). Yes, there are obviously differences between the numbers. But you initialize both your RNGs with a random seed, what did you expect? Also, what exactly does the plot show? If you just count the ones, how do you end up withnumbers like 59.75 (Python, run 4)? – Carsten Aug 12 '13 at 10:58
  • " the trend are highly correlated.". _Which_ trend? For functions `f(x)` and `g(x)` to be correlated, they must share a support (domain) for `x`. What's that support here? What's on your x axis? – MSalters Aug 12 '13 at 11:06
  • 1
    @Carsten: He's actually generating 120 Bernoulli's with p=0.5, so the sum has a binomial(120,.5) distribution rather than Poisson. That makes the standard deviation sqrt(30) or about 5.48. Your larger points are certainly correct - all of the results fall well within one standard deviation of the mean, and 5 is nowhere near a large enough number of replications to say anything meaningful. – pjs Aug 12 '13 at 13:41
  • @pjs Sorry, you're right, I didn't really think that through. Thanks for noticing. :) – Carsten Aug 12 '13 at 13:50

2 Answers2

5

I wrote this based on the code posted:

import numpy as np

length = 1000
monte_carlo_array=np.random.uniform(0.0, 1.0, length)
# print monte_carlo_array
threshold = 0.5
above = 0

for i in range (0,length):
    if monte_carlo_array[i] > threshold:
        above+=1
print above

and this in C++:

#include <random> 
#include <iostream>

int main()
{
    const int length = 1000;
    std::random_device rd;
    std::mt19937_64 mt(rd());
    std::uniform_real_distribution<double> distribution(0, 1);
    double threshold = 0.5;
    double monte_carlo_array[length];

    for(int i = 0; i < length; i++)
    {
        double d = distribution(mt);
        monte_carlo_array[i] = d;
    }
    int above = 0;

    for(int i = 0; i < length; i++)
    {
        if (monte_carlo_array[i] > threshold)
        {
            above++;
        }
    }
    std::cout << above << std::endl;
}

Five runs each gives:

Python:
480
507
485
515
506
average:
498.6

C++:
499
484
531
509
509
average
506.4

So if anything I'm finding that C++ is higher than python. But I think it's more a case of "random numbers are not uniformly distributed with a small number of samples."

I changed length to 100000 instead, and still the results vary up and down around 50k:

Python:

50235
49752
50215
49717
49974

Average: 
49978.6

C++:

50085
50018
49993
49779
49966

Average:
49968.2

In summary, I don't think it's any huge difference between the random number implementations in C++ and Python when it comes to "how uniform it is around 0.5". But I've not studied statistics very much (and it was many years ago).

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • Hi, Mats. Thank you very much for your answer. I totally agree with you that with a large sample, there could not be any huge difference. However, in practice, I don't even generate a 1000 long array but a 120. Thus it is reasonable to see some difference. And in practice, the monte carlo output is multiplied by a random number with in a range, and the range is the same both for Python and C++. int amount = 0; for(i=0;i<120;i++) amount+=monte_carlo_output*random_number;However, what comfuse me is that Python always beats C++ in terms of values. – ChangeMyName Aug 12 '13 at 13:25
  • That is probably because you only ran 5 times. I rewrote the above code to do runs of 120, and although the variation is greater, the result is still that I get around 60 "above 0.5" - but it does vary. I just did 10000 runs in python, and it ranges from 40 to 79 of the numbers being above 0.5. In C++, on a single example, I got 42 to 80. In other words, if you are doing the same thing in C++ and in Python, you should get same result (and you have the same random number generators that I have - that is of course entirely possible if you are running, say, on a Windows machine that you don't) – Mats Petersson Aug 12 '13 at 13:52
0

When you are not sure in random numbers just generate huge amount of random numbers by using service like Random ORG. After that supply this numbers as array in both implementations (C++ and Python). By this way you will be sure that both programs are using the same set of random numbers and you will be able to confirm that the rest of the code is OK.

Todor Balabanov
  • 376
  • 3
  • 6
  • 17