0

I am wrapping boost random number generators with an adapter class to implement a Monte Carlo routine. When writing unit tests on the member functions of the class, I assumed that the behavior of .discard(unsigned int N) is to draw N random numbers without storing them, thus advancing the state of the rng. The boost code is :

void discard(boost::uintmax_t z)
{
    if(z > BOOST_RANDOM_MERSENNE_TWISTER_DISCARD_THRESHOLD) {
        discard_many(z);
    } else {
        for(boost::uintmax_t j = 0; j < z; ++j) {
            (*this)();
        }
    }
}

which supports my assumption. However, I find that the sequence that results from .discard(1) is not one number different than the same sequence without discard. The code:

#include <iostream>
#include <iomanip>
#include <random>
#include <boost/random.hpp>

int main()
{
    boost::mt19937 uGenOne(1);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > distOne(uGenOne, boost::normal_distribution<>());

    boost::mt19937 uGenTwo(1);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > distTwo(uGenTwo, boost::normal_distribution<>());
    distTwo.engine().discard(1);

    unsigned int M = 10;
    std::vector<double> variatesOne(M);
    std::vector<double> variatesTwo(M);

    for (unsigned int m = 0; m < M; ++m) {
        variatesOne[m] = distOne();
        variatesTwo[m] = distTwo();
    }

    for (unsigned int m = 0; m < M; ++m)
        std::cout << std::left << std::setw(15) << variatesOne[m] << variatesTwo[m] << std::endl;

    return 0;
}

outputs

2.28493        0.538758  
-0.668627      -0.0017866
0.00680682     0.619191  
0.26211        0.26211   
-0.806832      -0.806832 
0.751338       0.751338  
1.50612        1.50612   
-0.0631903     -0.0631903
0.785654       0.785654  
-0.923125      -0.923125

Is my interpretation of how .discard operates incorrect? Why are the two sequences different in the first three outputs, and then identical?

(This code was compiled on msvc 19.00.23918 and g++ 4.9.2 on cygwin, with identical results).

tompdavis
  • 115
  • 4

2 Answers2

0

The issue here seems to be that the engine is not being modified correctly or the distrobution is adding doing some extra work. If we use the engine directly like

int main()
{
    boost::mt19937 uGenOne(1);

    boost::mt19937 uGenTwo(1);
    uGenTwo.discard(1);

    unsigned int M = 10;
    std::vector<double> variatesOne(M);
    std::vector<double> variatesTwo(M);

    for (unsigned int m = 0; m < M; ++m) {
        variatesOne[m] = uGenOne();
        variatesTwo[m] = uGenTwo();
    }

    for (unsigned int m = 0; m < M; ++m)
        std::cout << std::left << std::setw(15) << variatesOne[m] << variatesTwo[m] << std::endl;

    return 0;
}

It produces

1.7911e+09     4.28288e+09
4.28288e+09    3.09377e+09
3.09377e+09    4.0053e+09
4.0053e+09     491263
491263         5.5029e+08
5.5029e+08     1.29851e+09
1.29851e+09    4.29085e+09
4.29085e+09    6.30312e+08
6.30312e+08    1.01399e+09
1.01399e+09    3.96591e+08

Which is a 1 shifted sequence as we expect since we discarded the first output.

So you are correct in how discard works. I am not exactly sure why there is a discrepancy when you do it through the boost::variate_generator. I cannot see why the first three numbers differ but all of the rest of the output matches.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • Thanks @NathanOliver. The algorithm used for the default boost::normal_distribution is the Ziggurat algorithm, which is an acceptance/rejection algorithm [Wiki article on Ziggurat](https://en.wikipedia.org/wiki/Ziggurat_algorithm). This means that it can take an unknown number of draws. However, it still is curious to me that the sequences start agreeing after 3 normal draws... – tompdavis Aug 17 '16 at 18:05
  • The answer to the question "why do the sequences start to agree" is a bit subtle. Say the underlying generator gives (u1, u2, u3, ... ), and the shifted gives (u2, u3, u4, ...). We can't say how many draws are taken by the Ziggurat, however if after N normal draws, the total number of uniform draws becomes equal, the two series will be identical afterwards. This happens after 3 in the above case. – tompdavis Aug 17 '16 at 18:29
  • @tompdavis But wouldn't each Ziggurat take the same number of draws? Or is it possible that the random number generated affects the number of draws so even though they start off different they can sync back up depending on what values were produced? – NathanOliver Aug 17 '16 at 18:33
  • the Ziggurat draws a number, and does some manipulations, and if the resulting number is not in the distribution it will draw a new number. Suppose u1 produces a rejection, then it will use u2, which suppose produces an acceptance. However, since the second stream starts with u2, it will produce an acceptance. So in the first case the underlying generator was incremented twice, whereas the second was only incremented once. In this example the two distributions would produce identical sequences. – tompdavis Aug 17 '16 at 18:39
  • @tompdavis Awesome. I didn't know it worked that way. That information should be in a answer. Not sure if mine is the place for it. I can always make it a CW and you can add it in. – NathanOliver Aug 17 '16 at 18:42
0

Just to add on an important detail that was in the comments of the previous answer. As @NathanOliver mentioned, the .discard increments the generator, which sends uniforms to the normal distribution, which converts the uniforms to normal distribution. boost::normal_distribution uses the Ziggurat algorithm which is an "acceptance/rejection" type of algorithm. It draws a random uniform, manipulates it, and then checks to see if it is in the desired distribution. If not, it is rejected and a new random uniform.

for(;;) {
        std::pair<RealType, int> vals = generate_int_float_pair<RealType, 8>(eng);
        int i = vals.second;
        int sign = (i & 1) * 2 - 1;
        i = i >> 1;
        RealType x = vals.first * RealType(table_x[i]);
        if(x < table_x[i + 1]) return x * sign;
        if(i == 0) return generate_tail(eng) * sign;
        RealType y = RealType(table_y[i]) + uniform_01<RealType>()(eng) * RealType(table_y[i + 1] - table_y[i]);
        if (y < f(x)) return x * sign;
    }

The crucial point is that if the last if fails then the for loop is started again and the call to generate_int_float_pair will be triggered again. This means the number of times the underlying generator is incremented is not known.

The normal sequences will therefore have different numbers, until the point where the sum of rejections in the sub-sequence is the same, at which point the remaining uniform sequence is identical. This happened at the third position in the example posted in the question. (It actually is a bit more subtle because the underlying generator can be called once or twice in the Ziggurat algorithm, but the essence is the same - once the sequences get in sync they can never produce different variates).

tompdavis
  • 115
  • 4