
I'm trying to write an RNG that also returns the value of the updated seed. The perhaps obvious reason for this is so that new random variables can be added to the program later, without changing the values of the existing RNGs. For a python/numpy version of this issue see for example: Difference between RandomState and seed in numpy

Here's an example of usage with a (tentative) proposed solution:

program main

   implicit none

   ! note that I am assuming size=12 for the random
   ! seed, but this is implementation specific and
   ! could be different in your implementation
   ! (you can query it with call random_seed(size=n), by the way)

   integer :: iseed1(12) = 123456789
   integer :: iseed2(12) = 987654321
   integer :: i
   real    :: x, y
   real, external :: ran   ! declared external so it is not mistaken for an intrinsic extension

   do i = 1, 2
      x = ran(iseed1)
      y = ran(iseed2)
   end do

end program main

function ran( seed )

   implicit none
   integer, intent(inout) :: seed(12)
   real                   :: ran

   call random_seed( put=seed )
   call random_number( ran )
   call random_seed( get=seed )

end function ran

Note that a vital aspect of this solution (and of any other solution) is that if we added a third stream (say iseed3 and z) to the above, there would be no change to the realized values of x and y, as sketched below. Similarly, either x or y could be deleted from the code without affecting the values of the other.
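For example, adding a third stream would look like the following (just a sketch; iseed3 and z are illustrative names and the starting value is arbitrary), and the realized values of x and y would be exactly the same as before:

program main

   implicit none

   integer :: iseed1(12) = 123456789
   integer :: iseed2(12) = 987654321
   integer :: iseed3(12) = 555555555   ! new, independent stream
   integer :: i
   real    :: x, y, z
   real, external :: ran

   do i = 1, 2
      x = ran(iseed1)   ! same realized values as before
      y = ran(iseed2)   ! same realized values as before
      z = ran(iseed3)   ! new draws, taken only from the new stream
   end do

end program main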

If it helps, this is how the ran() (extension) function works on Compaq/HP and Intel compilers, and I'm basically trying to implement that same API on gfortran. Note, however, that the seed for that function is a scalar, whereas the seed used by the Fortran 90 subroutine random_seed is a one-dimensional array (with the size/length of the array being implementation specific).

I am providing my current solution as a benchmark answer, but I hope others can either critique that answer or provide better ones.

I would appreciate any analysis of the benchmark according to standard PRNG tests and, in particular, analysis of how the seed is set. In the benchmark I am merely broadcasting a scalar to an array, which is simple and concise; I would like to avoid explicitly writing out an entire array of integers.

So I would like either a somewhat rigorous confirmation that this is fine, or else a more concise method of setting the seed (in a repeatable way!) than writing out an entire array of, say, 12 or 33 integers. For example, maybe there is some simple and concise way to produce a nice stream of 12 pseudo-random integers from a single integer (as the array length is so short, this is probably easy?).
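For concreteness, what I have in mind is something like the sketch below (expand_seed is just a hypothetical helper, and I have no idea whether seeding this way has good statistical properties), which fills the whole seed array from one integer using a small Park-Miller style LCG:

program seed_from_scalar

   implicit none
   integer :: n
   integer, allocatable :: seed(:)

   call random_seed(size=n)
   allocate(seed(n))
   call expand_seed(123456789, n, seed)   ! one integer -> full seed array
   call random_seed(put=seed)

contains

   ! Fill seed(1:n) from a single integer using the Park-Miller
   ! "minimal standard" LCG; 64-bit arithmetic avoids overflow.
   subroutine expand_seed(scalar, n, seed)
      integer, intent(in)  :: scalar, n
      integer, intent(out) :: seed(n)
      integer, parameter   :: i8 = selected_int_kind(18)
      integer(i8) :: s
      integer     :: i

      s = max(1_i8, abs(int(scalar, i8)))       ! the LCG state must be nonzero
      do i = 1, n
         s = mod(48271_i8 * s, 2147483647_i8)   ! s -> 48271*s mod (2**31 - 1)
         seed(i) = int(s)
      end do
   end subroutine expand_seed

end program seed_from_scalar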

Edit to add: follow-up question about setting seeds in Fortran: Correctly setting random seeds for repeatability

JohnE
  • Anyway, if the desired result is just to have separate random number sequences, you can do that with parallel PRNGs that maintain separate sequences for different threads. But nothing forces you to use those sequences with threads; you can use them any way you want. Example: https://bitbucket.org/LadaF/elmm/src/4772e49e2bc91683e8202eaa9ebf8a30a7a6ea0e/src/rng_par_zig.f90?at=master&fileviewer=file-view-default – Vladimir F Героям слава Aug 15 '18 at 14:57
  • @VladimirF Thanks, that is conceptually a good suggestion for sure, though a bit more complicated than my ideal solution. I mean, I'm not really asking that tough of a question I don't think and if no one sees anything horribly wrong with my benchmark answer, then I'll just use that in the near term at least. – JohnE Aug 15 '18 at 19:13
  • @JohnE, you should use a subroutine instead of a function. A function returns a value. Anything else it does is a side-effect, and this may have unintended consequences. – Steve Aug 15 '18 at 19:21
  • @Steve I guess I don't see how this is any different than if I passed the seed via a subroutine with `intent(inout)`? Or is `intent(inout)` also frowned upon nowadays? I suppose the state of the f90 PRNG is also being changed as a side effect, but no more or less than when the PRNG subroutines are called themselves. – JohnE Aug 16 '18 at 08:56
  • There's no requirement that a function shouldn't modify its arguments, so `seed` being `inout` is technically correct. Stylistically, I prefer functions that only modify their output, though. – Ross Aug 16 '18 at 15:55
  • @Steve, *etc*: the Fortran random number functions are fundamentally impure, they maintain state behind the scenes. Exposing that state, as OP intends, doesn't make them more impure. Indeed, a function which takes in a state and spits out both the updated state and the next number in the sequence might actually be pure. – High Performance Mark Aug 17 '18 at 14:57
  • @High Performance Mark, never claimed that random_seed or random_number were pure. Don't know where you got that idea. I will note that both are intrinsic subroutines not intrinsic functions as J3 avoids defining intrinsic functions with side-effects. – Steve Aug 17 '18 at 17:24
  • @JohnE, Functions with side-effects can have some very bad consequences. There is a very long and recent discussion about this in comp.lang.fortran newsgroup. Consider `if (.false. .and. impure_func())`, a Fortran compiler can generate code where `impure_func()` is not evaluated. If the code depends on a side-effect of calling `impure_func()`, you have a problem (which may be hard to debug). Best practice: don't write functions with side-effects. – Steve Aug 17 '18 at 17:28
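For reference, the subroutine form suggested in the comments would be a small change to the wrapper (an untested sketch; ran_sub is just a placeholder name, and because the dummy argument is assumed shape it needs an explicit interface, e.g. by placing it in a module):

subroutine ran_sub( seed, x )

   implicit none
   integer, intent(inout) :: seed(:)   ! full state array for this stream
   real,    intent(out)   :: x

   call random_seed( put=seed )
   call random_number( x )
   call random_seed( get=seed )

end subroutine ran_sub

The call site then becomes call ran_sub(iseed1, x) instead of x = ran(iseed1).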

1 Answer


Your proposed solution looks to me like it should work - you are recording the entire state of the generator (via get), and swapping between streams when necessary (via put). Note that I haven't actually tested your code, though.
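A quick way to test it would be something like the sketch below (a rough sketch, not something I have run, using the same broadcast seeding as the question): interleave two streams through put/get and check that stream 1 reproduces exactly the values it gives when run on its own.

program check_streams

   implicit none
   integer :: n, i
   integer, allocatable :: s1(:), s2(:), s1_alone(:)
   real :: x_inter(4), x_alone(4), y

   call random_seed(size=n)
   allocate(s1(n), s2(n), s1_alone(n))

   ! -- Interleaved: alternate draws from stream 1 and stream 2
   s1 = 123456789
   s2 = 987654321
   do i = 1, 4
      call random_seed(put=s1)
      call random_number(x_inter(i))
      call random_seed(get=s1)

      call random_seed(put=s2)
      call random_number(y)
      call random_seed(get=s2)
   end do

   ! -- Stream 1 on its own, from the same starting state
   s1_alone = 123456789
   do i = 1, 4
      call random_seed(put=s1_alone)
      call random_number(x_alone(i))
      call random_seed(get=s1_alone)
   end do

   ! -- Exact equality is intended here: the sequences should be bitwise identical
   if (all(x_inter == x_alone)) then
      write(*,*) 'stream 1 is unaffected by interleaving with stream 2'
   else
      write(*,*) 'mismatch: the streams interfere'
   end if

end program check_streams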

This answer came about because a previous answer (now deleted) saved only the first element of the state array, and was using it to set the entire state array. It looked something like:

integer :: seed_scalar, state_array(12)
real    :: ran
...
! -- Generate a random number for a "stream" whose only stored state is seed_scalar
state_array(:) = seed_scalar
call random_seed( put=state_array )
call random_number( ran )
call random_seed( get=state_array )
seed_scalar = state_array(1)
! -- discard state_array; only seed_scalar is stored for the next call

This answer is intended to demonstrate that, for some compilers (gfortran 4.8 and 8.1, pgfortran 15.10), maintaining separate streams via only a scalar seed in this way results in bad RNG behavior.

Consider the following code, which tests a random number generator in a primitive way. Many random numbers are generated - 100M for this example - and the output is monitored in two ways. First, counts of how often each number is larger or smaller than the previous one are recorded. Second, a histogram with bin width 0.01 is built. (This is obviously a primitive test case, but it turns out to be sufficient to prove the point.) Finally, the estimated one-sigma standard deviation for each probability is also computed, which lets us determine whether variations are random noise or statistically significant.

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer :: N_iterations, state_size, N_histogram
   integer :: count_incr, count_decr, i, loc, fid
   integer, allocatable, dimension(:) :: state1, histogram
   real(wp) :: ran, oldran, p, dp, re, rt

   ! -- Some parameters
   N_iterations = 100000000
   N_histogram = 100

   call random_seed(size = state_size)
   allocate(state1(state_size), histogram(N_histogram))
   write(*,'(a,i3)') '-- State size is: ', state_size

   ! -- Initialize RNG, taken as today's date (also tested with other initial seeds)
   state1(:) = 20180815
   call random_seed(put=state1)

   count_incr = 0
   count_decr = 0
   histogram = 0
   ran = 0.5_wp      ! -- To yield proper oldran

   ! -- Loop to grab a batch of random numbers
   do i=1,N_iterations
      oldran = ran

      ! -- This preprocessor block modifies the RNG state in a manner
      ! -- consistent with storing only the first value of it
#ifdef ALTER_STATE
      call random_seed(get=state1)
      state1(:) = state1(1)
      call random_seed(put=state1)
#endif

      ! -- Get the random number
      call random_number(ran)

      ! -- Process Histogram
      loc = max(1, CEILING(ran*N_histogram))   ! -- guard: random_number can return exactly 0.0
      histogram(loc) = histogram(loc) + 1

      if (oldran<ran) then
         count_incr = count_incr + 1
      elseif (oldran>ran) then
         count_decr = count_decr + 1
      else
         ! -- This should be statistically impossible
         write(*,*) '** Error, same value?', ran, oldran
         stop 1
      endif
   enddo

   ! -- For this processing, we have:
   !     re    Number of times this event occurred
   !     rt    Total number
   ! -- Probability of event is just re/rt
   ! -- One-sigma standard deviation is sqrt( re*(rt-re)/rt**3 )

   ! -- Write histogram
   ! -- For each bin, compute probability and uncertainty in that probability
   open(newunit=fid, file='histogram.dat', action='write', status='replace')
   write(fid,'(a)') '# i, P(i), dP(i)'

   rt = real(N_iterations,wp)
   do i=1,N_histogram
      re = real(histogram(i),wp)
      p = re/rt
      dp = sqrt(re*(rt-1)/rt**3)

      write(fid,'(i4,2es26.18)') i, p, dp
   enddo
   close(fid)

   ! -- Probability of increase and decrease
   re = real(count_incr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of increasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   re = real(count_decr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of decreasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   write(*,'(a)') 'complete'

end program main

Without Modification

Without the preprocessor directive ALTER_STATE, we are using gfortran's built-in PRNG as intended, and the results are as expected:

enet-mach5% gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

enet-mach5% gfortran -cpp -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.499970710000000
      one-sigma deviation:    0.000070708606619
Probability of decreasing:    0.500029290000000
      one-sigma deviation:    0.000070712748851
complete

real    0m2.414s
user    0m2.408s
sys     0m0.004s

The expected probability for increasing/decreasing is 0.5, and both are within the estimated uncertainty (0.49997 is less than 0.00007 from 0.5). The histogram, plotted with error bars, is

Histogram for Correct Usage

For each bin, the variation from the expected probability (0.01) is small, and typically within the estimated uncertainty. Because we have generated many numbers, all variations are small (order 0.1%). Basically, this test hasn't found any suspicious behavior.

With Modification

If we enable the block inside ALTER_STATE, we are modifying the internal state of the random number generator each time a number is generated. This is intended to mimic a now-deleted solution, which stored only the first value of the state. The results are:

enet-mach5% gfortran -cpp -DALTER_STATE -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.501831930000000
      one-sigma deviation:    0.000070840096343
Probability of decreasing:    0.498168070000000
      one-sigma deviation:    0.000070581021884
complete

real    0m16.489s
user    0m16.492s
sys     0m0.000s

The observed probability of increasing is far outside the expected variation (26 sigma!). This already indicates something is wrong. The histogram is:

Histogram for Incorrect Usage

Note that the range in y has changed substantially. Here, the variation is roughly two orders of magnitude larger than in the previous case, far outside the expected uncertainty. The error bars are hard to see here because the y range is so much larger. If my random number generator performed like this, I wouldn't feel comfortable using it for anything, not even a coin flip.

Closing

The put and get options of random_seed access the processor-dependent internal state of the random number generator. This typically has more entropy than a single number, and its form is processor-dependent. There's no guarantee that the first number is at all representative of the entire state.

If you're initializing the seed once and then generating many numbers, seeding from a single scalar is fine. However, if you intend to save and restore the state around every single draw, you clearly need to store more than a single number.
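If you do want a stream-per-variable setup like the one in the question, one tidy way to carry the full state around (a sketch only; rng_stream, stream_init, and stream_next are just illustrative names) is to wrap the state array in a derived type:

module rng_stream_mod
   implicit none

   ! -- A stream is simply the full, processor-dependent state array
   type :: rng_stream
      integer, allocatable :: state(:)
   end type rng_stream

contains

   ! -- Initialize a stream by broadcasting a scalar, as in the question
   ! -- (the statistical quality of this seeding is a separate issue)
   subroutine stream_init(st, scalar)
      type(rng_stream), intent(out) :: st
      integer,          intent(in)  :: scalar
      integer :: n
      call random_seed(size=n)
      allocate(st%state(n))
      st%state = scalar
   end subroutine stream_init

   ! -- Draw one number from this stream, carrying the whole state along
   subroutine stream_next(st, x)
      type(rng_stream), intent(inout) :: st
      real,             intent(out)   :: x
      call random_seed(put=st%state)
      call random_number(x)
      call random_seed(get=st%state)
   end subroutine stream_next

end module rng_stream_mod

Usage would then be type(rng_stream) :: s1, followed by call stream_init(s1, 123456789) and call stream_next(s1, x).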

I'm frankly a little surprised this primitive test was able to demonstrate the bad behavior. RNG validation is a complex subject, and I am by no means an expert. The results were also compiler-dependent:

  • The results and histograms shown are for gfortran 4.8, which has a state size of 12.
  • Intel 16.0 uses a state size of 2, and this test didn't show any undesirable behavior.
  • gfortran 8.1.0 has a state size of 33, and PGI 15.10 has a state size of 34. Their behavior was the same, and the worst yet: when the entire state is set to a single value, subsequent random numbers are always identical. These generators require a 'priming' of around 30 generations before the numbers start looking reasonable when initialized from a single seed, and when only a single seed value is saved back into the state, this priming never occurs.

It's possible that a larger state size means more entropy is lost when only a single value is saved, which would correlate with poorer behavior. That certainly fits my observations, but without knowing the internals of each generator it's impossible to say for sure.
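Given the priming observation above, if you do seed from a single broadcast scalar (and only seed once), one cheap mitigation would be to discard a warm-up batch immediately after seeding. A sketch (seed_and_prime is an illustrative name, and 30 is just the rough figure observed above, not a vetted number):

! -- Seed from a broadcast scalar, then discard a warm-up batch so the
! -- state has been mixed before any "real" numbers are drawn
subroutine seed_and_prime(scalar, n_discard)
   implicit none
   integer, intent(in) :: scalar, n_discard
   integer :: n, i
   integer, allocatable :: seed(:)
   real :: dummy

   call random_seed(size=n)
   allocate(seed(n))
   seed = scalar
   call random_seed(put=seed)

   do i = 1, n_discard        ! e.g. n_discard of around 30
      call random_number(dummy)
   end do
end subroutine seed_and_prime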

Ross
  • There is a flaw in your testing. Not all seeds are created equal. state1(1) may not be involved in the internal state of the PRNG. Initialize the generator, draw 1 or more random numbers, then compare the old seed with the new state. – Steve Aug 15 '18 at 23:15
  • I think what you're saying is exactly the point I'm trying to make. There's no sense in ever dealing only with state1(1). When I get the chance, I'll edit the question to make it more clear what motivated the answer, now that the old answer has been deleted. – Ross Aug 15 '18 at 23:17
  • Thanks Ross, this is great! If you want, feel free to delete the stuff focusing on the old answer and just focus on the newer answer. The old answer is probably just confusing at this point. Also feel free to expand on your comments about Intel 16.0 if you think they are interesting. – JohnE Aug 16 '18 at 08:10
  • FYI, I moved my sample answer into the question itself. – JohnE Aug 16 '18 at 08:41
  • @Ross, yes, your conclusion drives home that care and testing are required if one is (partially) manipulating the internal state of the PRNG – Steve Aug 16 '18 at 22:08