5

I want to shuffle a long sequence (say it has more than 10000 elements) many times (say 10000). While reading the documentation for Python's random module, I found the following:

Note that even for small len(x), the total number of permutations of x can quickly grow larger than the period of most random number generators. This implies that most permutations of a long sequence can never be generated. For example, a sequence of length 2080 is the largest that can fit within the period of the Mersenne Twister random number generator
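The 2080 figure in the note can be checked with a few lines of integer arithmetic. The sketch below assumes the Mersenne Twister period of 2**19937 - 1; the helper name `largest_length_within_period` is made up for illustration and is not part of the random module.

```python
# Period of the Mersenne Twister (MT19937), the generator behind Python's random module.
MT_PERIOD = 2**19937 - 1


def largest_length_within_period(period):
    """Return the largest n such that n! <= period (hypothetical helper)."""
    n, fact = 1, 1
    # Keep growing n while (n + 1)! still fits within the period.
    while fact * (n + 1) <= period:
        n += 1
        fact *= n
    return n


print(largest_length_within_period(MT_PERIOD))  # 2080
```

For any sequence longer than that, there are more permutations than generator states, so some permutations cannot be produced from a single pass through the period.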

I have two groups (could be more) and each has many values. The sequence I want to shuffle is the list of all values available regardless of the group. My concern is that the note implies that the shuffle I need may not be provided by the random.shuffle() function.

I have thought about some workarounds:

  • Re-initialize the random number generator (with random.seed()) several times, at certain iterations. That way, it would not matter that there are more permutations than the period, because different seeds would give different results.
  • Use sample(range(length of sequence), k=size of a group) to get random indices and then use those to index within each group. That way I may not run out of permutations due to the period of the random number generator.
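A minimal sketch of the second idea, assuming two groups and a pooled list of values. The function name `random_split` and its `rng` parameter are hypothetical. Note that it still draws from the same generator, so it does not remove the period limit; what changes is that only group membership matters, and the number of distinct splits is the binomial coefficient C(n, k), which is far smaller than n!.

```python
import random


def random_split(values, group_size, rng=random):
    """Randomly pick `group_size` of the pooled values for one group; the rest form the other.

    `rng` is an assumption for illustration: the random module itself or any
    random.Random instance (for example a seeded one, for reproducibility).
    """
    # Draw the indices that go into the first group, without replacement.
    chosen = set(rng.sample(range(len(values)), k=group_size))
    group_a = [v for i, v in enumerate(values) if i in chosen]
    group_b = [v for i, v in enumerate(values) if i not in chosen]
    return group_a, group_b


pooled = list(range(10))
a, b = random_split(pooled, 4, rng=random.Random(0))
print(len(a), len(b))  # 4 6
```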

Would any of my alternatives help?

Thanks a lot!

srcolinas
  • I disagree @sasha: if not all permutations can be reached with equal probability, the shuffle won't be fully random - there will be a measure of predictability that could be exploited. – Reblochon Masque Oct 21 '17 at 01:40
  • @OP, you may want to look into the `crypto` module to see if it offers an alternative that you could use? – Reblochon Masque Oct 21 '17 at 01:45
  • [Valuable read](https://cs.stackexchange.com/questions/9199/best-random-permutation-employing-only-one-random-number). – sascha Oct 21 '17 at 01:56
  • I have looked at the shuffle function in the random module. I am now even more confused. I do not know if I should be concerned at all: https://github.com/python/cpython/blob/3.6/Lib/random.py – srcolinas Oct 21 '17 at 14:37

2 Answers

2

Well, 10,000! ≈ 10^36,000. That is a lot of possible permutations. The best you could do is to delve into how your operating system or hardware accumulates "truly random" bits. You could then wait for ~120,000 bits of randomness that you are OK with, then use the algorithm that generates the n'th permutation of your input list given that random n.
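As a rough sketch of that approach (not the code from this answer or its links): the index n is drawn from the operating system's entropy source with `secrets.randbelow`, and the n'th permutation is built from the factorial-base digits of n without enumerating any other permutation. The function names `nth_permutation` and `random_permutation` are made up for illustration, and the ordering used here (lexicographic by position) is one possible convention.

```python
import math
import secrets


def nth_permutation(items, n):
    """Return the n-th permutation of items (0-based, lexicographic by position)."""
    pool = list(items)
    size = len(pool)
    if not 0 <= n < math.factorial(size):
        raise ValueError("n must satisfy 0 <= n < len(items)!")
    # Factorial-base digits of n, least significant first.
    digits = []
    for radix in range(1, size + 1):
        n, digit = divmod(n, radix)
        digits.append(digit)
    # The most significant digit selects the first element from the remaining pool.
    return [pool.pop(d) for d in reversed(digits)]


def random_permutation(items):
    """Pick one of the len(items)! permutations using OS-provided randomness."""
    pool = list(items)
    n = secrets.randbelow(math.factorial(len(pool)))
    return nth_permutation(pool, n)


# Number of random bits needed to index every permutation of 10,000 items.
print(math.factorial(10000).bit_length())
print(nth_permutation("abc", 3))  # ['b', 'c', 'a']
```

`pool.pop(d)` makes this quadratic in the list length, which is acceptable for a sketch at 10,000 elements but not tuned for speed.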

Paddy3118
  • The problem here would be the order in which the permutations occur and how many of them there are. To select one as you say, I think I would need to store all possible permutations, which is impossible. – srcolinas Oct 21 '17 at 14:23
  • 1
    No, there is a distinct algorithm where you give it a list and the "index" of the permutation needed, and it spits it out without generating anything like all the permutations. I have used it and posted it before, but it would take ten minutes to find again... – Paddy3118 Oct 22 '17 at 09:10
  • 1
    ...Ah, [here](http://rosettacode.org/wiki/Permutations/Rank_of_a_permutation) is the Rosetta Code task I started; and [here](https://stackoverflow.com/a/13056801/10562) I answer a similar task on SO. – Paddy3118 Oct 22 '17 at 09:20
-2

You can use NumPy's shuffle function to shuffle the list elements in place:

import numpy as np

# np.random.shuffle needs a mutable sequence; a Python 3 range object is immutable.
L = list(range(10000))
np.random.shuffle(L)

Timing the shuffle call (in Jupyter)

%timeit np.random.shuffle(L)

you get

10000 loops, best of 3: 182 µs per loop
jadarve