14

I'm calling random.seed() to keep the output of random.sample() the same as I sample more values from a list, but at some point the numbers change. I thought the whole purpose of seed() was to keep the numbers the same.

Here's a test I did to show that it doesn't keep the same numbers.

import random

a=range(0,100)
random.seed(1)
a = random.sample(a,10)
print a

Then make the sample size much larger and the sequence changes (at least for me it always does):

a = random.sample(a,40)
print a

I'm sort of a newb, so maybe this is an easy fix, but I would appreciate any help on this. Thanks!

Motion4D
  • Can you give a sample output (from print a) for what you get and what you are expecting? Your question is a bit vague (the numbers change?) but it sounds like this function is working like I would expect it to. – Paul Seeb Apr 14 '14 at 17:22
  • It's worth noting that a subsequence of a random sample is also a random sample itself. So, you should probably just grab the 40-element sample up front, and make the 10-element one with a slice. – Blckknght Apr 14 '14 at 17:26
  • @PaulSeeb yes, sorry, it was a little vague. I should have explained my end goal a bit more, which is to go from 0 samples to the full 100 in a random order without repeats. As my sample count increases, the sequence changes at some point. When I sample 10, my sequence starts as [13,84,76,25...], and with 40 I get [13,83,74,24...]. It seems some numbers stay the same and others change at some point, which seems odd to me. – Motion4D Apr 14 '14 at 19:09

4 Answers

9

If you were to draw independent samples from the generator, what would happen would be exactly what you're expecting:

In [1]: import random

In [2]: random.seed(1)

In [3]: [random.randint(0, 99) for _ in range(10)]
Out[3]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2]

In [4]: random.seed(1)

In [5]: [random.randint(0, 99) for _ in range(40)]
Out[5]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43 ...]

As you can see, the first ten numbers are indeed the same.

It is the fact that random.sample() is drawing samples without replacement that's getting in the way. To understand how these algorithms work, see Reservoir Sampling. In essence what happens is that later samples can push earlier samples out of the result set.

One alternative might be to shuffle a list of indices and then take either the first 10 or the first 40 elements:

In [1]: import random

In [2]: a = list(range(0, 100))  # shuffle() needs a mutable list

In [3]: random.shuffle(a)

In [4]: a[:10]
Out[4]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80]

In [5]: a[:40]
Out[5]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80, ...]
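To make the shuffled order itself reproducible between runs, the shuffle can be seeded as well. A minimal sketch, assuming Python 3; the private random.Random(1) instance, the seed value, and the name items are illustrative choices, not part of the session above:

```python
import random

# The population to draw from; a list, because shuffle() works in place.
items = list(range(100))

# A private generator with a fixed seed, so other code that uses the
# module-level random functions cannot disturb the order.
rng = random.Random(1)
rng.shuffle(items)

# Any prefix of the shuffled list is a sample without repeats, and a longer
# prefix always starts with the shorter one.
first_10 = items[:10]
first_40 = items[:40]

print(first_40[:10] == first_10)
```

Because the list is shuffled only once, growing the prefix from 0 up to the full 100 elements never changes the elements already taken.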
NPE
  • Thanks, I used your second option, which worked just fine, although it's a bummer that random.sample won't count all the way up without switching. Your first option was nice too, except it gave repeats. Thanks again for your help!! – Motion4D Apr 14 '14 at 19:25
  • `random.sample` doesn't use reservoir sampling. It does either a partial shuffle or rejection sampling, depending on how the sample size compares to the population size. – user2357112 May 13 '20 at 17:24
4

It seems that random.sample is deterministic only if both the seed and the sample size are kept constant. Even if you reset the seed, generating a sample of a different length is not "the same" random operation, and it may give a different initial subsequence than generating a smaller sample with the same seed. The same random numbers are being generated internally, but the way sample uses them to derive the result depends on how large a sample you ask for.
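One way to check this is to reseed and compare. The sketch below assumes Python 3, and the variable names are illustrative. Whether the larger sample happens to start with the smaller one depends on the Python version and on the particular draws, so no output is claimed for that last comparison:

```python
import random

population = list(range(100))

# Same seed, same sample size: the results are identical.
random.seed(1)
first = random.sample(population, 10)
random.seed(1)
second = random.sample(population, 10)
print(first == second)

# Same seed, larger sample size: a different operation, so its first
# ten elements are not guaranteed to match the smaller sample.
random.seed(1)
larger = random.sample(population, 40)
print(larger[:10] == first)
```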

BrenBarn
3

You are assuming an implementation of random.sample something like this:

def samples(lst, k):
    n = len(lst)
    indices = []
    while len(indices) < k:
        index = random.randrange(n)
        if index not in indices:
            indices.append(index)
    return [lst[i] for i in indices]

Which gives:

>>> random.seed(1)
>>> samples(list(range(20)), 5)
[4, 18, 2, 8, 3]
>>> random.seed(1)
>>> samples(list(range(20)), 10)
[4, 18, 2, 8, 3, 15, 14, 12, 6, 0]

However, that isn't how random.sample is actually implemented; seed does work how you think, it's sample that doesn't!
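As a comment on another answer notes, random.sample does either rejection sampling or a partial shuffle, depending on how the sample size compares to the population size. The following is a rough sketch of those two strategies, not CPython's actual source; both function names are hypothetical and Python 3 is assumed:

```python
import random


def sample_by_rejection(population, k):
    """Pick k distinct items, redrawing any index that was already used."""
    n = len(population)
    chosen = set()
    result = []
    while len(result) < k:
        j = random.randrange(n)  # always drawn from the full range
        if j not in chosen:
            chosen.add(j)
            result.append(population[j])
    return result


def sample_by_partial_shuffle(population, k):
    """Pick k distinct items by drawing from a shrinking pool."""
    pool = list(population)
    n = len(pool)
    result = []
    for i in range(k):
        j = random.randrange(n - i)  # the range shrinks after every draw
        result.append(pool[j])
        pool[j] = pool[n - i - 1]  # refill the slot with the last unused item
    return result


population = list(range(100))

random.seed(1)
by_rejection = sample_by_rejection(population, 10)

random.seed(1)
by_shuffle = sample_by_partial_shuffle(population, 10)

print(by_rejection)
print(by_shuffle)
```

Both lists start with the same element, because the first draw comes from the same range in both functions. Later draws in the partial shuffle come from a smaller range and a rearranged pool, so the two lists can diverge even though the seed is identical.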

jonrsharpe
0

You simply need to re-seed it:

a = list(range(100))
random.seed(1)  # seed first time
random.sample(a, 10)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83]
random.seed(1)  # seed second time with same value
random.sample(a, 40)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83, 48, 26, 12, 62, 3, 49, 55, 77, 0, 92, 34, 29, 75, 13, 40, 85, 2, 74, 69, 1, 89, 27, 54, 98, 28, 56, 93, 35, 14, 22]

But in your code you assign the result of the first sample back to a, so a shrinks from 100 elements to 10 and the second call no longer draws from the original population, so it won't work. Keep the population in its own list and seed before every sampling.

Alaa M.