Python question. I'm generating a large array of objects, which I only need to make a small random sample. Actually generating the objects in question takes a while, so I wonder if it would be possible to somehow skip over those objects that don't need generating and only explicitly create those objects that have been sampled.
In other words, I now have
a = createHugeArray()
s = random.sample(a,len(a)*0.001)
which is rather wasteful. I would prefer something more lazy like
a = createArrayGenerator()
s = random.sample(a,len(a)*0.001)
I don't know if this works. The documentation on random.sample isn't too clear, though it mentions xrange as being very fast - which makes me believe it might work. Converting the array creation to a generator would be a bit of work (my knowledge of generators is very rusty), so I want to know if this works in advance. :)
An alternative I can see is to make a random sample via xrange, and only generate those objects that are actually selected by index. That's not very clean though, because the indices generated are arbitrary and unnecessary, and I would need rather hacky logic to support this in my generateHugeArray method.
For bonus points: how does random.sample actually work? Especially, how does it work if it doesn't know the size of the population in advance, as with generators like xrange?