I need to sample a number uniformly at random from a set of fixed size, do some calculation, and put the new number back into the set. (The number of samples needed is very large.)

I tried storing the numbers in a list, using random.choice() to pick an element, removing it, and then appending the new element. But that's way too slow!

I'm thinking of storing the numbers in a numpy array, sampling a list of indices, and performing the calculation for each index.

  • Is there a faster way of doing this?
user972432
  • Are you partitioning your collection into two pieces: those that get processed (a fixed size) and those that are not processed? Why are you "replacing"? Why not build a new collection from the two sub-collections? `a = [f(x) for x in s[:limit]] + s[limit:]` If `s` is shuffled, this should work, right? Why do "replacement" into a list? – S.Lott Oct 19 '11 at 03:02
  • The calculation on each element depends on other elements on the list, and I don't know of any way to vectorize such a process. – user972432 Oct 19 '11 at 03:33
  • "calculation on each element depends on other elements on the list"? Please explain that, too. Depending on other elements does not force you into a replacement-style process. Please provide the code you're using. – S.Lott Oct 19 '11 at 09:51

3 Answers

Python lists are implemented internally as arrays (like Java ArrayLists, C++ std::vectors, etc.), so removing an element from the middle is relatively slow: every subsequent element has to be shifted down by one position. (See http://www.laurentluce.com/posts/python-list-implementation/ for more on this.) Since the order of the elements doesn't seem to matter to you, I'd recommend just using random.randint(0, len(L) - 1) to choose an index i, then L[i] = calculation(L[i]) to update the ith element in place.
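To make the difference concrete, here is a minimal sketch of both approaches side by side. The function names, the `calculation` callback, and the `rng` parameter are illustrative assumptions, not part of the question's code; the first function only approximates what the question describes.

```python
import random


def update_by_remove_append(values, calculation, n_steps, rng=random):
    """Roughly the approach from the question: pick, remove, then append."""
    for _ in range(n_steps):
        x = rng.choice(values)
        values.remove(x)               # linear search, then shifts later elements
        values.append(calculation(x))
    return values


def update_in_place(values, calculation, n_steps, rng=random):
    """Overwrite the chosen slot instead; constant list work per step."""
    for _ in range(n_steps):
        i = rng.randrange(len(values))  # same range as randint(0, len(values) - 1)
        values[i] = calculation(values[i])
    return values


if __name__ == "__main__":
    data = list(range(10))
    update_in_place(data, lambda x: x + 1, 5)
    print(len(data))  # the list keeps its fixed size
```

Both functions keep the list at a fixed size; only the second avoids the per-step cost that grows with the length of the list.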

ruakh

I need to sample uniformly at random a number from a set with fixed size, do some calculation, and put the new number back into the set.

from random import randrange

s = list(someset)             # store the set as a list
while True:                   # repeat for as many samples as you need
    i = randrange(len(s))     # choose a random element
    x = s[i]
    y = your_calculation(x)   # do some calculation
    s[i] = y                  # put the new number back into the set
Raymond Hettinger
  • Why isn't this `s[i] = your_calculation( s[i] )`? Why all the separate assignment statements? – S.Lott Oct 19 '11 at 11:47
  • For clarity, so the OP can see clearly that each clause in his problem specification corresponds with a line of code that implements that clause. – Raymond Hettinger Oct 29 '11 at 17:49

random.sample( a list or other sequence, Nsample ) is very fast (in Python 3.11+ a set has to be converted to a list first), but it's not clear to me whether you want something like this:

import random

Setsize = 10000
Samplesize = 100
Max = 1 << 20
bigset = set( random.sample( range(Max), Setsize ))  # initial subset of 0 .. Max

def calc( aset ):
    return set( x + 1 for x in aset )  # << your code here

# sample, calc a new subset of bigset, add it --
for _ in range(3):
    # random.sample needs a sequence in Python 3.11+, so copy the set to a list
    asample = random.sample( list(bigset), Samplesize )
    newset = calc( asample )  # new subset of 0 .. Max
    bigset |= newset

You could use NumPy arrays or bitarray instead of set, but I'd expect the time in calc() to dominate.

What are your Setsize and Samplesize, roughly?

denis