We have two positive integers b,n with b < n/2. We want to generate two random disjoint lists I1, I2, each with b elements, from {0,1,...,n}. A simple way to do this is the following.

import random

def disjoint_sets(bound,n):
    #shuffles all of 0..n, which builds a list of n+1 elements
    L = random.sample(range(0,n+1), n+1)
    I1 = L[0:bound]
    I2 = L[bound:2*bound]
    return I1,I2

For large values (say b=100, n>1e7) the code above is not memory efficient, since L is large. I am wondering if there is a method to get I1,I2 without using range(0,n+1)?

111
  • But your code snippet suggests that you already understand the point that you can just take a single sample of size `2*b` and then break it into two halves. Focusing on one vs two lists has no real relevance. – John Coleman Nov 02 '19 at 19:08
  • Right. I don't want to keep in memory the list {0,1,..,n}. – 111 Nov 02 '19 at 19:09
  • If `b` is small compared to `n` -- a naive rejection sampling approach would be adequate. It might help if you gave typical values of `b` and `n`. – John Coleman Nov 02 '19 at 19:16
  • I edited my question; say `b` close to 100 and `n>1e7`. But yes, we can assume `b << n`. – 111 Nov 02 '19 at 19:19

1 Answer

Here is a hit-and-miss approach which works well for numbers in the range that you mentioned:

import random

def rand_sample(k,n):
    #picks k distinct random integers from range(n)
    #assumes that k is much smaller than n
    choices = set()
    sample = []
    for i in range(k): #xrange(k) in Python 2
        choice = random.randint(0,n-1)
        while choice in choices:
            choice = random.randint(0,n-1)
        choices.add(choice)
        sample.append(choice)
    return sample

For your problem, you could do something like:

def rand_pair(b,n):
    #samples from range(n); pass n+1 instead of n to include n itself, as in {0,1,...,n}
    sample = rand_sample(2*b,n)
    return sample[:b],sample[b:]
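
As a possible alternative, assuming Python 3: `range` is a lazy sequence there, and `random.sample` can draw from it directly without expanding it into a list. The memory cost in the question comes from asking for all `n+1` elements, not from `range` itself. The sketch below uses a hypothetical helper name, `disjoint_lists`, and requests only `2*b` elements:

```python
import random

def disjoint_lists(b, n):
    # Hypothetical helper, not part of the code above.
    # range(n + 1) is lazy in Python 3, so the population {0, ..., n}
    # is never materialized; only the 2*b sampled values are stored.
    sample = random.sample(range(n + 1), 2 * b)
    # The sampled values are distinct, so the two halves are disjoint.
    return sample[:b], sample[b:]

I1, I2 = disjoint_lists(100, 10**7)
```

In Python 2 the same idea would need `xrange` instead of `range`.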
John Coleman
  • so you suggest to take `I1,I2=rand_sample(k,n),rand_sample(k,n)` ? For `k< – 111 Nov 02 '19 at 19:53
  • @111 I suggest taking `rand_sample(2*b,n)` and just slicing the sample into two parts. Just get the sample. If you want to later divide your sample into two piles, no problem. See the edit. – John Coleman Nov 02 '19 at 20:04
  • OK, got it. So this becomes inefficient (in time complexity) if `2b` is close to `n`. – 111 Nov 02 '19 at 20:09
  • It would become inefficient probably before then. I haven't analyzed it much. Below the square root of `n`, the probability of collisions is small and the hit and miss approach is mostly hitting. Once `2*b` exceeds `n/2`, you will be missing more than hitting, plus all of the book-keeping involved. Somewhere between those two thresholds is when it stops being a good approach. – John Coleman Nov 02 '19 at 20:20
  • Yes, from the birthday paradox, `sqrt(n)` is the threshold for efficiency. +1 – 111 Nov 02 '19 at 20:25
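
The thresholds discussed in these comments can be checked numerically. The following is a rough sketch with two hypothetical helpers (`prob_no_collision` and `expected_draws`, neither from the answer), assuming each `random.randint` call is an independent uniform draw from `n` values:

```python
def prob_no_collision(k, n):
    # Probability that k independent uniform draws from n values
    # are all distinct (the birthday-problem product).
    p = 1.0
    for i in range(k):
        p *= (n - i) / n
    return p

def expected_draws(k, n):
    # Expected number of draws the hit-and-miss loop makes in total:
    # with i values already taken, a draw succeeds with probability
    # (n - i) / n, so it takes n / (n - i) attempts on average.
    return sum(n / (n - i) for i in range(k))

# Values from the question: 2*b = 200 draws, n = 10**7
p = prob_no_collision(200, 10**7)   # roughly 0.998
d = expected_draws(200, 10**7)      # barely above 200
```

For `2*b = 200` and `n = 10**7` a retry is rare, which matches the remark that below `sqrt(n)` the approach is mostly hitting.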