10

I have two lists x and y, both of length n, with xi and yi forming a pair. How can I take a random sample of m values from these two lists while preserving the pairing information (e.g. x[10] and y[10] would stay together in the resulting sample)?

My initial idea is this.

  • use zip to create a list of tuples
  • shuffle the list of tuples
  • select the first m tuples from the list
  • break up the tuples into new paired lists

And the code would look something like this.

import random

templist = list()
for tup in zip(x, y):
    templist.append(tup)
random.shuffle(templist)
x_sub = [a for a, b in templist[0:m]]
y_sub = [b for a, b in templist[0:m]]

This seems rather kludgy to me. Is there any way I could make this more clear, concise, or Pythonic?

Raymond Berg
Daniel Standage
  • Have you looked at [`random.sample`](https://docs.python.org/2/library/random.html#random.sample)? – metatoaster Sep 15 '15 at 02:50
  • @metatoaster As a replacement for the shuffle command? The whole solution would still be a bit kludgy. Unless `random.sample` could take two paired lists as input. – Daniel Standage Sep 15 '15 at 02:53

4 Answers

14

You can sample m pairs and split those into two lists with the following code:

import random

x = list(range(1, 11))  # 10 values, matching the 10 letters in y
y = list("abcdefghij")
m = 3

x_sub, y_sub = zip(*random.sample(list(zip(x, y)), m))
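If lists rather than tuples are needed, or the inputs might differ in length, the one-liner can be wrapped in a small helper. This is only a minimal sketch: the name `sample_pairs` and the length check are illustrative additions, not part of the answer above.

```python
import random


def sample_pairs(x, y, m):
    """Return m randomly chosen (x[i], y[i]) pairs as two aligned lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # random.sample draws without replacement and raises ValueError if m > len(x)
    pairs = random.sample(list(zip(x, y)), m)
    if not pairs:
        # zip(*[]) yields nothing to unpack, so handle m == 0 explicitly
        return [], []
    x_sub, y_sub = zip(*pairs)
    return list(x_sub), list(y_sub)


x = list(range(1, 11))
y = list("abcdefghij")
x_sub, y_sub = sample_pairs(x, y, 3)
```

Each `x_sub[k]` still corresponds to `y_sub[k]`, because the pairs are formed before sampling and only split apart afterwards.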
iron9
Jerry Day
1

If the two lists hold elements that pair up by index, simply zip them (in Python 3, wrap the zip object in a list) and use random.sample to take the sample.

>>> import random
>>> m = 4
>>> x = list(range(0, 3000, 3))
>>> y = list(range(0, 2000, 2))
>>> random.sample(list(zip(x, y)), m)
[(2145, 1430), (2961, 1974), (9, 6), (1767, 1178)]
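If the sample needs to be reproducible (for a test, say), one option is to draw from a seeded `random.Random` instance instead of the module-level functions. The sketch below assumes the same x, y and m as above; the seed value 42 is arbitrary.

```python
import random

x = list(range(0, 3000, 3))
y = list(range(0, 2000, 2))
m = 4

rng = random.Random(42)  # 42 is an arbitrary seed chosen for illustration
first = rng.sample(list(zip(x, y)), m)

rng = random.Random(42)  # same seed, so the same sequence of draws
second = rng.sample(list(zip(x, y)), m)

assert first == second
```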
metatoaster
1

If you have two lists of the same length, you can sample a subset of indices and use them to pair up the corresponding elements.

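import numpy as np

x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
sample_size = 3
# Choose distinct indices, then pull the matching element from each list
idx = np.random.choice(len(x), size=sample_size, replace=False)
pairs = [(x[n], y[n]) for n in idx]
>>> pairs
[(5, 10), (2, 7), (1, 6)]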
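The same index-based idea should also work without NumPy, by sampling positions from `range(len(x))` with the standard library. A minimal sketch, using the same x, y and sample_size as above:

```python
import random

x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
sample_size = 3

# Draw distinct positions, then use them to index both lists
idx = random.sample(range(len(x)), sample_size)
x_sub = [x[i] for i in idx]
y_sub = [y[i] for i in idx]
```

This avoids building the intermediate list of tuples, which may matter when the lists are large and m is small.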
Alexander
  • When I ran the code, it showed "SyntaxError: closing parenthesis ')' does not match opening parenthesis '['". I tried to adjust it, but without success. Could you have a look, please? – Mark K Oct 11 '20 at 15:07
  • @MarkK Corrected above. – Alexander Oct 11 '20 at 17:15
0

You can implement the random_product itertools recipe. I will use a third-party library, more_itertools, that implements this recipe for us. Install this library via pip install more_itertools.

Code

import more_itertools as mit


x, y, m = "abcdefgh", range(10), 2

iterable = mit.random_product(x, y, repeat=m) 

Results

iterable
# ('e', 9, 'f', 3)

It is not clear in what form the OP wants the results, but you can group x and y together, e.g. [(x[0], y[0]), (x[1], y[1]), ...]:

paired_xy = list(zip(*[iter(iterable)]*2))
paired_xy
# [('e', 9), ('f', 3)]

See also more_itertools.sliced and more_itertools.grouper for grouping consecutive items.

Alternatively, you may zip further to group along x and y, e.g. [(x[0], x[1], ...), (y[0], y[1], ...)]:

paired_xx = list(zip(*paired_xy))
paired_xx
# [('e', 'f'), (9, 3)]

Note, this approach accepts any number of iterables, x, y, z, etc.

# Select m random items from multiple iterables
x, y, m = "abcdefgh", range(10), 2
a, b, c = "ABCDE", range(10, 100, 10), [False, True]
iterable = mit.random_product(x, y, a, b, c, repeat=m) 
iterable
# ('d', 6, 'E', 80, True, 'a', 1, 'D', 50, False)

Details

From the itertools recipes:

def random_product(*args, repeat=1):
    "Random selection from itertools.product(*args, **kwds)"
    pools = [tuple(pool) for pool in args] * repeat
    return tuple(random.choice(pool) for pool in pools)

We can see that the function accepts any number of iterables, each of which becomes a pool. The list of pools is then repeated according to the repeat keyword, so its length scales with that value. One item is chosen at random from each pool, independently of the others, and the choices are collected into a tuple as the final result.
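Because each pool is sampled on its own, passing x and y as separate arguments does not tie x[i] to y[i]. One way to keep the pairing, sketched here under the assumption that sampling with replacement is acceptable, is to pass the zipped pairs as a single pool. The recipe is copied locally so that the example only needs the standard library, and y is shortened to match the length of x.

```python
import random


def random_product(*args, repeat=1):
    """Random selection from itertools.product(*args, repeat=repeat)."""
    pools = [tuple(pool) for pool in args] * repeat
    return tuple(random.choice(pool) for pool in pools)


x, y, m = "abcdefgh", range(8), 2

# Passing x and y separately draws each item independently of the others.
independent = random_product(x, y, repeat=m)

# Passing the zipped pairs as one pool keeps x[i] together with y[i].
# Draws are with replacement, so the same pair may appear more than once.
pairs = list(zip(x, y))
paired = random_product(pairs, repeat=m)
```

Here `paired` is a tuple of m `(x[i], y[i])` tuples. If duplicates are not wanted, `random.sample(pairs, m)` is the without-replacement counterpart.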

See also more_itertools docs for more tools.

pylang