Random Sample of N Distinct Permutations of a List

Question

Suppose I have a Python list of arbitrary length k. Now, suppose I would like a random sample of n (where n <= k!) distinct permutations of that list. I was tempted to try:

import random
import itertools

k = 6
n = 10

mylist = list(range(0, k))

j = random.sample(list(itertools.permutations(mylist)), n)

for i in j:
    print(i)

But, naturally, this code becomes unusably slow when k gets too large. The number of permutations I am looking for, n, is going to be relatively small compared to the total number of permutations, so computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.

How would you achieve this more efficiently? Remember, mylist could be a list of anything; I just used list(range(0, k)) for simplicity.

This problem reduces to being able to generate the nth lexicographical permutation. Once you are able to do this, you can create a mapping from the sequence `1 to n` (where n is the total number of permutations) to all of the permutations. Now we can easily generate a random sample, _S_, of the numbers `1 to n` and then individually generate the _ith_ permutation for all _i_ in _S_. — Joseph Wood, Jan 05 '19 at 13:19
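Here is a minimal Python sketch of the ranking approach described in this comment. It assumes the elements of the list are distinct; otherwise, distinct ranks can map to equal tuples. The helper names nth_permutation and sample_permutations are hypothetical and do not come from the thread. Ranks are 0-based rather than the comment's `1 to n`. nth_permutation decodes a rank digit by digit in the factorial number system. sample_permutations draws distinct ranks with random.randrange, which also works when k! is far larger than a machine integer.

```python
import math
import random

def nth_permutation(seq, index):
    """Return the index-th (0-based) lexicographic permutation of seq.

    "Lexicographic" is relative to the positions in seq, which matches the
    order in which itertools.permutations(seq) emits permutations.
    """
    items = list(seq)
    result = []
    for remaining in range(len(items), 0, -1):
        # Each choice of the next item covers (remaining - 1)! permutations.
        block = math.factorial(remaining - 1)
        pos, index = divmod(index, block)
        result.append(items.pop(pos))
    return tuple(result)

def sample_permutations(seq, n, rng=random):
    """Draw n distinct random permutations by sampling distinct ranks in [0, k!)."""
    total = math.factorial(len(seq))
    if n > total:
        raise ValueError(f"n={n} exceeds the {total} possible permutations")
    seen = set()
    ranks = []  # keeps draw order, so the output is not sorted by rank
    while len(ranks) < n:
        # randrange accepts arbitrarily large ints, so this works even for huge k!.
        r = rng.randrange(total)
        if r not in seen:
            seen.add(r)
            ranks.append(r)
    return [nth_permutation(seq, r) for r in ranks]

for p in sample_permutations(list(range(6)), 10):
    print(p)
```

Only n ranks are ever decoded, each in O(k^2) list operations, so nothing close to k! work is done when n is small.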
Possible duplicate of [Generate sample of 1,000,000 random permutations](https://stackoverflow.com/questions/12884428/generate-sample-of-1-000-000-random-permutations) — Joseph Wood, Jan 05 '19 at 13:46

score 4 · Answer 1 · answered Jan 05 '19 at 09:01

You can generate random permutations and keep track of the ones you have already generated. To make it more versatile, I made a generator function:

import random

k = 6
n = 10

mylist = list(range(0, k))

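def perm_generator(seq):
    seen = set()                                  # permutations yielded so far
    length = len(seq)
    while True:                                   # never stops on its own: asking for more than k! items hangs
        perm = tuple(random.sample(seq, length))  # one random permutation of seq
        if perm not in seen:                      # skip permutations already yielded
            seen.add(perm)
            yield perm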

rand_perms = perm_generator(mylist)

j = [next(rand_perms) for _ in range(n)]

for i in j:
    print(i)
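perm_generator never stops on its own, so asking it for more than k! permutations hangs. The following variant ends after all k! permutations have been yielded and pairs naturally with itertools.islice. This is a sketch: the name bounded_perm_generator is hypothetical, and the code assumes the elements of seq are distinct.

```python
import itertools
import math
import random

def bounded_perm_generator(seq):
    """Like perm_generator, but stops after all len(seq)! permutations.

    Assumes the elements of seq are distinct; with repeated values there are
    fewer than len(seq)! distinct tuples and the loop would never finish.
    """
    seq = list(seq)
    total = math.factorial(len(seq))
    seen = set()
    while len(seen) < total:
        perm = tuple(random.sample(seq, len(seq)))
        if perm not in seen:
            seen.add(perm)
            yield perm

# islice takes at most 10 items; the generator itself ends after k! items.
j = list(itertools.islice(bounded_perm_generator(range(6)), 10))
for i in j:
    print(i)
```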

It seems we had the same idea. A good point of your version is that it relies only on the PSL and uses a generator. — jlandercy, Jan 05 '19 at 09:10

jlandercy · Accepted Answer · 2019-01-05T17:33:12.190

Naïve implementation

Below is the naïve implementation I did (also nicely implemented by @Tomothy32 in pure PSL, using a generator):

import numpy as np

k = 6
n = 10
mylist = np.array(list(range(k)))           # any list works; it is indexed by the permutation below
perms = set()
for i in range(n):                          # (1) Draw n samples from the permutation universe U (#U = k!)
    while True:                             # (2) Endless loop
        perm = np.random.permutation(k)     # (3) Generate a random index permutation from U
        key = tuple(perm)                   # numpy arrays are unhashable; tuples are hashable
        if key not in perms:                # (4) Check whether this permutation was already drawn (hash table)
            perms.add(key)                  # (5) Insert the whole tuple (update() would add its items one by one)
            break                           # (6) Break the endless loop
    print(i, mylist[perm])

It relies on numpy.random.permutation, which randomly permutes a sequence (or, given an integer k, returns a random permutation of range(k)).

The key idea is:

to generate a new random permutation of the indices;
to check whether that permutation has already been drawn and, if not, store it (as a tuple of ints, because it must be hashable) to prevent duplicates;
then to permute the original list using the index permutation.
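The same index-permutation idea also works in pure Python. In this sketch, sample_index_perms is a hypothetical helper, and random.sample on range(k) stands in for numpy.random.permutation. Only the index tuples are stored in the set, so the elements of the list need not be hashable. If the list contains repeated values, two distinct index permutations can still produce equal-looking lists.

```python
import math
import random

def sample_index_perms(items, n, rng=random):
    """Hypothetical helper: n distinct random permutations of items.

    Duplicates are detected on tuples of indices, so the items themselves
    never need to be hashable.
    """
    k = len(items)
    if n > math.factorial(k):
        raise ValueError("n cannot exceed k!")
    seen = set()
    result = []
    while len(result) < n:
        idx = tuple(rng.sample(range(k), k))  # random permutation of 0..k-1
        if idx not in seen:
            seen.add(idx)
            result.append([items[i] for i in idx])  # apply it to the original list
    return result

# Unhashable elements (dicts) are fine, because only index tuples are hashed.
records = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}]
for perm in sample_index_perms(records, 5):
    print([r["id"] for r in perm])
```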

This naïve version does not directly suffer from the O(k!) factorial complexity of the question's approach, where list(itertools.permutations(...)) materializes all k! permutations before sampling from them.

About Complexity

There is something interesting about the algorithm design and complexity...

If we want to be sure that the loop terminates, we must enforce N <= k!, but the code does not guarantee it. Furthermore, assessing the complexity requires knowing how many times the endless loop will actually iterate before a new random tuple is found and breaks it.
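One standard way to estimate this uses a coupon-collector-style argument, which is not part of the original answer. Write U = k! and suppose m distinct permutations have already been drawn. Each new draw is unseen with probability (U - m)/U, so the number of attempts needed for the next new permutation is geometric with mean U/(U - m). Summing over the N draws gives:

```latex
\mathbb{E}[\text{draws}] = \sum_{m=0}^{N-1} \frac{U}{U-m} = U\left(H_U - H_{U-N}\right),
\qquad U = k!,\quad H_j = \sum_{i=1}^{j} \frac{1}{i},\quad H_0 = 0
```

For N << U every term is close to 1, so the expected work is about N draws. For N = U the sum is U H_U, roughly k! ln(k!), and the last term alone contributes U = k! attempts.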

Limitation

Let's encapsulate the function written by @Tomothy32:

import math

def get_perms(seq, N=10):
    rand_perms = perm_generator(seq)    # use the seq argument, not the global mylist
    return [next(rand_perms) for _ in range(N)]

For instance, this call works for very small k (k < 7):

get_perms(list(range(k)), math.factorial(k))

But it fails as k grows, because it needs at least O(k!) time and memory. The final step boils down to randomly finding the single missing key once all other k!-1 keys have been found, which takes on average k! draws for that step alone.
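One way around this limitation is to switch strategy depending on how dense the request is. This sketch assumes the elements are distinct, and get_perms_safe is a hypothetical name that does not come from the thread. When n exceeds half of k!, k! is small enough to enumerate with itertools.permutations. Otherwise, rejection sampling stays cheap, because each draw is new with probability at least 1/2.

```python
import itertools
import math
import random

def get_perms_safe(seq, n, rng=random):
    """Hypothetical variant of get_perms that avoids the slow tail near n = k!.

    Assumes the elements of seq are distinct.
    """
    seq = list(seq)
    total = math.factorial(len(seq))
    if n > total:
        raise ValueError(f"cannot draw {n} distinct permutations out of {total}")
    if 2 * n > total:
        # Dense case: total < 2n, so enumerating every permutation costs
        # no more than about twice the output size.
        return rng.sample(list(itertools.permutations(seq)), n)
    # Sparse case (n <= total/2): every draw is new with probability >= 1/2,
    # so rejection sampling needs on average at most 2n draws.
    seen = set()
    result = []
    while len(result) < n:
        perm = tuple(rng.sample(seq, len(seq)))
        if perm not in seen:
            seen.add(perm)
            result.append(perm)
    return result

print(len(get_perms_safe(list(range(6)), math.factorial(6))))  # prints 720
```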

Always look on the bright side...

On the other hand, the method seems to generate a reasonable number of permuted tuples in a reasonable amount of time when N << k!. For example, it is possible to draw more than N=5000 tuples of length k, where 10 < k < 1000, in less than one second.
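To check this kind of claim on a given machine, a small timing harness such as the following could be used. time_draws is a hypothetical helper that repeats the rejection-sampling loop from perm_generator with a seeded random.Random. No particular timings are implied here, since they depend on the hardware and Python version.

```python
import random
import time

def time_draws(k, n, seed=None):
    """Hypothetical harness: time drawing n distinct random permutations of range(k)."""
    rng = random.Random(seed)
    seq = list(range(k))
    seen = set()
    start = time.perf_counter()
    while len(seen) < n:
        seen.add(tuple(rng.sample(seq, k)))  # same rejection step as perm_generator
    elapsed = time.perf_counter() - start
    return len(seen), elapsed

if __name__ == "__main__":
    for k in (10, 100, 1000):
        count, seconds = time_draws(k, 5000, seed=0)
        print(f"k={k:>4}  n={count}  {seconds:.3f} s")
```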

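When k and N are kept small and N << k!, the algorithm seems to have a complexity that is:

Roughly constant versus k (strictly, each draw costs O(k) to generate and hash a permutation, but this stays cheap over this range);
Linear versus N.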

This is still valuable.

@Tomothy32 you mean the average case, but this is a hash table, so the worst case is `O(n)`, e.g. when all values land in a single slot: unlikely, but possible asymptotically. — jlandercy, Jan 05 '19 at 09:24
That's true, but then again the probability of that being the case is essentially zero. I'm just saying I don't think it's the bottleneck in this situation; generating the permutation is probably more costly (I believe it's `O(N)`). — iz_, Jan 05 '19 at 09:27
@Tomothy32 I have updated my answer, investigating the complexity. Would you mind reviewing it? Thank you for sharing. — jlandercy, Jan 05 '19 at 17:34
