6

I am looking to generate derangements uniformly at random. In other words: shuffle a vector so that no element stays in its original place.

Requirements:

  • uniform sampling (each derangement is generated with equal probability)
  • a practical implementation that is faster than the rejection method (i.e. keep generating random permutations until we find a derangement)

None of the answers I found so far are satisfactory: they either don't sample uniformly (or fail to prove uniformity), or they make no practical comparison with the rejection method. About 1/e ≈ 37% of permutations are derangements, which gives a clue about the best performance one might expect relative to the rejection method.
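
For reference, the rejection method I mean is, in sketch form, something like this (Python; `random.shuffle` is a Fisher-Yates shuffle):

```python
import random

def derangement_by_rejection(n):
    """Keep shuffling until no element is left at its original index.
    Since about 1/e of permutations are derangements, on average about
    e ~ 2.72 shuffles are needed.  Assumes n >= 2 (n == 1 has no derangement)."""
    items = list(range(n))
    while True:
        random.shuffle(items)                      # Fisher-Yates shuffle
        if all(items[i] != i for i in range(n)):   # accept only derangements
            return items
```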

The only reference I found which makes a practical comparison is in this thesis, which benchmarks 7.76 s for their proposed algorithm vs 8.25 s for the rejection method (see page 73). That's a speedup by a factor of only 1.06. I am wondering if something significantly better (> 1.5) is possible.

I could implement and verify various algorithms proposed in papers, and benchmark them. Doing this correctly would take quite a bit of time. I am hoping that someone has done it, and can give me a reference.

Szabolcs
  • Have you looked into the [Fisher-Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)? – dbush Feb 07 '20 at 14:03
  • @dbush That is an algorithm for sampling all permutations uniformly, not for sampling derangements. It is what one would use as part of the "rejection method" I referred to in the question. – Szabolcs Feb 07 '20 at 14:06
  • I can think of one algorithm, but it needs a helper buffer. Are you looking for an algorithm with no extra buffer? – acegs Feb 07 '20 at 15:00
  • @acegs I _am_ interested in an algorithm that constructs the shuffled vector in a new buffer. That one should be benchmarked against a Fisher-Yates that also uses a new buffer. – Szabolcs Feb 07 '20 at 15:27
  • The best thing may actually be a variant of the rejection method that restarts generation as soon as it's clear that the result would not be a derangement. This is described here: https://stackoverflow.com/a/25238398/695132 – Szabolcs Feb 07 '20 at 15:30
  • I wonder if it's worth thinking about building a bipartite graph where a_i and b_j are connected for all i != j, assigning random weights, and then finding the minimum weight matching. It kind of self-evidently should produce all derangements with equal probability. Of course the Hungarian Algorithm and its kin are not fast. But maybe there's a speedup for a graph with this kind of symmetry? Just a thought... – Gene Feb 08 '20 at 03:29

4 Answers

3

Here is an idea for an algorithm that may work for you. Generate the derangement in cycle notation. So (1 2) (3 4 5) represents the derangement 2 1 4 5 3. (That is (1 2) is a cycle and so is (3 4 5).)

Put the first element in the first place (in cycle notation you can always do this) and take a random permutation of the rest. Now we just need to find out where the parentheses go for the cycle lengths.

As https://mathoverflow.net/questions/130457/the-distribution-of-cycle-length-in-random-derangement notes, in a random permutation the length of the cycle containing a given element is uniformly distributed. In derangements the cycle lengths are not uniformly distributed. But the number of derangements of m elements is m!/e rounded up for even m and down for odd m. So what we can do is pick a length uniformly distributed in the range 2..n and accept it with the probability that the remaining elements would, proceeding randomly, form a derangement. This cycle length will then be correctly distributed. Once we have the first cycle length, we repeat for the next until we are done.

The procedure done this way is simpler to implement, but mathematically it is equivalent to taking a random derangement (by rejection), writing down only its first cycle, and then repeating on the remaining elements. It is therefore possible to prove that it produces all derangements with equal probability.

With this approach done naively, we will take an average of about 3 rolls before accepting a length. However, each accepted length cuts the problem roughly in half on average, so the number of random numbers we need for placing the parentheses is O(log(n)). Compared with the O(n) random numbers needed to construct the permutation, this is a rounding error. It can be optimized further by noting that the highest acceptance probability is 0.5: if we accept with twice the probability that the remaining elements would, proceeding randomly, form a derangement, the ratios stay correct and we get rid of most of the rejected cycle lengths.

If most of the time is spent in the random number generator, for large n this should run at approximately 3x the rate of the rejection method. In practice it won't be as good because switching from one representation to another is not actually free. But you should get speedups of the order of magnitude that you wanted.
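
For concreteness, a sketch of how the whole construction could look (Python; the helper names are illustrative, and the acceptance probability used for a proposed cycle length L with `remaining` elements still unplaced is d(remaining-L)/(remaining-L)!, i.e. the chance that the leftover elements would, proceeding randomly, form a derangement):

```python
import random

def derangement_via_cycle_lengths(n):
    """Sample the cycle lengths first, then fill the cycles from one random
    permutation of the remaining elements.  Returns r with r[i] != i for all i.
    Assumes n >= 2 (or n == 0)."""
    # ratio[m] = d(m)/m!: the probability that a random permutation of m
    # elements is a derangement, with d(m) = (m-1) * (d(m-1) + d(m-2)).
    ratio = [1.0, 0.0]
    d2, d1, fact = 1, 0, 1
    for m in range(2, n + 1):
        dm = (m - 1) * (d1 + d2)
        fact *= m
        ratio.append(dm / fact)
        d2, d1 = d1, dm

    # 1. Choose the cycle lengths: propose L uniformly in 2..remaining and
    #    accept it with probability d(remaining-L)/(remaining-L)!.
    lengths, remaining = [], n
    while remaining > 0:
        L = random.randint(2, remaining)
        if random.random() < ratio[remaining - L]:
            lengths.append(L)
            remaining -= L

    # 2. Place the parentheses: element 0 followed by a random permutation of
    #    the rest, cut into consecutive blocks of the chosen lengths; each
    #    block is read as one cycle.
    rest = list(range(1, n))
    random.shuffle(rest)
    seq = [0] + rest
    result, start = [0] * n, 0
    for L in lengths:
        block = seq[start:start + L]
        for i in range(L):
            result[block[i]] = block[(i + 1) % L]  # each element maps to the next in its cycle
        start += L
    return result
```

For small n (say n = 4, which has 9 derangements) tallying the outputs over many runs is a quick empirical check of uniformity.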

btilly
  • "If most of the time is spent in the random number generator" <- Actually that may very well be true in practice ... – Szabolcs Feb 07 '20 at 17:30
  • However, this still starts with "Put the first element in the first place _and take a random permutation of the rest_", i.e. creating a random permutation, which requires O(n) random numbers, is part of the algorithm. It is only for "placing the parentheses" that we need `O(log(n))`. Or did I misunderstand? – Szabolcs Feb 07 '20 at 18:04
  • @Szabolcs Yes, that is correct. That is what I was trying to say with, "*So the number of random numbers we need to generate for placing the parentheses is `O(log(n))`. Compared with the `O(n)` random numbers for constructing the permutation, this is a rounding error.*" – btilly Feb 07 '20 at 18:49
0

This is just an idea, but I think it can produce uniformly distributed derangements. It needs a helper buffer of at most around N/2 elements, where N is the number of items to be arranged.

  • First, choose a random(1, N) position for value 1.
    • Note: positions are numbered 1 to N instead of 0 to N-1 for simplicity.
  • Then, for value 2, the position is random(1, N-1) if value 1 fell on position 2, and random(1, N-2) otherwise.
  • The algorithm walks the list and counts only the not-yet-used positions until it reaches the chosen random position for value 2; position 2 itself is of course skipped.
  • For value 3, the algorithm checks whether position 3 is already used. If it is, pos3 = random(1, N-2); if not, pos3 = random(1, N-3).
  • Again, the algorithm walks the list, counting only the not-yet-used positions, until the count reaches pos3, and places value 3 there.
  • This continues for the remaining values until all of them have been placed.

That should generate derangements with uniform probability.

The optimization should focus on how the algorithm reaches pos# quickly. Instead of walking the list to count the not-yet-used positions, it could use something like a heap-like search over the positions not yet used, rather than counting and checking positions one by one, or any other method besides heap-like searching. This is a separate problem to be solved: how to reach an unused item given its position-count in a list of unused items.
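
One concrete candidate for that helper, as an illustration rather than necessarily the fastest choice, is a Fenwick (binary indexed) tree of 0/1 "unused" counts: "find the k-th not-yet-used position" and "mark a position as used" both take O(log N), so each placement costs O(log N) instead of a linear walk.

```python
class UnusedPositions:
    """Fenwick (binary indexed) tree over positions 1..n, each with count 1
    while unused: k-th unused lookup and marking used, both in O(log n)."""

    def __init__(self, n):
        self.n = n
        # Fenwick tree representing the all-ones array (every position unused).
        self.tree = [0] + [(i & -i) for i in range(1, n + 1)]

    def kth_unused(self, k):
        """Return the k-th (1-based) not-yet-used position, assuming it exists."""
        pos, step = 0, 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < k:
                pos = nxt
                k -= self.tree[nxt]
            step >>= 1
        return pos + 1

    def mark_used(self, pos):
        """Mark a currently-unused position as used (decrease its count to 0)."""
        while pos <= self.n:
            self.tree[pos] -= 1
            pos += pos & -pos
```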

acegs
  • Oops, one thing I hadn't considered: if the randomizer hasn't generated the Nth position within the first N-1 placements, item N will end up in position N, which is not a derangement. – acegs Feb 07 '20 at 15:52
  • Are you sure there is such a data structure that would be able to do both insert (or delete), as well as the lookup, in `O(1)` average time? Because otherwise this algorithm has worse average time complexity than the rejection method with Fisher-Yates. – walnut Feb 07 '20 at 16:45
  • I haven't said anything about using a specific searching algorithm. The first goal is to produce the uniform probability; an algorithm that helps achieve the optimization is a separate module/helper algorithm. We don't know whether someone will someday discover a faster helper algorithm; it is just not known to exist now, maybe. – acegs Feb 08 '20 at 01:05
0

I'm curious ... and mathematically uninformed. So I ask innocently, why wouldn't a "simple shuffle" be sufficient?

for i from array_size-1 downto 1:  # assume zero-based arrays
  j = random(0, i-1)
  swap_elements(i, j)

Since the random function will never produce a value equal to i, it will never leave an element where it started. Every element will be moved "somewhere else."

Mike Robinson
  • This was suggested by a now-deleted answer. My requirements are that derangements are sampled uniformly. This means that the algorithm should produce each possible result with the same probability (which your proposal satisfies) and that it can generate all derangements (which it does not satisfy). It cannot generate e.g. `(2, 1, 4, 3)` from `(1 2 3 4)`. – Szabolcs Feb 07 '20 at 17:25
  • There are many posts online on the topic, and lots of suggestions like this. Typically, it takes quite some time to determine if a suggestion is correct or not. This is why I asked for a proof (or more realistically, a reference to one). – Szabolcs Feb 07 '20 at 17:28
0

Let d(n) be the number of derangements of an array A of length n.

d(n) = (n-1) * (d(n-1) + d(n-2))

The d(n) derangements are achieved by:

1. First, swapping A[0] with one of the remaining n-1 elements
2. Next, either deranging all n-1 remaining elements, or deranging the n-2
   remaining elements, excluding the index that received A[0] from the initial array.

How can we generate a derangement uniformly at random?

1. Perform the swap of step 1 above.
2. Randomly decide which path we're taking in step 2,
   with probability d(n-1)/(d(n-1)+d(n-2)) of deranging all remaining elements.
3. Recurse down to derangements of sizes 2 and 3, which are both precomputed.

Wikipedia has d(n) = floor(n!/e + 0.5) (exactly). You can use this to calculate the probability in step 2 exactly, in constant time, for small n. For larger n the factorial can be slow, but all you need is the ratio. It's approximately (n-1)/n. You can live with the approximation, or precompute and store the ratios up to the max n you're considering.

Note that (n-1)/n converges very quickly.
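
A sketch of the recursion in Python (the `alive` list is just one way to track which positions still need deranging; exact counts are used for the branch probability, but precomputed ratios or the (n-1)/n approximation would slot in the same way):

```python
import random

def random_derangement(n):
    """Swap the first unfinished position with a random other unfinished one,
    then either close that pair as a 2-cycle or keep deranging everything that
    remains.  Returns r with r[i] != i for all i.  Assumes n >= 2."""
    # Exact derangement counts d[0..n] from d(m) = (m-1) * (d(m-1) + d(m-2)).
    d = [1, 0] + [0] * (n - 1)
    for m in range(2, n + 1):
        d[m] = (m - 1) * (d[m - 1] + d[m - 2])

    A = list(range(n))
    alive = list(range(n))              # positions whose final value is not yet fixed
    while len(alive) >= 2:
        k = len(alive)
        t = random.randrange(1, k)      # step 1: pick one of the k-1 other open positions
        i, j = alive[0], alive[t]
        A[i], A[j] = A[j], A[i]         # position i is now final
        # Step 2: with probability d(k-1)/(d(k-1) + d(k-2)) keep deranging all
        # k-1 remaining positions; otherwise close the pair, so j is final too.
        if random.random() >= d[k - 1] / (d[k - 1] + d[k - 2]):
            alive.pop(t)                # close the 2-cycle (remove j first)
        alive.pop(0)                    # i is always removed
    return A
```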


Dave