
After reading this question, I was wondering: is it possible to generate a random permutation of the sequence [1...n] with a uniform distribution in O(1) space, using something like double hashing?

I tried this with a small example for the sequence [1,2,3,4,5] and it works, but it fails to scale to larger sets.

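```cpp
#include <iostream>

const int size = 5;  // table size; the example permutes the slots 0..4

// Primary hash: starting slot (equals -1 when k % 7 == 6)
int h1(int k) {
    return 5 - (k % 7);
}

// Secondary hash: probe step, always in 1..3
int h2(int k) {
    return (k % 3) + 1;
}

// i-th probe position for key k (double hashing); may be negative
int hash(int k, int i) {
    return (h1(k) + i*h2(k)) % size;
}

int main() {
    for(int k = 0; k < 10; k++) {
        std::cout << "k=" << k << std::endl;
        for(int i = 0; i < size; i++) {
            int q = hash(k, i);
            if(q < 0) q += size;  // C++ % can yield a negative remainder
            std::cout << q;
        }
        std::cout << std::endl;
    }
}
```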
andre
  • I'm not sure, but why not use a [Lehmer code](http://en.wikipedia.org/wiki/Permutation#Numbering_permutations) or the [factoradic](http://en.wikipedia.org/wiki/Factorial_number_system#Permutations)? – Kwariz Sep 17 '12 at 15:58

3 Answers


You can try another approach.

  1. Take an arbitrary integer P such that GCD(P, N) == 1, where GCD(P, N) is the greatest common divisor of P and N (e.g. GCD(70, 42) == 14, GCD(24, 35) == 1).
  2. Generate the sequence K[i] ::= (P * i) mod N + 1 for i from 1 to N.
  3. It can be shown that K[i] enumerates every number between 1 and N with no repeats (K[N + 1] == K[1], but that is not a problem because we only need the first N numbers).
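A short derivation of step 3 (a standard argument, given here as a sketch) for indices 1 ≤ i, j ≤ N:

$$
K[i] = K[j] \;\Rightarrow\; P\,i \equiv P\,j \pmod{N} \;\Rightarrow\; N \mid P\,(i - j) \;\Rightarrow\; N \mid (i - j) \;\Rightarrow\; i = j
$$

The third implication uses GCD(P, N) == 1, and the last holds because |i - j| < N. So the N values K[1..N] are distinct, and since each lies in 1..N, every number in that range appears exactly once.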

If you can efficiently generate such numbers P with a uniform distribution (e.g. with a good random number generator), using the Euclidean algorithm to compute the GCD in O(log N) time, you'll get what you want.

UnknownGosu
  • If I choose P to be a prime number greater than N/2, the GCD is necessarily 1 (and for most primes and most N, it's 1 anyway), so this will always work then? That sounds interesting. Do you have a reference (or the name, so I can Google it) for this proof? – Damon Dec 10 '13 at 11:11
  • I was curious and ran some tests on about three dozen combinations of P and N as specified. While the output "looks" random at first glance and does traverse every value exactly once, it does not look very random when plotted. Even the few sequences that "look random" in the plot follow a clearly identifiable pattern such as "increment 4 times, fall back to a low value". See: [animated GIF of LibreOffice Calc plot](http://imageshack.com/a/img199/618/6ny.gif) (only one value of N is used in that example). – Damon Dec 10 '13 at 13:29
  • Basically, this is a Lehmer generator that is re-seeded with the index position, but without Lehmer's requirements on the modulus. Your condition on the GCD seems to come from the Hull-Dobell theorem, but I can't see how this would guarantee that no number appears twice within the sequence (at best it guarantees a _period_ of length N, and even for this some conditions are missing). – Damon Dec 11 '13 at 16:23

It is not possible to generate a "random" permutation without some randomness. It doesn't even make sense. Your code will generate the same permutation every time.

I suspect you intend to pick two different random hash functions every time. But even that won't work with hash functions like yours (a +/- k%b for a, b chosen at random), because you need Θ(n log n) bits of randomness (log2(n!) of them) to specify an arbitrary permutation, far more than it takes to choose a and b.

Keith Randall

I'm not sure what the question is. If you want a random permutation, you want a random number generator, not a hash function. A hash function is (and must be) deterministic, so it cannot be used for a "random" permutation. And a hash is not a permutation of anything.

I don't think a random permutation can be generated in O(1) space. You have to somehow keep track of the elements that have already been used.

James Kanze
  • Let's say we have a fixed-size open-addressed hash table that's full. If we were to try to insert one more item, any good hash function would have to check every location. So the list of all locations checked should be a permutation of the indices [0..table_size-1]. – andre Sep 17 '12 at 16:10