I need a stable, fast, one-way function that maps an integer to a random number. By "stable" I mean that the same integer should always map to the same random number. And by "random number" I actually mean "some number which behaves as if it were random".

e.g.

1 -> 329423
2 -> -12398791234
3 -> -984
4 -> 42342435
...

If I had enough memory (and time) I would ideally use:

for( int i=Integer.MIN_VALUE; i<Integer.MAX_VALUE; i++ ){
    map[i]=i;
}
shuffle( map );

I could use a secure hash function like MD5 or SHA, but these are too slow for my purposes and I don't need any crypto/security properties.

I only need this in one direction: I will never have to translate the random number back to its integer.

Background: (For those who want to know more)

I'm planning to use this to invalidate a complete cache over a given amount of time. The invalidation is done "randomly" when a cache member is accessed, with a probability that increases as time passes. I need this to be stable so that isValid( entry ) does not "flicker", and for consistent testing. The input to this function will be the Java hash of the entry's key, which is typically in the range "1000"-"15000" (but can contain some other stuff, too) and comes in bulk. The invalidation is done on the condition of:

elapsedTime / timeout * Integer.MAX_VALUE > abs( random( key.hashCode() ) )
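
A minimal sketch of that condition in Java, assuming elapsedTime and timeout are millisecond counts held in longs and that the mapped value is passed in already (the class and method names are made up for illustration). Two details are easy to get wrong: with integer operands `elapsedTime / timeout` truncates to 0 until the timeout is reached, and `Math.abs(Integer.MIN_VALUE)` is still negative. The sketch therefore divides in floating point and clears the sign bit instead of calling abs:

```java
final class CacheInvalidation {

    // mappedHash is assumed to be the output of the stable mapping
    // function applied to key.hashCode().
    static boolean isExpired(int mappedHash, long elapsedMillis, long timeoutMillis) {
        if (elapsedMillis >= timeoutMillis) {
            return true;   // past the timeout every entry is invalid
        }
        if (elapsedMillis <= 0) {
            return false;  // nothing expires before any time has passed
        }
        // Fraction of the timeout that has elapsed, scaled to 0..Integer.MAX_VALUE.
        long threshold = (long) ((double) elapsedMillis / timeoutMillis * Integer.MAX_VALUE);
        // Clear the sign bit to get a value in 0..Integer.MAX_VALUE.
        int r = mappedHash & Integer.MAX_VALUE;
        return threshold > r;
    }
}
```

Because the threshold only grows with elapsed time, an entry that has been reported as expired stays expired, so the result does not flicker.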

EDIT: (this is too long for a comment so I put it here)

I tried gexicide's answer and it turns out this isn't random enough. Here is what I tried:

        for( int i=0; i<12000; i++ ){

            int hash = (""+i).hashCode();

            Random rng = new Random( hash );
            int random = rng.nextInt();

            System.out.printf( "%05d, %08x, %08x\n", i, hash, random );

        }

The output starts with:

00000, 00000030, bac2c591
00001, 00000031, babce6a4
00002, 00000032, bace836b
00003, 00000033, bac8a47e
00004, 00000034, baab49de
00005, 00000035, baa56af1
00006, 00000036, bab707b7
00007, 00000037, bab128ca
00008, 00000038, ba93ce2a
00009, 00000039, ba8def3d
00010, 0000061f, 98048199

and it goes on in this way.

I could use SecureRandom instead:

    for( int i=0; i<12000; i++ ){

        SecureRandom rng = new SecureRandom( (""+i).getBytes() );
        int random = rng.nextInt();

        System.out.printf( "%05d, %08x\n", i, random );
    }

which indeed looks pretty random, but it is no longer stable and it is 10 times slower than the method above.

Scheintod
  • You can use shuffle with a `Random` with a fixed seed. This will give you the same random order each time. – Peter Lawrey Jul 10 '14 at 10:01
  • Just initialize a PRNG with a fixed seed and use the *nth* random number it delivers, or possibly, if it delivers unique values, @gexicide's solution below. But I question the application. It's not hard to keep an idea of the least recently or least frequently used cache entry, and remove that each iteration. – user207421 Jul 10 '14 at 10:01
  • You'll need a PRNG (pseudo random number generator) implementation and start it with the same seed every time, and build your sequence from that. – Anders R. Bystrup Jul 10 '14 at 10:01
  • Do the random numbers need to be unique in the mapping? (e.g. is it a random mapping of all integer values to all integer values?) – GrahamS Jul 10 '14 at 10:02
  • No. It doesn't need to be unique. (But it wouldn't be bad if it was.) – Scheintod Jul 10 '14 at 10:16
  • In that case I'd say @gexicide has the right answer. – GrahamS Jul 10 '14 at 10:18
  • @EJP: You're right. And I'm using your idea for entries which have a fixed lifespan on creation or which are put in a fixed size cache and needs invalidation because it's full. In my case here I have to invalidate the complete cache and I want to distribute this roughly over a given time. – Scheintod Jul 10 '14 at 12:03
  • Use any invertible (==> automagically unique!) function until you find one which is fast/random enough; eg you could use a xorshift, or the "twist" of the mersenne twister (for the lultz, you could use a 32-bit block cipher, eg RC5 with w=16, or NSA's SPECK32) – loreb Jul 12 '14 at 16:55

3 Answers

7

Although you never specified it as a requirement you'll probably want a full 1:1 mapping. This is because the number of possible input values is small. Any output that can occur for more than one input implies another output which can never happen at all. If you have output values which are impossible then you have a skewed distribution.

Of course, if your input is skewed then your output will be skewed anyway, and there's not much you can do about that.

Either way, this makes it a unique int-to-int hash.

Simply apply a couple of trivial, independent 1:1 mapping functions until things are suitably distributed. You've already isolated one transform from the Random class, but I suggest mixing it with some other transforms like shifts and XORs to avoid individual weaknesses of different algorithms.

For example:

public static int mapInteger( int value ){

    value *= 1664525;
    value += 1013904223;
    value ^= value >>> 12;
    value ^= value << 25;
    value ^= value >>> 27;
    value *= 1103515245;
    value += 12345;

    return value;
}

If that's good enough then you can make it faster by deleting lines at random (I suggest you keep at least one multiply) until it's not good enough anymore, and then add the last deleted line back in.
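
To convince yourself that a chain like this really is 1:1, you can write its inverse and check that it round-trips. The sketch below is only an illustration (the names `inverseOfOdd` and `unmapInteger` are made up here): an odd multiplier is undone with its multiplicative inverse modulo 2^32, found by Newton iteration, and each xor-shift is undone by xoring in the further-shifted copies of the value.

```java
final class IntBijection {

    // Forward mapping, copied from the example above.
    static int mapInteger(int value) {
        value *= 1664525;
        value += 1013904223;
        value ^= value >>> 12;
        value ^= value << 25;
        value ^= value >>> 27;
        value *= 1103515245;
        value += 12345;
        return value;
    }

    // Multiplicative inverse of an odd number modulo 2^32.
    static int inverseOfOdd(int a) {
        int x = a;                  // a * a == 1 (mod 8), so 3 bits are already correct
        for (int k = 0; k < 5; k++) {
            x *= 2 - a * x;         // Newton step: doubles the number of correct bits
        }
        return x;
    }

    // Undoes mapInteger by reversing each step in the opposite order.
    static int unmapInteger(int value) {
        value -= 12345;
        value *= inverseOfOdd(1103515245);
        value ^= value >>> 27;                      // 2 * 27 >= 32: one term is enough
        value ^= value << 25;                       // 2 * 25 >= 32: one term is enough
        value ^= (value >>> 12) ^ (value >>> 24);   // shifts of 12, 24; 36 would be zero
        value -= 1013904223;
        value *= inverseOfOdd(1664525);
        return value;
    }
}
```

If `unmapInteger(mapInteger(v)) == v` holds for every v you try, no two inputs can share an output. Note that this only works with the unsigned shift `>>>`; the signed `>>` smears the sign bit and is not undone this way.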

sh1
  • Thanks for your ideas. As I understand it (and I don't understand very much of modulo math) to be a 1:1 mapping, every single operation needs to be reversible. I can see how `+=` is reversible. But how are `*=` and `^=>>` ? – Scheintod Jul 16 '14 at 09:06
  • `*=` is reversible whenever the multiplicand is odd, because the range of the int is a power of two and all odd numbers are co-prime with all powers of two. I'm afraid I can't think of an easy way to prove that, right now, and I don't know how easy it is to actually reverse (I only know that every result is unique). `^=>>` is reversible because `^` is reversible on a [bit-by-bit](http://stackoverflow.com/a/16748786/2417578) basis. – sh1 Jul 19 '14 at 04:30
  • hey thanks! For the multiplication I've already found this: https://math.stackexchange.com/questions/684550/how-to-reverse-modulo-of-a-multiplication which mentions the Euclidean algorithm. – Scheintod Jul 19 '14 at 08:18
  • But one other thing: Shouldn't it be `value ^= value >>> 12` etc.? According to your explanation I *think* `>>` is reversible, too, but I think the intent is to fill the left side with zeros? – Scheintod Jul 19 '14 at 08:21
  • Sorry, yes, you're quite right. `>>` is actually not reversible, for signed types, and you should use `>>>`. I copied the example from some unsigned C code. – sh1 Jul 20 '14 at 17:57
3

Use a Random and seed it with your number:

Random generator = new Random(i);
return generator.nextInt();

As your testing shows, the problem with this method is that the first value drawn from a freshly seeded generator is of very poor quality. To improve the result, run the generator a few times before taking the output; this fills its state with pseudo-random values and improves the quality of the values that follow:

Random generator = new Random(i);
// warm-up rounds; the counter is named k so it does not clash with the seed i
for(int k = 0; k < 5; k++) generator.nextInt();
return generator.nextInt();

Try different values, maybe 5 is enough.

gexicide
  • A few seconds faster. :) +1 – Keppil Jul 10 '14 at 10:02
  • @EJP: No, they are not necessarily unique. But that was not OP's requirement. – gexicide Jul 10 '14 at 10:14
  • Thanks for the answer, but this is not working. See my edit above. – Scheintod Jul 10 '14 at 10:38
  • @Scheintod: Try my edit, i.e., try calling `nextInt()` a few times to "warm up" the created generator. – gexicide Jul 10 '14 at 11:01
  • Tested for a few possible number of iterations. Turns out that one additional shuffle/nextInt is all it takes for my purposes. And it's still pretty fast. – Scheintod Jul 10 '14 at 11:54
  • Posted some significant speed improvements in another answer. But I leave this as the correct one. – Scheintod Jul 10 '14 at 14:05
  • Although not technically incorrect, this solution seems like a combination of multiple bad practices that only produces a desirable result accidentally. First, your distribution will be skewed in unpredictable ways because random numbers from different seeds are not as random in relation to each other as random numbers generated from the same seed, forcing you to "warm up" the generator to sweep the not-very-random results under the rug. But the beginning of the sequence is hardly the only "ugly" part of the distribution. Try plotting the results to a texture and be horrified. – kiwibonga Sep 22 '20 at 22:36
3

gexicide's answer is the correct (and the simplest) one. Just one note:

Running it 1,000,000 times on my system takes about 70ms, which is pretty fast. But it involves at least two object creations per call and feeds the GC. It would be better if this could be done on the stack without creating any objects at all.

The source of the Random class shows that some of its code exists only to make it callable multiple times and to make it thread-safe. None of that is needed here, so it can be removed.

So I ended up with a reimplementation in one method:

public static int mapInteger( int value ){

    // initial scramble
    long seed = (value ^ multiplier) & mask;

    // shuffle three times. This is like calling rng.nextInt() 3 times
    seed = (seed * multiplier + addend) & mask;
    seed = (seed * multiplier + addend) & mask;
    seed = (seed * multiplier + addend) & mask;

    // fit size
    return (int)(seed >>> 16);
}

(multiplier, addend and mask are some constants used by Random)
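
For reference, a self-contained sketch with those constants filled in. The values are the ones `java.util.Random` documents for its linear congruential generator; the class name and the `viaRandom` helper are only for illustration. The helper is there to check the claim that the result equals the third `nextInt()` of a `Random` seeded with the same value:

```java
import java.util.Random;

final class FastIntMapper {

    // Constants of the 48-bit linear congruential generator used by java.util.Random.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    static int mapInteger(int value) {
        // initial scramble, as done by new Random(seed)
        long seed = (value ^ MULTIPLIER) & MASK;

        // three generator steps, like calling nextInt() three times
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        seed = (seed * MULTIPLIER + ADDEND) & MASK;

        // keep the top 32 of the 48 state bits
        return (int) (seed >>> 16);
    }

    // Object-creating equivalent, only used for comparison.
    static int viaRandom(int value) {
        Random rng = new Random(value);
        rng.nextInt();
        rng.nextInt();
        return rng.nextInt();
    }
}
```

Comparing `mapInteger(v)` with `viaRandom(v)` over the expected key range is a cheap regression test if you ever change the number of steps.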

Running this 1,000,000 times gives the same results but takes only 5ms, and is therefore more than 10 times faster.

BTW: This happens to be another piece of code from The Old Man - again. See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1

Scheintod