Iterator to produce unique random order?

Question

The problem is stated as follows, we have a very large number of items which are traversed through an iterator pattern (which dynamicaly constructs or fetches) the requested item.

Due to the number of items being large and thus cannot be kept in memory (as a list for example).

What is a procedure for the iterator to follow in order to produce a random order of the items each time the iterator is called. A unique random order means that eventually all items are traversed only once but returned in a random order.

if the number of items is relatively small, one can solve this problem as follows:

Store the items in a list in memory (or secondary memory)
Shuffle the list
Traverse the shuffled list.

For this question one can assume that the iterator can index the items (or rank/unrank them). So the iterator can fetch the ith item for all indices i in the range of items.

Note the random order should be uniformly distributed in the set of all orderings of the items or in other words be unbiased. This condition leaves out solutions which randomise the list of items in a block-by-block scheme (in order to have some of the items in memory for example and randomise only them, then the next block of items and so on)

FWIW, [here](https://stackoverflow.com/a/51429458/4014959) is a Python 3 implementation of a Feistel network for this task that can handle up to `2**64` items. Two nice features of using a Feistel network (or other block cipher) is that the permuted indices can be accessed in any order, and they are invertible. — PM 2Ring, Jul 25 '18 at 15:09

score 4 · Accepted Answer · answered Mar 11 '15 at 16:48

4

Encryption is reversible, hence an encryption is a one-to-one mapping from a set onto itself.

Pick a block cypher with a large enough block size to cover the number of items you have.

Encrypt the numbers 0, 1, 2, 3, 4, ... This will give you a non-repeating ordered list of numbers up to 2^(block size).

If the encrypted number is too large, ignore it. If the encrypted number is within the size of your item list, then pick that item. Repeat for however many items you need.

A cypher with variable block-size (like the Hasty Pudding cypher) will reduce the number of misses.

answered Mar 11 '15 at 16:48

rossum

15,344
1
24
38

ok nice solution, how about efficiency or unbiased samplig property? – Nikos M. Mar 11 '15 at 16:50
1

Any good cypher should be unbiased, since it will be designed to produce random-seeming output. For efficiency you need to pick an appropriate block size (as small as possible) and a fast algorithm. Cryptographically secure algorithms are relatively slow. If you do not require crypto-level security, then you can write your own four- or six-round Feistel cypher, which will run faster but be much less secure. – rossum Mar 11 '15 at 16:56
thanks, check also [my answer analysis](http://stackoverflow.com/a/28996561/3591273) – Nikos M. Mar 11 '15 at 20:22
I've created a cypher that can be configured for any block size expressed in any number base: https://github.com/plcarmel/cryptic-sequences – Pierre-Luc Carmel Biron Mar 16 '21 at 15:33

score 0 · Answer 2 · edited May 23 '17 at 11:58

i will provide a plan to solve this problem as follows:

The whole point is about encoding the (successor) index in order to produce the random order.

As mentioned (reversible) encryption is such a scheme.

Lets see the general outline of such schemes:

Assume the iterator uses a successor function to get the next index/item. For the usual case the successor function would be:

def succ(index):
   return index+1

This returns the next index. If seen as a binary operation, the index+1 code creates a new index-pattern (for lack of better word) out of the current index pattern.

Presumably one can generalise this pattern (in binary operations, i.e logN) to create a succesor function that returns the next index but traversed in a shuffled or random unique order.

For example encryption routines can be seen as such a scheme.

An even simpler scheme would be to use modular arithmetic, similar to a pseudo-random number generator, with suitable parameters i.e:

def succ_rand_modulo(index):
   return (seed*index+step) mod numItems

For suitable choices of seed and step (e.g relatively prime and large enough) numbers so as to produce random orderings with the uniform property (i.e unbiased)

A linear mapping as above may not produce all orderings, one can assume polynomial mappings with more free parameters to adjust, that can produce all permutations/orderings (see: Permutation Polynomials (over finite fields))

Effectively all these methods are ways of re-encoding the indices (ie encryption, modulo arithmetic, pseudo-random-generation, binary manipulation and so on).

i would say that the most efficient way would be a (polynomial) modulo (i.e pseudo-random number generation) provided the parameters are suitably chosen to be able to produce a uniform/unbiased distribution over all random orderings (see above).

UPDATE:

Another way that needs only numItems bits to be stored in memory (but requires more time) is a rejection-based method. One can randomly generate or fetch one index/item and set the associated bit on the bit vector to 1 (item fetched), then produce another random index and if associated bit is set (index has been already used) reject it and re-produce another random index and so on. On average this will have a running time of O(numItems), but the worst case may be long.

UPDATE2:

Still another promising approach, and also optimal in various cases, is the method by I. STOJMENOVIC to rank/unrank combinarotial objects uniformly and efficiently without computing large integers.

The method works for this case as follows, a random ordering is just a permutation among all orderings. Thus to generate any random ordering, one tries to generate a random permutation of N! items. Being able to generate a random permutation uniformly (i.e unbiasedly) and without dealing with large integers is a plus. The method of STOJMENOVIC, uses only one random number in [0,1] and takes this as the probability that a particular permutation has been generated. To adapt the method in this case, one changes the code to return one element of the permutation at a time (keeping some state of where the generator is at any given time) using only an initial seed of a single random number in [0,1].

Lecture notes on Lexicographic generation (touching upon random generation as well).

References on same problem / similar approaches:

Unique random number generation based schemes

Cryptography/Cipher based schemes

Encoding-based schemes / Number-theoretic schemes

Iterator to produce unique random order?

2 Answers2

Linked