Randomly distributed bijective mapping generator function for string generation

Question

I'm trying to write a Python generator function that yields a bijective (injective and surjective) mapping between the sequential numbers from 1 to 38⁵ and randomly distributed products of alphanumerics (28 lowercase letters + 10 digits) of length 5.
For example, something like this:

1 -> 4fde6
2 -> grt74
3 -> g7w33
...

Is there a module for this?
Any ideas for the algorithm?

Edit: I want the mapping to be:

Implementable using generators (not memory intensive)
Constant during different runs of code
Uniformly distributed as much as possible

So, in one sentence: I want a uniformly distributed, constant bijective mapping between indexes and products of n alphanumerics.

Thanks

What alphabet can the output use (i.e. numerals, lower and upper case letters)? You need it to have at least 38 characters in order to represent all numbers from 1 to 38^5. — Dominic Price, Jun 27 '18 at 21:05
Really, this boils down to ["I want to shuffle an iterator"](https://stackoverflow.com/questions/21187131/how-to-use-random-shuffle-on-a-generator-python). Note that you can get the "key" part of your mapping by just doing `enumerate(shuffled_product, start=1)`. From there, it's just a matter of reading `product(ascii_lowercase + digits + "AB", repeat=5)` into a list and then shuffling it. Obviously, this will take a lot of memory, but it's doable. — Patrick Haugh, Jun 27 '18 at 21:20
If you wanted a bijective mapping from an int index to five bytes (2^40), that could be done easily — Severin Pappadeux, Jun 27 '18 at 21:28
@DominicPrice: numerals & lowercase letters (28 + 10=38 chars) — Ariyan, Jun 27 '18 at 21:51
@PatrickHaugh: Unfortunately, shuffling is not a good fit here. First, shuffling needs a list conversion and is not generator-friendly. Second, shuffling returns a different random sequence each time, but I want the mapping function to always map the same input to the same output across multiple instances of the application — Ariyan, Jun 27 '18 at 21:59
@SeverinPappadeux: Yes, that is exactly what I want. Furthermore, I want this to be implementable using a generator (for memory reasons), and the mapping should be static and as uniform as possible — Ariyan, Jun 27 '18 at 22:05
Well, for the mapping `int [0...2^40)<->5bytes random number` there is a simple solution: a Linear Congruential Generator. I could write an answer. But the problem is that the right side would not be permuted letters, just 40 random bits — Severin Pappadeux, Jun 27 '18 at 22:53
@SeverinPappadeux I would appreciate it if you wrote the answer for this using an LCG, thanks — Ariyan, Oct 07 '18 at 09:29
@RYN Just came out from the trip and going to another one - will take a jab at it this weekend — Severin Pappadeux, Oct 09 '18 at 21:57

Severin Pappadeux · Accepted Answer · 2018-10-16T23:38:26.997

OK, here is a bijective mapping from an index to a 5-byte string and back, based on a Linear Congruential Generator (LCG). I used the drand48 constants, which seem to work fine for a 40-bit LCG: starting from seed 1, I tested all 2⁴⁰ values and got the full period.
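
As a rough illustration of why one LCG step is invertible, here is a minimal plain-Python sketch of the forward step and its inverse. The names `lcg_step` and `lcg_unstep` are mine and do not appear in the answer's code; the constants are the drand48 ones quoted above. It assumes Python 3.8+ for `pow(x, -1, m)`.

```python
# Sketch only: one 40-bit LCG step and its inverse, using Python's
# arbitrary-precision ints instead of masked 64-bit arithmetic.
BITS = 40
MOD = 1 << BITS          # 2**40
MULT = 25214903917       # drand48 multiplier (odd, so invertible modulo 2**40)
INCR = 11                # drand48 increment

# Modular inverse of the multiplier (requires Python 3.8+).
MULT_INV = pow(MULT, -1, MOD)


def lcg_step(x: int) -> int:
    """Forward map: x -> (MULT * x + INCR) mod 2**40."""
    return (MULT * x + INCR) % MOD


def lcg_unstep(y: int) -> int:
    """Inverse map: undo one forward step."""
    return ((y - INCR) * MULT_INV) % MOD


if __name__ == "__main__":
    for x in (0, 1, 999999, MOD - 1):
        y = lcg_step(x)
        # Every value should map back to where it came from.
        print(x, "->", y, "->", lcg_unstep(y))
```

Because the multiplier is odd, it has an inverse modulo 2⁴⁰, so distinct inputs always give distinct outputs. With a full period of 2⁴⁰, stepping back once should be the same as jumping ahead 2⁴⁰ − 1 steps.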

The code below relies on unsigned 64-bit math with masked overflow, hence the heavy use of NumPy. Packing and unpacking are done in a somewhat crude way, and there is a lot to improve. Don't hesitate to ask questions.

import numpy as np

# unsigned constants
ZERO  = np.uint64(0)
ONE   = np.uint64(1)

# signed constants
SZERO = np.int64(0)
SONE  = np.int64(1)
MONE  = np.int64(-1)

# LCG parameters
bits = np.uint64(40)
mult = np.uint64(25214903917)
incr = np.uint64(11)

mod  = np.uint64( np.left_shift(ONE, bits) )
mask = np.uint64(mod - ONE)

def rang(seed: np.uint64) -> np.uint64:
    """
    LCG mapping from one 40bit integer to another
    """
    return np.uint64(np.bitwise_and(mult*np.uint64(seed) + incr, mask))

def compute_nskip(nskp: np.int64) -> np.int64:
    nskip: np.int64 = nskp

    while nskip < SZERO:
        t: np.uint64 = np.uint64(nskip) + mod
        nskip = np.int64(t)

    return np.int64( np.bitwise_and(np.uint64(nskip), mask) )

def skip(nskp: np.int64, seed: np.uint64) -> np.uint64: # inverse mapping
    """
    Jump from given seed by number of skips
    """
    nskip: np.int64 = compute_nskip(nskp)

    m: np.uint64 = mult # original multiplicative constant
    c: np.uint64 = incr # original additive constant

    m_next: np.uint64 = ONE  #  new effective multiplicative constant
    c_next: np.uint64 = ZERO # new effective additive constant

    while nskip > SZERO:
        if np.bitwise_and(nskip, SONE) != SZERO: # check least significant bit for being 1
            m_next = np.bitwise_and(m_next * m, mask)
            c_next = np.bitwise_and(c_next * m + c, mask)

        c = np.bitwise_and((m + ONE) * c, mask)
        m = np.bitwise_and(m * m, mask)

        nskip = np.right_shift(nskip, SONE) # shift right, dropping least significant bit

    # with m_next and c_next, we can now find the new seed
    return np.bitwise_and(m_next * seed + c_next, mask)

def index2bytes(i: np.uint64) -> bytes:
    bbb: np.uint64 = rang(i)

    rc = bytearray()
    rc.append( np.uint8( np.bitwise_and(bbb, np.uint64(0xFF)) ) )
    bbb = np.right_shift(bbb, np.uint64(8))
    rc.append( np.uint8( np.bitwise_and(bbb, np.uint64(0xFF)) ) )
    bbb = np.right_shift(bbb, np.uint64(8))
    rc.append( np.uint8( np.bitwise_and(bbb, np.uint64(0xFF)) ) )
    bbb = np.right_shift(bbb, np.uint64(8))
    rc.append( np.uint8( np.bitwise_and(bbb, np.uint64(0xFF)) ) )
    bbb = np.right_shift(bbb, np.uint64(8))
    rc.append( np.uint8( np.bitwise_and(bbb, np.uint64(0xFF)) ) )

    return bytes(rc) # convert to match the declared return type

def bytes2index(a: bytes) -> np.uint64:
    seed: np.uint64 = ZERO
    seed += np.left_shift( np.uint64(a[0]), np.uint64(0))
    seed += np.left_shift( np.uint64(a[1]), np.uint64(8))
    seed += np.left_shift( np.uint64(a[2]), np.uint64(16))
    seed += np.left_shift( np.uint64(a[3]), np.uint64(24))
    seed += np.left_shift( np.uint64(a[4]), np.uint64(32))

    return skip(MONE, seed)

# main part, silence overflow warnings first
# (np.warnings is not available in recent NumPy releases, so use the stdlib module)
import warnings
warnings.filterwarnings('ignore')

bbb = index2bytes(ONE)
print(bbb)
idx = bytes2index(bbb)
print(idx)

bbb = index2bytes(999999)
print(bbb)
idx = bytes2index(bbb)
print(idx)

bbb = b'\xa4\x3c\xb1\xfc\x79'
idx = bytes2index(bbb)
print(idx)
bbb = index2bytes(idx)
print(bbb)
print(bbb == b'\xa4\x3c\xb1\xfc\x79')
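
The code above maps indexes to 5 raw bytes, not to the 38-symbol strings the question asks for. As a hedged sketch (not part of the original answer), the same affine-map idea can be applied modulo 38⁵, with the result written in base 38. The alphabet follows the `ascii_lowercase + digits + "AB"` suggestion from the comments. `MULT` and `INCR` are arbitrary illustrative constants: any multiplier coprime to 38⁵ (odd and not divisible by 19) gives a bijection. All function names are hypothetical. Requires Python 3.8+.

```python
from math import gcd
from string import ascii_lowercase, digits
from typing import Iterator, Tuple

ALPHABET = ascii_lowercase + digits + "AB"  # 38 symbols, per the comment thread
BASE = len(ALPHABET)
LENGTH = 5
MOD = BASE ** LENGTH                        # 38**5 = 79_235_168

# Illustrative constants (assumptions, not taken from the answer).
MULT = 30_000_001
INCR = 12_345_679
assert gcd(MULT, MOD) == 1                  # required for the map to be invertible
MULT_INV = pow(MULT, -1, MOD)


def encode(value: int) -> str:
    """Write 0 <= value < MOD as LENGTH base-38 digits, most significant first."""
    chars = []
    for _ in range(LENGTH):
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode(text: str) -> int:
    """Inverse of encode."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value


def index_to_string(i: int) -> str:
    """Map an index in 1..MOD to a unique 5-character string."""
    return encode((MULT * (i - 1) + INCR) % MOD)


def string_to_index(s: str) -> int:
    """Map a 5-character string back to its index in 1..MOD."""
    return ((decode(s) - INCR) * MULT_INV) % MOD + 1


def mapping(start: int = 1, stop: int = MOD) -> Iterator[Tuple[int, str]]:
    """Lazily yield (index, string) pairs without building a list."""
    for i in range(start, stop + 1):
        yield i, index_to_string(i)


if __name__ == "__main__":
    for i, s in mapping(1, 5):
        print(i, "->", s, "->", string_to_index(s))
```

The mapping is fixed across runs because it has no random state. Note that it is only as random-looking as a single affine step: consecutive indexes differ by a constant modulo 38⁵, so this is not suitable where unpredictability matters.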
