Here's a fairly simple implementation in Python 3 of Format-preserving encryption using a balanced Feistel network that I wrote almost 2 years ago. It can perform the kind of index permutation that you want for N up to 264 on a 32 bit system, or 2128 on a 64 bit build of Python. That's due to the size of the integer returned by the hash()
function. See sys.hash_info
to find the limits for your system. It wouldn't be hard to use a fancier hash function that can return values with greater bit length, but I didn't want to make this code more complicated or slower.
Update
I've made a few minor improvements to the previous version, and I've added some more information in the comments. Instead of using the low bits returned from the hash function, we use the high bits, which generally improves the randomness, especially for short bit lengths. I've also added another hash function, xxhash by Yann Collet, which works works much better than Python's hash
for this application, especially for shorter bit lengths, although it is a little slower. The xxhash algorithm has a much higher avalanche effect than the built-in hash
, so the resulting permutations tend to be more well-shuffled.
Although this code works for small values of stop
it's more suited to handle stop >= 2**16
. If you need to permute smaller ranges it's probably a Good Idea to just use random.shuffle
on list(range(stop))
. It will be faster, and it doesn't use that much RAM: list(range(2**16))
consumes about 1280 kilobytes on a 32 bit machine.
You will notice that I use a string to seed the random number generator. For this application we want the randomizer to have plenty of entropy, and using a big string (or bytes
) is an easy way to do that, as the random
module docs mention. Even so, this program can only produce a tiny fraction of all the possible permutations when stop
is large. For stop == 35
there are 35! (35 factorial) different permutations, and 35! > 2132, but the total bit length of our keys is only 128, so they can't cover all those permutations. We could increase the number of Feistel rounds to get a little more coverage, but obviously that would be impractical for large values of stop
.
''' Format preserving encryption using a Feistel network
This code is *not* suitable for cryptographic use.
See https://en.wikipedia.org/wiki/Format-preserving_encryption
https://en.wikipedia.org/wiki/Feistel_cipher
http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords
A Feistel network performs an invertible transformation on its input,
so each input number produces a unique output number. The netword operates
on numbers of a fixed bit width, which must be even, i.e., the numbers
a particular network operates on are in the range(4**k), and it outputs a
permutation of that range.
To permute a range of general size we use cycle walking. We set the
network size to the next higher power of 4, and when we produce a number
higher than the desired range we simply feed it back into the network,
looping until we get a number that is in range.
The worst case is when stop is of the form 4**k + 1, where we need 4
steps on average to reach a valid n. In the typical case, where stop is
roughly halfway between 2 powers of 4, we need 2 steps on average.
Written by PM 2Ring 2016.08.22
'''
from random import Random
# xxhash by Yann Collet. Specialised for a 32 bit number
# See http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html
def xxhash_num(n, seed):
n = (374761397 + seed + n * 3266489917) & 0xffffffff
n = ((n << 17 | n >> 15) * 668265263) & 0xffffffff
n ^= n >> 15
n = (n * 2246822519) & 0xffffffff
n ^= n >> 13
n = (n * 3266489917) & 0xffffffff
return n ^ (n >> 16)
class FormatPreserving:
""" Invertible permutation of integers in range(stop), 0 < stop <= 2**64
using a simple Feistel network. NOT suitable for cryptographic purposes.
"""
def __init__(self, stop, keystring):
if not 0 < stop <= 1 << 64:
raise ValueError('stop must be <=', 1 << 64)
# The highest number in the range
self.maxn = stop - 1
# Get the number of bits in each part by rounding
# the bit length up to the nearest even number
self.shiftbits = -(-self.maxn.bit_length() // 2)
self.lowmask = (1 << self.shiftbits) - 1
self.lowshift = 32 - self.shiftbits
# Make 4 32 bit round keys from the keystring.
# Create an independent random stream so we
# don't intefere with the default stream.
stream = Random()
stream.seed(keystring)
self.keys = [stream.getrandbits(32) for _ in range(4)]
self.ikeys = self.keys[::-1]
def feistel(self, n, keys):
# Split the bits of n into 2 parts & perform the Feistel
# transformation on them.
left, right = n >> self.shiftbits, n & self.lowmask
for key in keys:
left, right = right, left ^ (xxhash_num(right, key) >> self.lowshift)
#left, right = right, left ^ (hash((right, key)) >> self.lowshift)
return (right << self.shiftbits) | left
def fpe(self, n, reverse=False):
keys = self.ikeys if reverse else self.keys
while True:
# Cycle walk, if necessary, to ensure n is in range.
n = self.feistel(n, keys)
if n <= self.maxn:
return n
def test():
print('Shuffling a small number')
maxn = 10
fpe = FormatPreserving(maxn, 'secret key string')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print(i, a, b)
print('\nShuffling a small number, with a slightly different keystring')
fpe = FormatPreserving(maxn, 'secret key string.')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print(i, a, b)
print('\nHere are a few values for a large maxn')
maxn = 10000000000000000000
print('maxn =', maxn)
fpe = FormatPreserving(maxn, 'secret key string')
for i in range(10):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print('{}: {:19} {}'.format(i, a, b))
print('\nUsing a set to test that there are no collisions...')
maxn = 100000
print('maxn', maxn)
fpe = FormatPreserving(maxn, 'secret key string')
a = {fpe.fpe(i) for i in range(maxn)}
print(len(a) == maxn)
print('\nTesting that the operation is bijective...')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
assert b == i, (i, a, b)
print('yes')
if __name__ == "__main__":
test()
output
Shuffling a small number
0 4 0
1 2 1
2 5 2
3 9 3
4 1 4
5 3 5
6 7 6
7 0 7
8 6 8
9 8 9
Shuffling a small number, with a slightly different keystring
0 9 0
1 8 1
2 3 2
3 5 3
4 2 4
5 6 5
6 1 6
7 4 7
8 7 8
9 0 9
Here are a few values for a large maxn
maxn = 10000000000000000000
0: 7071024217413923554 0
1: 5613634032642823321 1
2: 8934202816202119857 2
3: 296042520195445535 3
4: 5965959309128333970 4
5: 8417353297972226870 5
6: 7501923606289578535 6
7: 1722818114853762596 7
8: 890028846269590060 8
9: 8787953496283620029 9
Using a set to test that there are no collisions...
maxn 100000
True
Testing that the operation is bijective...
yes
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8
Here's how to use it to make a generator:
def ipermute(stop, keystring):
fpe = FormatPreserving(stop, keystring)
for i in range(stop):
yield fpe.fpe(i)
for i, v in enumerate(ipermute(10, 'secret key string')):
print(i, v)
output
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8
It's reasonably fast (for Python), but it's definitely not suitable for cryptography. It could be made crypto-grade by increasing the number of Feistel rounds to at least 5 and by using a suitable cryptographic hash function, eg Blake2. Also, a cryptographic method would need to be used to produce the Feistel keys. Of course, one shouldn't write cryptographic software, unless you know exactly what you're doing, since it's just too easy to write code that's vulnerable to timing attacks, etc.