It's a bit odd that the encoding package works with Unicode strings. It would be better to encode byte data, since it may not be only text that is encoded/decoded. There is also no need to work with actual binary strings (Unicode 1s and 0s); just flip bits in the byte strings.
Below I've wrapped the encode/decode routines so that encode takes Unicode text and returns a byte string, and decode does the reverse. There is also a corrupt function that flips bits in the encoded result to show the error correction in action:
import unireedsolomon as rs
import random

def corrupt(encoded):
    '''Flip up to 3 bits (might pick the same bit more than once).
    '''
    b = bytearray(encoded)                # convert to writable bytes
    for _ in range(3):
        index = random.randrange(len(b))  # pick random byte
        bit = random.randrange(8)         # pick random bit
        b[index] ^= 1 << bit              # flip it
    return bytes(b)                       # back to read-only bytes, but not necessary

def encode(coder,msg):
    '''Convert the msg to UTF-8-encoded bytes and encode with "coder".  Return as bytes.
    '''
    return coder.encode(msg.encode('utf8')).encode('latin1')

def decode(coder,encoded):
    '''Decode the encoded message with "coder", convert result to bytes and decode UTF-8.
    '''
    return coder.decode(encoded)[0].encode('latin1').decode('utf8')

coder = rs.RSCoder(20,13)
msg = 'hello(你好)'  # 9 Unicode characters, but 13 (maximum) bytes when encoded to UTF-8.
encoded = encode(coder,msg)
print(encoded)
corrupted = corrupt(encoded)
print(corrupted)
decoded = decode(coder,corrupted)
print(decoded)
Output. Note that the first l in hello (ASCII 0x6C) was corrupted to 0xEC, the second l changed to an h (ASCII 0x68), and another byte changed from 0xE5 to 0xF5. You can actually randomly change any 3 bytes (not just bits), including the error-correcting bytes, and the message will still decode.
b'hello(\xe4\xbd\xa0\xe5\xa5\xbd)8\xe6\xd3+\xd4\x19\xb8'
b'he\xecho(\xe4\xbd\xa0\xf5\xa5\xbd)8\xe6\xd3+\xd4\x19\xb8'
hello(你好)
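To try the claim that whole bytes, not just single bits, can be damaged, a variant of corrupt along the following lines could be used. This is only a sketch: corrupt_bytes and its count parameter are hypothetical names that are not in the code above, and the default of 3 assumes the usual Reed-Solomon bound of (20 - 13) // 2 = 3 correctable bytes for RSCoder(20,13).

```python
import random

def corrupt_bytes(encoded, count=3):
    '''Replace `count` distinct bytes with different random values.
    (Hypothetical helper for illustration, not part of unireedsolomon.)
    '''
    b = bytearray(encoded)  # writable copy
    # random.sample picks distinct positions, so exactly `count` bytes change
    for index in random.sample(range(len(b)), count):
        # XOR with a non-zero value guarantees the byte actually changes
        b[index] ^= random.randrange(1, 256)
    return bytes(b)

# The 20-byte encoded message printed above
sample = b'hello(\xe4\xbd\xa0\xe5\xa5\xbd)8\xe6\xd3+\xd4\x19\xb8'
damaged = corrupt_bytes(sample)
changed = sum(x != y for x, y in zip(sample, damaged))
print(changed)  # 3
```

Passing the result to the decode wrapper above in place of the output of corrupt should still recover the message, as long as no more than 3 bytes were changed.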
A note about .encode('latin1'): The encoder uses Unicode strings containing only the code points U+0000 to U+00FF. Because Latin-1 is the first 256 Unicode code points, the 'latin1' codec converts a Unicode string made up of those code points 1:1 to their byte values, resulting in a byte string with values in the range 0-255.
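The 1:1 mapping can be checked with the standard library alone. The snippet below is a small illustration and does not involve the Reed-Solomon coder; the variable names are arbitrary.

```python
# Every code point U+0000..U+00FF as a single Unicode string
text = ''.join(chr(i) for i in range(256))

raw = text.encode('latin1')          # one byte per code point
print(raw == bytes(range(256)))      # True
print(raw.decode('latin1') == text)  # True: the mapping is reversible

# Code points above U+00FF have no Latin-1 byte, so encoding them fails
try:
    '你'.encode('latin1')
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)                     # False
```

This is why the wrappers encode the message to UTF-8 first: the UTF-8 bytes all fall in the range 0-255, so the coder only ever sees code points that 'latin1' can map back to bytes.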