Padding must be performed on the bytes to be encrypted, not on the characters. With UTF-8 encoding, some characters are encoded as multiple bytes. For example, consider the two strings
s1 = chr(0x30)
s2 = chr(0x80)
Both strings have length 1, but the length of s1.encode('utf-8') will be 1 while the length of s2.encode('utf-8') will be 2. Your algorithm will pad s2 incorrectly.
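The difference is easy to check directly. The following is a minimal sketch that compares character counts with UTF-8 byte counts and shows how many padding bytes a 16-byte block would need in each case. The helper name padding_needed is made up for this illustration.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def padding_needed(length: int, block_size: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: PKCS#7 padding byte count for data of this length."""
    return block_size - length % block_size


s1 = chr(0x30)  # '0', encoded as one byte in UTF-8
s2 = chr(0x80)  # U+0080, encoded as two bytes in UTF-8

for s in (s1, s2):
    encoded = s.encode('utf-8')
    # Columns: characters, bytes, padding from character count, padding from byte count
    print(len(s), len(encoded), padding_needed(len(s)), padding_needed(len(encoded)))
```

For s2 the two padding amounts differ, so padding computed from the character count leaves the encoded data one byte longer than a whole block.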
Here is a modified pad function that is not so cryptic. You can turn it into a one-liner if you want.
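def pad(s: bytes) -> bytes:
    block_size = 16  # AES block size in bytes
    size_of_last_block = len(s) % block_size
    # Always in the range 1..16: a full block is added when s is already block-aligned
    padding_amount = block_size - size_of_last_block
    # Each padding byte holds the number of padding bytes added
    pad_bytes = bytes([padding_amount] * padding_amount)
    return s + pad_bytes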
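For reference, here is a sketch of the one-liner form together with a matching inverse. The names pad_pkcs7 and unpad_pkcs7 are hypothetical, chosen only so they do not clash with the function above or with any library function. The checks in unpad_pkcs7 are basic sanity checks, not a hardened implementation.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pad_pkcs7(s: bytes) -> bytes:
    # One-liner equivalent: append n bytes, each with the value n
    return s + bytes([BLOCK_SIZE - len(s) % BLOCK_SIZE]) * (BLOCK_SIZE - len(s) % BLOCK_SIZE)


def unpad_pkcs7(s: bytes) -> bytes:
    # Padded data is never empty and is always a whole number of blocks
    if not s or len(s) % BLOCK_SIZE:
        raise ValueError('data length is not a positive multiple of the block size')
    n = s[-1]  # the last byte says how many padding bytes were added
    if not 1 <= n <= BLOCK_SIZE or s[-n:] != bytes([n]) * n:
        raise ValueError('invalid padding')
    return s[:-n]


data = chr(0x80).encode('utf-8')  # two bytes
padded = pad_pkcs7(data)
print(len(padded), unpad_pkcs7(padded) == data)
```

Because a block-aligned input still receives a full block of padding, the last byte always identifies the padding unambiguously, which is what makes the inverse possible.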
Note, however, that PyCryptodome already includes pad() and unpad() functions that should normally be used in preference to something home-grown. Example:
import base64
import secrets

# The Cryptodome namespace is provided by the pycryptodomex package;
# with the pycryptodome package, import from Crypto instead.
from Cryptodome.Cipher import AES
from Cryptodome.Util.Padding import pad, unpad


def example():
    key = secrets.token_bytes(32)  # 256-bit AES key
    cipher = AES.new(key, AES.MODE_CBC)  # pycryptodome will generate the random IV
    pt = 'Hello World, the secret to success is: Python'.encode('utf-8')
    padded_pt = pad(pt, cipher.block_size, style='pkcs7')
    ct = cipher.encrypt(padded_pt)
    # Prepend the IV so that decrypt() can recover it from the blob
    result = base64.b64encode(cipher.iv + ct).decode('utf-8')
    return result, key


def decrypt(key: bytes, encrypted_blob_b64: str):
    encrypted_blob = base64.b64decode(encrypted_blob_b64)
    # The first block is the IV, the rest is the ciphertext
    iv, ct = encrypted_blob[:AES.block_size], encrypted_blob[AES.block_size:]
    cipher = AES.new(key, mode=AES.MODE_CBC, iv=iv)
    padded_pt = cipher.decrypt(ct)
    pt = unpad(padded_pt, cipher.block_size, style='pkcs7')
    return pt.decode('utf-8')


result, key = example()
print(f'len = {len(result)}, result = {result}')
print(decrypt(key, result))
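To see where the printed length comes from, the layout of the Base64 blob can be sketched with the standard library alone. This is only an illustration of the sizes involved: random bytes stand in for the real IV and ciphertext, and no encryption is performed.

```python
import base64
import secrets

BLOCK_SIZE = 16  # AES block size in bytes

pt = 'Hello World, the secret to success is: Python'.encode('utf-8')
# Length after padding: the next multiple of the block size above len(pt)
padded_len = len(pt) + (BLOCK_SIZE - len(pt) % BLOCK_SIZE)

# Stand-ins for the real values; in CBC mode the ciphertext is as long as the padded plaintext
iv = secrets.token_bytes(BLOCK_SIZE)
ct = secrets.token_bytes(padded_len)
blob_b64 = base64.b64encode(iv + ct).decode('utf-8')

# Splitting works the same way as in decrypt(): first block is the IV, the rest is ciphertext
decoded = base64.b64decode(blob_b64)
iv2, ct2 = decoded[:BLOCK_SIZE], decoded[BLOCK_SIZE:]
print(len(pt), padded_len, len(decoded), len(blob_b64))
```

The blob is one block of IV plus the padded plaintext length, and Base64 then expands every 3 bytes to 4 characters.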