Python Encryption - Reproducing Error: Data must be padded to 16 byte boundary in CBC mode

Question

I'm having trouble reproducing the following error:

Data must be padded to 16 byte boundary in CBC mode

When I encrypt my data I use the following padding:

BS = 16
pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)

and the encryption is done with the following code:

encrypt = base64.b64encode(iv + cipher.encrypt(pad(raw).encode('utf8'))).decode()

The padding and encryption have worked without issue for a long time, but I recently received this error (only once) and I'm not sure how to reproduce it. Can the actual data being passed be the issue? What would trigger this error with 16-byte padding in place?

What are you trying to do here `chr(BS - len(s)`? If the len of s is greater than 16, the result is a negative number. — John Hanley, Jun 14 '23 at 01:03
You don't pad the characters anyway, you pad the bytes. In other words, you pad **after** you encode. — President James K. Polk, Jun 14 '23 at 02:47

President James K. Polk · Accepted Answer · 2023-06-17T16:53:18.957

Padding must be performed on the bytes to be encrypted, not the characters. With utf-8 encoding, some characters are encoded to multiple bytes. For example, consider the two strings

s1 = chr(0x30)
s2 = chr(0x80)

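Both strings have length 1, but the length of s1.encode('utf-8') will be 1 while the length of s2.encode('utf-8') will be 2. Your algorithm computes the padding from the character count, so it appends 15 padding characters to s2; after encoding, the result is 2 + 15 = 17 bytes, which is not a multiple of 16.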
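
To see the mismatch concretely, the sketch below runs the question's character-based pad on both strings and measures the encoded length. The names `pad_chars`, `len_s1` and `len_s2` are introduced here for illustration only; no cipher is involved, just byte counts.

```python
BS = 16

# The pad from the question: it counts characters, not bytes.
pad_chars = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)

s1 = chr(0x30)  # 1 character, 1 byte in UTF-8
s2 = chr(0x80)  # 1 character, 2 bytes in UTF-8

# Pad the string first, then encode, as the question's code does.
len_s1 = len(pad_chars(s1).encode('utf-8'))
len_s2 = len(pad_chars(s2).encode('utf-8'))

print(len_s1, len_s1 % BS)  # aligned to the block size
print(len_s2, len_s2 % BS)  # one byte over the block boundary
```

Any input containing at least one multi-byte character can end up misaligned this way, which would explain why the error shows up only occasionally.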

Here is a modified pad function that is not so cryptic. You can turn it into a one-liner if you want.

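def pad(s: bytes) -> bytes:
    """Append PKCS#7-style padding so len(result) is a multiple of 16."""
    block_size = 16
    size_of_last_block = len(s) % block_size
    # Always pad: a full block of 16 is added when s is already aligned.
    padding_amount = block_size - size_of_last_block
    pad_bytes = bytes([padding_amount] * padding_amount)
    return s + pad_bytes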

Note, however, that PyCryptodome already includes pad() and unpad() functions that should normally be used in preference to something home-grown. Example:

import base64
import secrets

from Cryptodome.Cipher import AES
from Cryptodome.Util.Padding import pad, unpad


def example():
    key = secrets.token_bytes(32)
    cipher = AES.new(key, AES.MODE_CBC)  # pycryptodome will generate the random IV
    pt = 'Hello World, the secret to success is: Python'.encode('utf-8')
    padded_pt = pad(pt, cipher.block_size, style='pkcs7')
    ct = cipher.encrypt(padded_pt)
    result = base64.b64encode(cipher.iv + ct).decode('utf-8')
    return result, key

def decrypt(key: bytes, encrypted_blob_b64: str):
    encrypted_blob = base64.b64decode(encrypted_blob_b64)
    iv, ct = encrypted_blob[:AES.block_size], encrypted_blob[AES.block_size:]
    cipher = AES.new(key, mode=AES.MODE_CBC, iv=iv)
    padded_pt = cipher.decrypt(ct)
    pt = unpad(padded_pt, cipher.block_size, style='pkcs7')
    return pt.decode('utf8')

result, key = example()
print(f'len = {len(result)}, result = {result}')
print(decrypt(key, result))

Thanks for the explanation. I was able to reproduce the error using characters that encode to multiple bytes, such as €. I took your advice and padded after encoding, but I got a type error, `TypeError: can't concat str to bytes`, in my pad: `pad = lambda s: s + (BS - len(s) % BS) * chr(BS - len(s) % BS)`. Can you shed some light on why this is happening? — Kakedis, Jun 16 '23 at 05:42
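
On the TypeError in the comment above: `chr(...)` returns a `str`, and Python will not concatenate a `str` onto `bytes`. A minimal sketch of a bytes-only variant follows, assuming the input has already been encoded; `pad_bytes`, `raw`, `padded` and `concat_failed` are hypothetical names used only for this illustration.

```python
BS = 16

def pad_bytes(data: bytes) -> bytes:
    # Build the padding as bytes, so it can be concatenated with bytes.
    n = BS - len(data) % BS
    return data + bytes([n]) * n

# Mixing the two types is what raises the error from the comment.
try:
    b"abc" + chr(13)
    concat_failed = False
except TypeError:
    concat_failed = True

raw = "price: 5\u20ac"                   # contains the euro sign (3 bytes in UTF-8)
padded = pad_bytes(raw.encode("utf-8"))  # encode first, then pad

print(concat_failed)
print(len(padded), len(padded) % BS)
```

The padding functions shipped with PyCryptodome, as used in the answer, do the same job and are probably the safer choice.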
