AES-NI seems to be optimized for encrypting/decrypting big chunks of data. However, I'm trying to decrypt a password, and I have many very small pieces of data to try (IV + first CBC block, 32 bytes in total).
I'm using OpenSSL at the moment, calling EVP_DecryptInit_ex and EVP_DecryptUpdate for every cycle (and EVP_CIPHER_CTX_init once per thread).
I can do this around 2 million times per second on a single core.
I assume this is the sort of performance I can expect using AES-NI instructions and I shouldn't worry about optimising this further. Is this correct?
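To put the 2 million per second figure in perspective, it can be converted into a time and cycle budget per attempt. The sketch below does only that arithmetic; the function names are hypothetical and the clock speed is an assumed parameter, since the CPU is not specified above.

```c
/* Nanoseconds spent per attempt at a given attempts-per-second rate. */
static double ns_per_attempt(double attempts_per_sec)
{
    return 1e9 / attempts_per_sec;
}

/* Clock cycles per attempt for an ASSUMED core clock in GHz.
 * The clock speed is not given in the post; pass in your own. */
static double cycles_per_attempt(double attempts_per_sec, double clock_ghz)
{
    return clock_ghz * 1e9 / attempts_per_sec;
}
```

For example, ns_per_attempt(2e6) is 500 ns, and cycles_per_attempt(2e6, 3.0) is 1500 cycles per attempt on a hypothetical 3 GHz core, all spent on one key setup plus one 16-byte block.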
Does anyone have any idea how much faster this might be on a high-end GPU or a not-too-expensive FPGA?