I've seen something odd regarding OpenSSL performance.
This is the output of 'openssl speed aes-128-cbc' on a physical HP BL460c Gen8 with dual E5-2680s running RHEL/OEL 6.4 x64 and OpenSSL 1.0.0-fips:
Doing aes-128 cbc for 3s on 16 size blocks: 19853475 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 5366868 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1364167 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 343297 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 43002 aes-128 cbc's in 3.00s
I installed OpenSSL 1.0.1f on the same blade and retested, getting these results:
Doing aes-128 cbc for 3s on 16 size blocks: 19887908 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 5367604 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1365296 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 343261 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 42996 aes-128 cbc's in 2.99s
The two builds are effectively identical, within about half a percent at every block size.
But then, for reference, I ran the same test on an appliance VM (4 x vCPU, 8GB, ESXi 5.5 on an identical blade to the one above) running SuSE 11 and OpenSSL 0.9.8-fips and got the following results:
Doing aes-128 cbc for 3s on 16 size blocks: 31056333 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 10296043 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2772200 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 712440 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 89701 aes-128 cbc's in 2.99s
That's about 1.6x the performance at 16 bytes, 1.9x at 64 bytes, and more than double from 256 bytes upwards!
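To compare the runs on a common footing, the counts can be converted to bytes per second (block size x operations / elapsed time) and then divided. This is only a rough sketch: it assumes the "Doing ..." line format pasted above, and the names in it (LINE_RE, throughput, PHYSICAL, VM) are my own, not anything from OpenSSL.

```python
import re

# Matches lines such as:
# Doing aes-128 cbc for 3s on 16 size blocks: 19853475 aes-128 cbc's in 2.99s
LINE_RE = re.compile(
    r"Doing .+? for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)

def throughput(output):
    """Return {block size in bytes: bytes per second} for each matching line."""
    rates = {}
    for line in output.splitlines():
        m = LINE_RE.search(line)
        if m:
            size = int(m["size"])
            rates[size] = size * int(m["count"]) / float(m["secs"])
    return rates

# Physical blade, OpenSSL 1.0.0-fips (figures copied from this post)
PHYSICAL = """\
Doing aes-128 cbc for 3s on 16 size blocks: 19853475 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 5366868 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1364167 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 343297 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 43002 aes-128 cbc's in 3.00s
"""

# Appliance VM, OpenSSL 0.9.8-fips (figures copied from this post)
VM = """\
Doing aes-128 cbc for 3s on 16 size blocks: 31056333 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 10296043 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2772200 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 712440 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 89701 aes-128 cbc's in 2.99s
"""

physical = throughput(PHYSICAL)
vm = throughput(VM)

# Ratio of VM throughput to physical throughput per block size
ratios = {size: vm[size] / physical[size] for size in physical}

for size in sorted(ratios):
    print(f"{size:>5} bytes: physical {physical[size] / 1e6:7.1f} MB/s, "
          f"VM {vm[size] / 1e6:7.1f} MB/s, ratio {ratios[size]:.2f}x")
```

Working in bytes per second rather than raw counts also takes the small differences in elapsed time (2.99s versus 3.00s) into account.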
Does anyone have any idea what's going on here, please? I've read a whole bunch of OpenSSL documents, as well as Intel's documents on OpenSSL and their AES-NI hardware support, but I'm still confused by this.