I'll get straight to the point. I have this code:
while (inputLength > 0)
{
    if (mode == MODE_AES_ENCRYPT)
        aesni_ecb_encrypt(ctx, input + shift, 16, output + shift);
    else if (mode == MODE_AES_DECRYPT)
        aesni_ecb_decrypt(ctx, input + shift, 16, output + shift);
    shift += 16;
    inputLength -= 16;
}
Each iteration performs AES-ECB encryption (or decryption) on one 16-byte block of input and stores the result in output. The parameter ctx is a structure that contains the number of rounds and the subkeys for the encryption.
In theory, AES-ECB can be parallelized because every block is processed independently of the others, so I tried multithreading the code like this:
typedef struct
{
    AES_Context* Ctx;
    unsigned char* input;
    unsigned char* output;
    _Bool done;
} threadInfos;

unsigned long WINAPI ThreadFunc(threadInfos* data)
{
    aes_ecb_encrypt(data->Ctx, data->input, data->output);
    data->done = 1;
    return 0;
}
while (inputLength > 0)
{
    threadInfos info1; info1.done = 0; info1.Ctx = ctx;
    threadInfos info2; info2.done = 0; info2.Ctx = ctx;
    threadInfos info3; info3.done = 0; info3.Ctx = ctx;
    threadInfos info4; info4.done = 0; info4.Ctx = ctx;
    info1.input = (input + shift); info1.output = (output + shift);
    info2.input = (input + shift + 16); info2.output = (output + shift + 16);
    info3.input = (input + shift + 32); info3.output = (output + shift + 32);
    info4.input = (input + shift + 48); info4.output = (output + shift + 48);
    CreateThread(NULL, 0, ThreadFunc, &info1, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info2, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info3, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info4, 0, NULL);
    while (info1.done == 0 || info2.done == 0 || info3.done == 0 || info4.done == 0)
        ;
    shift += 64;
    inputLength -= 64;
}
and here are the results in terms of speed:
The output is the same, so the multithreaded version appears to produce correct results. However, it is highly inefficient: it runs about 1000x slower.
So here is my question: how can I multithread the encryption across 4 or 8 threads, depending on the CPU's capabilities, in such a way that it is actually faster rather than 1000x slower?
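To make the goal more concrete, here is a rough sketch of the partitioning I have in mind, instead of creating four threads for every 64 bytes: split the whole buffer once into one contiguous, block-aligned chunk per thread. The names chunk_t and chunk_for_thread are hypothetical helpers written for this illustration, not part of my AES code, and the sketch assumes inputLength is a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical helper type: the byte range one worker thread would handle. */
typedef struct
{
    size_t offset; /* start of the chunk, in bytes from the buffer start */
    size_t length; /* chunk size in bytes, always a multiple of 16 */
} chunk_t;

/* Compute the chunk for worker `index` out of `nthreads` workers.
   Whole 16-byte blocks are spread as evenly as possible: the first
   (blocks % nthreads) workers each receive one extra block. */
chunk_t chunk_for_thread(size_t inputLength, unsigned nthreads, unsigned index)
{
    assert(nthreads > 0 && index < nthreads);
    assert(inputLength % AES_BLOCK_SIZE == 0); /* assumption: no partial block */

    size_t blocks = inputLength / AES_BLOCK_SIZE;
    size_t base = blocks / nthreads;  /* blocks every worker gets */
    size_t extra = blocks % nthreads; /* leftover blocks to hand out */

    /* Workers before `index` hold `base` blocks each, plus one extra
       block for each of them that falls within the first `extra` workers. */
    size_t first = index * base + (index < extra ? index : extra);
    size_t count = base + (index < extra ? 1 : 0);

    chunk_t c = { first * AES_BLOCK_SIZE, count * AES_BLOCK_SIZE };
    return c;
}
```

Each thread would then run the original single-threaded loop over input + offset for length bytes, so a thread is created once per buffer rather than once per block. I assume the thread count could come from GetSystemInfo (dwNumberOfProcessors) and that the busy-wait could be replaced by a single WaitForMultipleObjects call on the thread handles, but I have not measured whether this is actually faster.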