
I'm going straight to the point. I have this code:

while (inputLength > 0)
{
    if (mode == MODE_AES_ENCRYPT)
        aesni_ecb_encrypt(ctx, input + shift, 16, output + shift);
    else if (mode == MODE_AES_DECRYPT)
        aesni_ecb_decrypt(ctx, input + shift, 16, output + shift);

    shift += 16;
    inputLength -= 16;
}

It performs AES-ECB encryption on one 16-byte block of input at a time and stores the result in output. The parameter ctx is a structure that contains the number of rounds and the subkeys for the encryption.

AES-ECB encryption can theoretically be parallelized, so I tried multithreading the code like this:

typedef struct
{
    AES_Context* Ctx;

    unsigned char* input;
    unsigned char* output;

    _Bool done;
} threadInfos;

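unsigned long WINAPI ThreadFunc(void* param)
{
    threadInfos* data = param; /* CreateThread passes the argument as a void pointer */

    aes_ecb_encrypt(data->Ctx, data->input, data->output);

    data->done = 1;

    return 0; /* thread exit code */
}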

while (inputLength > 0) 
{
    threadInfos info1; info1.done = 0; info1.Ctx = ctx;
    threadInfos info2; info2.done = 0; info2.Ctx = ctx;
    threadInfos info3; info3.done = 0; info3.Ctx = ctx;
    threadInfos info4; info4.done = 0; info4.Ctx = ctx;

    info1.input = (input + shift); info1.output = (output + shift);
    info2.input = (input + shift + 16); info2.output = (output + shift + 16);
    info3.input = (input + shift + 32); info3.output = (output + shift + 32);
    info4.input = (input + shift + 48); info4.output = (output + shift + 48);

    CreateThread(NULL, 0, ThreadFunc, &info1, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info2, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info3, 0, NULL);
    CreateThread(NULL, 0, ThreadFunc, &info4, 0, NULL);

    while (info1.done == 0 || info2.done == 0 || info3.done == 0 || info4.done == 0)
        ;

    shift += 64;
    inputLength -= 64;
}

and here are the results in terms of speed:

[Image: AES-ECB monothread vs. AES-ECB 4 threads]

The output is the same, which means that my multithreading seems to work. However, it is highly inefficient: it is 1000x slower.

So here is my question: how can I multithread the encryption on 4 or 8 threads, depending on the CPU's capabilities, in such a way that it is faster rather than 1000x slower?

Tom Clabault
  • Why are you using a loop to wait? You could've used `WaitForSingleObject` to join threads. – Azeem Jun 12 '18 at 09:00
  • Threading does not have to be faster in every case. – user743414 Jun 12 '18 at 09:02
  • If I use WaitForSingleObject, I have to create a mutex, right? How can I make one single mutex depend on 4 threads? – Tom Clabault Jun 12 '18 at 09:17
  • Either test and reset a manual reset event if not all are done, or use a semaphore. – Pete Kirkham Jun 12 '18 at 09:19
  • How do I use a semaphore ? I used Google but I don't understand how it works. I saw this example: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686946(v=vs.85).aspx but it uses WaitForMultipleObjects. Why don't I use that function instead of a semaphore to wait for all the threads to terminate ? – Tom Clabault Jun 12 '18 at 10:01

1 Answer


The problem is that you create a thread to process a single AES block and then destroy it again. As you noticed, that is 1000x slower: all your time is spent creating and destroying threads.

What you need to do is create the threads once at the start and then have each of them work on its own share of the blocks. For example, have thread 0 do all blocks with block % 4 == 0, thread 1 all blocks with block % 4 == 1, and so on.
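The idea of starting the threads once and giving each one every fourth block could look roughly like the sketch below. It is only an illustration under several assumptions: it uses POSIX threads rather than the Win32 API so that it stays portable (on Windows, CreateThread and WaitForMultipleObjects play the roles of pthread_create and pthread_join), the input length is assumed to be a multiple of 16, and encrypt_block is a hypothetical XOR stand-in for the real per-block call such as aesni_ecb_encrypt. The names worker_args, worker and encrypt_parallel are made up for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define BLOCK_SIZE 16
#define NUM_THREADS 4

/* Hypothetical stand-in for the real per-block cipher call
   (aesni_ecb_encrypt in the question): XORs every byte with a key byte. */
static void encrypt_block(unsigned char key, const unsigned char *in,
                          unsigned char *out)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        out[i] = in[i] ^ key;
}

typedef struct {
    unsigned char key;
    const unsigned char *input;
    unsigned char *output;
    size_t block_count;  /* total number of 16-byte blocks */
    size_t first_block;  /* index of this thread: 0..NUM_THREADS-1 */
    size_t stride;       /* number of threads */
} worker_args;

/* Each thread loops over its own blocks: first_block, first_block + stride, ... */
static void *worker(void *p)
{
    worker_args *a = p;
    for (size_t b = a->first_block; b < a->block_count; b += a->stride)
        encrypt_block(a->key, a->input + b * BLOCK_SIZE,
                      a->output + b * BLOCK_SIZE);
    return NULL;
}

/* Starts the threads once, lets them process all blocks, then joins them.
   Returns 0 on success, -1 if a thread could not be created. */
int encrypt_parallel(unsigned char key, const unsigned char *input,
                     unsigned char *output, size_t length)
{
    pthread_t threads[NUM_THREADS];
    worker_args args[NUM_THREADS];
    size_t started = 0;
    int status = 0;

    for (size_t t = 0; t < NUM_THREADS; t++) {
        args[t] = (worker_args){ key, input, output,
                                 length / BLOCK_SIZE, t, NUM_THREADS };
        if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Joining blocks until each started thread has finished its loop. */
    for (size_t t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    return status;
}
```

With this layout the thread start-up cost is paid four times in total instead of four times per 64 bytes. One possible refinement, not tested here: on CPUs with 64-byte cache lines, four neighbouring 16-byte blocks share a line, so giving each thread one contiguous quarter of the buffer instead of interleaved blocks may reduce contention between cores.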

Note: `_Bool done;` is not thread-safe. On ARM, for example, your wait loop might never complete.
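If a completion flag is kept at all, one way to make it well-defined is a C11 atomic with release/acquire ordering, since concurrent unsynchronized access to a plain _Bool is a data race in C11. The sketch below is an assumption-laden illustration, not the code from the question: it uses POSIX threads for portability, and job, job_worker and run_job are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int result;        /* ordinary data written by the worker */
    atomic_bool done;  /* completion flag, accessed atomically */
} job;

static void *job_worker(void *p)
{
    job *j = p;
    j->result = 42;  /* placeholder for the real work */
    /* Release: the write to result becomes visible before done reads as true. */
    atomic_store_explicit(&j->done, true, memory_order_release);
    return NULL;
}

/* Runs one job on a thread and returns its result, or -1 on failure. */
int run_job(void)
{
    job j;
    pthread_t t;

    j.result = 0;
    atomic_init(&j.done, false);

    if (pthread_create(&t, NULL, job_worker, &j) != 0)
        return -1;

    /* Acquire: pairs with the release store in the worker. */
    while (!atomic_load_explicit(&j.done, memory_order_acquire))
        ;  /* busy-wait, kept only to mirror the original loop */

    pthread_join(t, NULL);
    return j.result;
}
```

The spin loop still burns a core while waiting, so joining the thread (or waiting on its handle on Windows) is usually the simpler choice and makes the flag unnecessary.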

Maarten Bodewes
Goswin von Brederlow
  • What do you think I could do to replace the while loop ? Pete Kirkham talked about semaphore but I don't know how to use them. Isn't it more simple to use `WaitForMultipleObjects` ? – Tom Clabault Jun 12 '18 at 11:13
  • You don't want to eliminate the while loop. You have to move the while loop into the threads. And then you can just join the threads. The join will block till each thread is done. – Goswin von Brederlow Jun 12 '18 at 12:55