I have a C program that uses the libgcrypt library to measure encryption and decryption speeds for AES-256 in CBC mode. The program encrypts or decrypts as much data as it can in 3 seconds, then totals the bytes processed and converts that to a speed in Mbytes/sec. Each encrypt/decrypt call processes a 256-byte chunk.

I have found that decryption processes much more data than encryption in the same amount of time: up to 3 times as much.

I tried the same code on another machine and got the same results.

On my RPI device, however, speeds for both crypto operations are pretty much the same.

Here is the sample code I use to test encryption and decryption performance:

#include <stdio.h>
#include <time.h>
#include <gcrypt.h>

enum op {
    ENC = 0,
    DEC
};

void measure_crypto_performance(enum op op)
{
    #define BUFFER_SIZE 256
    #define TEST_TIME 3

    /* Initialize the library */
    gcry_check_version(NULL);

    /* Open an AES-256 CBC cipher handle (used for both encryption and decryption) */
    gcry_cipher_hd_t handle;
    gcry_cipher_open(&handle, GCRY_CIPHER_AES256, GCRY_CIPHER_MODE_CBC, 0);

    /* Set the 256-bit key (test pattern 0..31) */
    char key[32];
    for (int i = 0; i < 32; ++i)
        key[i] = i;
    gcry_cipher_setkey(handle, key, sizeof(key));

    /* Initialization vector: exactly 16 bytes, so no NUL terminator is stored */
    char iv[16] = "0123456789abcdef";

    /* Set up the buffer for the ciphertext and plaintext */
    char plaintext[BUFFER_SIZE];
    char ciphertext[BUFFER_SIZE];
    char decrypted_data[BUFFER_SIZE];

    gcry_randomize((void *)plaintext, BUFFER_SIZE, GCRY_STRONG_RANDOM);

    if (op == DEC)
    {
        gcry_cipher_setiv(handle, iv, sizeof(iv));
        gcry_cipher_encrypt(handle, ciphertext, BUFFER_SIZE, plaintext, BUFFER_SIZE);
    }

    size_t counter = 0;
    time_t start_time = time(NULL);
    if (op == ENC)
    {
        while (time(NULL) - start_time < TEST_TIME) {
            gcry_cipher_setiv(handle, iv, sizeof(iv));
            gcry_cipher_encrypt(handle, ciphertext, BUFFER_SIZE, plaintext, BUFFER_SIZE);
            ++counter;
        }
    } else if (op == DEC)
    {
        while (time(NULL) - start_time < TEST_TIME) {
            gcry_cipher_setiv(handle, iv, sizeof(iv));
            gcry_cipher_decrypt(handle, decrypted_data, BUFFER_SIZE, ciphertext, BUFFER_SIZE);
            ++counter;
        }
    }

    /* Calculate performance */
    double elapsed_time = difftime(time(NULL), start_time);
    double speed = ((counter * BUFFER_SIZE) / 1000000) / (elapsed_time);

    /* Print results */
    printf("Op: %s\n", op == DEC ? "Decryption" : "Encryption");
    /* counter * BUFFER_SIZE is a size_t, so print it with %zu */
    printf("%s %zu bytes in %.2f seconds\n", op == DEC ? "Decrypted" : "Encrypted", counter * BUFFER_SIZE, elapsed_time);
    printf("Speed: %.2lf Mbytes/sec\n", speed);

    /* Clean up */
    gcry_cipher_close(handle);
}

int main(int argc, char **argv)
{
    measure_crypto_performance(DEC);
    measure_crypto_performance(ENC);
    return 0;
}
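
A side note on the timing loop: `time()` has one-second resolution, so the window actually measured by the loops above can be anywhere between 2 and 3 seconds, and the clock is queried once per 256-byte chunk. The following is a minimal sketch of an alternative, assuming a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`. The names `chunk_op_fn`, `now_seconds`, `run_for`, and `xor_chunk` are hypothetical and not part of the original program, and `xor_chunk` is only a stand-in workload marking where the libgcrypt call would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: performs one operation on one chunk. */
typedef void (*chunk_op_fn)(void *ctx);

/* Current monotonic time in seconds, with nanosecond resolution. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls op(ctx) in batches until at least min_seconds have elapsed.
 * The clock is read once per batch rather than once per call.
 * Returns the number of calls made; the measured time goes to *elapsed_out. */
static size_t run_for(double min_seconds, size_t batch,
                      chunk_op_fn op, void *ctx, double *elapsed_out)
{
    assert(batch > 0 && op != NULL && elapsed_out != NULL);

    size_t calls = 0;
    double start = now_seconds();
    double elapsed = 0.0;

    do {
        for (size_t i = 0; i < batch; ++i)
            op(ctx);
        calls += batch;
        elapsed = now_seconds() - start;
    } while (elapsed < min_seconds);

    *elapsed_out = elapsed;
    return calls;
}

/* Stand-in workload (NOT encryption): touches a 256-byte buffer. */
static void xor_chunk(void *ctx)
{
    unsigned char *buf = ctx;
    for (size_t i = 0; i < 256; ++i)
        buf[i] ^= (unsigned char)i;
}
```

In the original program the callback body would hold the `gcry_cipher_setiv` call and the `gcry_cipher_encrypt` or `gcry_cipher_decrypt` call, and the speed would be computed from the returned call count and the measured elapsed time. This changes how precisely each run is timed; it affects both operations equally and is not expected to explain the x3 gap on its own.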

My results:

Op: Decryption
Decrypted 9276416000 bytes in 3.00 seconds
Speed: 3092.00 Mbytes/sec
Op: Encryption
Encrypted 3099484416 bytes in 3.00 seconds
Speed: 1033.00 Mbytes/sec
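
The whole-number speeds above are consistent with the speed expression in the code, which divides the byte total by 1000000 in integer arithmetic before the floating-point division. A minimal sketch of the same conversion done entirely in floating point follows; `throughput_mb_per_s` is a hypothetical helper name, not part of the original program, and 1 Mbyte is assumed to mean 1,000,000 bytes as in the original expression.

```c
#include <assert.h>
#include <stddef.h>

/* Converts a count of fixed-size operations into Mbytes/sec
 * (1 Mbyte = 1,000,000 bytes, matching the original expression).
 * Casting to double first keeps the fractional part that integer
 * division by 1000000 would discard. */
static double throughput_mb_per_s(size_t ops, size_t chunk_bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)ops * (double)chunk_bytes) / 1e6 / seconds;
}
```

With the decryption figures above (9276416000 bytes in 3 seconds) this gives roughly 3092.14 rather than 3092.00, so the truncation is cosmetic and does not account for the difference between the two operations.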

I'm puzzled by this observation and wonder if there's something wrong with my code. I'd appreciate any insights on this.

user11729819
  • What optimizations did you use when compiling? What ISA/OS are you compiling for? Does your Pi hit an IO limit before a CPU limit? If so, that's your upper bound regardless of actual compute speed. – tadman Apr 30 '23 at 18:11
  • And 256 bytes per operation might not be optimal. Try larger sizes and you might get significantly faster performance. And as @tadman strongly implied, if you want to measure pure encryption/decryption speed, remove IO from the processing path. – Andrew Henle Apr 30 '23 at 18:25
  • I am compiling the program with these flags: "-O2 -O3 -DAES_ASM -DBSAES_ASM". On the RPI I compile for ARM (I just have the gcc compiler installed there). My laptop has an x86 processor and runs Ubuntu. Actually, I am not that concerned about the RPI device. I just can't understand whether such a x3 difference in speed between decrypt and encrypt operations is even possible. I am looking for a bug somewhere in this code. – user11729819 Apr 30 '23 at 18:26
  • AES/CBC is a serial block encryption scheme - the output from block N is needed to compute the block N + 1 ciphertext. And on decryption, blocks can only be decrypted serially as well. Try something parallelizable like AES/CTR or AES/GCM. And make sure you compile with all optimizations as well, as most recent CPUs have actual hardware to do that. – Andrew Henle Apr 30 '23 at 18:29
  • Hi @AndrewHenle, not sure if I got you right. "remove IO from the processing path": I am not doing any IO operations on disk, just operating on buffers located in RAM. – user11729819 Apr 30 '23 at 18:31
  • Maybe I need to clarify - I am concerned with x3 speed difference for decrypt/encrypt operations. For 3 seconds I try to encrypt/decrypt chunks of size 256 bytes. I assume the overall data processed should be pretty much the same for both type of operations. At least not x3 difference. I observe this phenomenon on my Ubuntu laptop. – user11729819 Apr 30 '23 at 18:35
  • You should pick one optimization level, not two. `O2` and `O3` are contradictory. – tadman Apr 30 '23 at 19:37
  • A) Do you have a performance *problem*? Is that speed adequate for your needs? B) What is the write speed of your drive? Are you hitting that limit when writing? reads can often be accelerated due to caching, but not always for writes. – tadman Apr 30 '23 at 19:38
  • @tadman: It seems that encrypt and decrypt should both read and write the same number of bytes. OP: with what I know about AES, I agree with your expectation that encryption and decryption should be roughly the same speed, FWIW. – 500 - Internal Server Error Apr 30 '23 at 21:17
  • @500-InternalServerError The best test would be to ensure that this is in memory to remove IO bounds from the benchmarking. – tadman Apr 30 '23 at 21:34
  • @tadman could you please point out which IO bounds you mean? – user11729819 May 01 '23 at 17:38
  • All the data that is processed (encrypted/decrypted) is located in RAM. For example, char plaintext[BUFFER_SIZE]; is used to store 256 bytes of plaintext. – user11729819 May 01 '23 at 17:39
  • Hi @500-InternalServerError, yeah, exactly. That is my main concern. – user11729819 May 01 '23 at 17:41
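
On the CBC point raised in the comments: in the standard CBC definition, encryption computes C_i = E_K(P_i XOR C_{i-1}), so each block has to wait for the previous ciphertext block, while decryption computes P_i = D_K(C_i) XOR C_{i-1}, whose inputs are all ciphertext that is already available. That asymmetry is one plausible reason decryption could run faster on CPUs that can overlap several block operations; it has not been confirmed for this particular build. The sketch below illustrates only the data dependency. It uses a toy byte-wise block function that is NOT AES and not secure, and every name in it (`toy_encrypt_block`, `toy_decrypt_block`, `cbc_encrypt`, `cbc_decrypt_block`) is hypothetical rather than part of libgcrypt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Toy invertible block function standing in for a block cipher (NOT AES). */
static void toy_encrypt_block(uint8_t out[BLK], const uint8_t in[BLK],
                              const uint8_t key[BLK])
{
    for (size_t i = 0; i < BLK; ++i)
        out[i] = (uint8_t)((in[i] ^ key[i]) + 1u);
}

/* Inverse of toy_encrypt_block. */
static void toy_decrypt_block(uint8_t out[BLK], const uint8_t in[BLK],
                              const uint8_t key[BLK])
{
    for (size_t i = 0; i < BLK; ++i)
        out[i] = (uint8_t)((uint8_t)(in[i] - 1u) ^ key[i]);
}

/* CBC encryption: block b cannot start until block b-1 has been produced. */
static void cbc_encrypt(uint8_t *ct, const uint8_t *pt, size_t nblocks,
                        const uint8_t key[BLK], const uint8_t iv[BLK])
{
    const uint8_t *prev = iv;
    for (size_t b = 0; b < nblocks; ++b) {
        uint8_t x[BLK];
        for (size_t i = 0; i < BLK; ++i)
            x[i] = pt[b * BLK + i] ^ prev[i];
        toy_encrypt_block(ct + b * BLK, x, key);
        prev = ct + b * BLK; /* the next block depends on this output */
    }
}

/* CBC decryption of a single block: needs only ciphertext blocks b and b-1,
 * so blocks can be processed in any order, or several at once. */
static void cbc_decrypt_block(uint8_t *pt, const uint8_t *ct, size_t b,
                              const uint8_t key[BLK], const uint8_t iv[BLK])
{
    assert(pt != ct); /* in-place use would overwrite the needed previous block */
    const uint8_t *prev = (b == 0) ? iv : ct + (b - 1) * BLK;
    uint8_t x[BLK];
    toy_decrypt_block(x, ct + b * BLK, key);
    for (size_t i = 0; i < BLK; ++i)
        pt[b * BLK + i] = x[i] ^ prev[i];
}
```

Calling `cbc_decrypt_block` for the blocks in reverse order still recovers the original plaintext, which is not possible for `cbc_encrypt`. With 256-byte chunks there are 16 such blocks per call.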

0 Answers