Is it possible to do bitwise operation on multiple successive array elements?

Question

I am working on an old implementation of AES I coded a few years ago, and I want to rewrite my ShiftRows function, which is very inefficient.

For the moment, my ShiftRows just swaps the values of successive array elements (one byte each) n times to perform a cyclic permutation.

I wondered whether it is possible to take my array of elements and cast it to a single variable, so that I can do the permutation with the bit shift operators. The rows are 4 unsigned char, so 4 bytes each.

In the following code only the first byte (corresponding to 'a') seems to be affected by the bitshift.

```c
char array[4][4] = {"abcd", "efgh", "ijkl", "mnop"};

int32_t somevar;

somevar = (int32_t)*array[0] >> 16;
```

It's been a long time since I last practiced C, so I am probably making some stupid mistakes.

How big is the row? Will it fit in a simple integer type (64-bit or 128-bit, or what)? If it won't fit in a simple integer, you may do better with bigger types than bytes, but not necessarily much better. — Jonathan Leffler, Sep 22 '18 at 17:23
The state is a 4 by 4 array of unsigned char. I just need to shift one row at a time, so a 4-byte array. — Balocre, Sep 22 '18 at 17:52
So I tried casting it to an int32_t, but as expected only the first element of the array is cast and modified by the bit shift. — Balocre, Sep 22 '18 at 18:13
Posting the true code is more informative than [describing](https://stackoverflow.com/questions/52459069/is-it-possible-to-do-bitwise-operation-on-multiple-succesive-array-element#comment91860835_52459069) the code. — chux - Reinstate Monica, Sep 22 '18 at 18:14
@JonathanLeffler: [AES ShiftRows](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard#The_ShiftRows_step) rotates each four-byte row `i` by `i` bytes. — Eric Postpischil, Sep 22 '18 at 19:58

Eric Postpischil · Accepted Answer · 2018-09-22T19:55:03.347

First, if your primary goal is a fast AES implementation, rather than either practicing C or a fast-but-portable AES implementation (that is, portability is primary and efficiency is secondary), then you would need to write in assembly language, not C, or at least use compiler features for specific targets that let you write near-assembly code. For example, Intel processors have AES-assist instructions, and GCC has built-in functions for them.

Second, if you are going to do this in C, your primary job, ideally, is to express the desired operations clearly to the compiler. By this, I mean you want the operations to be transparent to the compiler so that its optimizer can work. Using various techniques to reinterpret data (from char to int, for example) can block the compiler’s ability to optimize. (Or they might not, depending on compiler quality and the specific code you write.)

If you are aiming for portable code, it is likely better to simply write the character motions you want (just write simple assignment statements that move array elements). Good compilers can translate these efficiently, even combining multiple byte-move operations into single word-move operations if the hardware supports it.
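As a minimal sketch of that plain-assignment approach, assuming the state is stored so that state[i] is row i (as in the question), ShiftRows could look like the following. The name shift_rows is a placeholder for illustration, not something taken from the original code.

```c
#include <stdint.h>

/* Rotate row i of the 4x4 state left by i bytes using only plain
   assignments. Assumption: state[i] is row i, as in the question. */
static void shift_rows(uint8_t state[4][4])
{
    uint8_t t;

    /* Row 0 is left unchanged. */

    /* Row 1: rotate left by one byte. */
    t = state[1][0];
    state[1][0] = state[1][1];
    state[1][1] = state[1][2];
    state[1][2] = state[1][3];
    state[1][3] = t;

    /* Row 2: rotate by two bytes, which is two independent swaps. */
    t = state[2][0]; state[2][0] = state[2][2]; state[2][2] = t;
    t = state[2][1]; state[2][1] = state[2][3]; state[2][3] = t;

    /* Row 3: rotate left by three bytes (same as right by one). */
    t = state[3][3];
    state[3][3] = state[3][2];
    state[3][2] = state[3][1];
    state[3][1] = state[3][0];
    state[3][0] = t;
}
```

There are no loops and no type reinterpretation here, so the compiler sees exactly which bytes move where and is free to merge the moves if the target allows it.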

When you are writing “fancy” code to try to optimize, it is important to be aware of rules of standard C, properties of the compiler(s) you are working with, and the hardware you are targeting.

For example, you have char array[4][4]. This declares an array with no particular alignment. The compiler might put this array anywhere, with any alignment—it is not necessarily aligned to a multiple of four bytes, for example. If you then take a pointer to the first row of this array and convert it to a pointer to an int, then an instruction to load an int may fail on some processors because they require int objects to be aligned to multiples of four bytes. On other processors, the load may work but be slower than an aligned load.

One solution for this is not to declare a bare array and not to convert pointers. Instead, you would declare a union, one member of which might be an array of four uint32_t and the other of which might be an array of four arrays of four uint8_t. The presence of the uint32_t array in the union would compel the compiler to align it suitably for the hardware. Additionally, reinterpreting data through unions is allowed in C, whereas reinterpreting data through converted pointers is not proper C code. (Even if the alignment requirements are satisfied, reinterpreting data through pointers generally violates aliasing rules.)

On another note, it is generally preferable to use unsigned types when working with bits as is done in cryptographic code. Instead of char and int32_t, you may be better off with uint8_t and uint32_t.

Regarding your specific code:

```c
somevar = (int32_t)*array[0] >> 16;
```

array[0] is the first row of array. By the rules of C, it is automatically converted to a pointer to its first element, so it becomes &array[0][0]. Then *array[0] is *&array[0][0], which is array[0][0], which is the first char in the first row of the array. So the expression so far is just the value of the first char. Then the cast (int32_t) converts the type of the expression to int32_t. This does not change the value, so the result is simply the value of the first char in the first row.
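To make that evaluation order concrete, here is a small sketch that wraps the expression from the question in a function. The function name question_expression is made up for this illustration.

```c
#include <stdint.h>

/* Evaluates the expression from the question for a given array. */
static int32_t question_expression(char array[4][4])
{
    /* *array[0] is array[0][0], a single char. The cast widens only
       that one value to int32_t, and the shift is applied afterwards. */
    return (int32_t)*array[0] >> 16;
}
```

Because a non-negative char value fits in the low eight bits, shifting it right by 16 leaves 0. With the array from the question, the function returns 0 and the other three bytes of the row never take part.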

What you were likely thinking of was either * (uint32_t *) &array[0] or * (uint32_t *) array[0]. These take either the address of the first row (the former expression) or the address of the first element of the first row (the latter expression) and convert it to a pointer to a uint32_t; the two addresses denote the same location but have different types. Then the * is intended to fetch the uint32_t at that address. That violates C's alignment and aliasing rules and should be avoided.

Instead, you can use:

```c
union
{
    uint32_t words[4];
    uint8_t  bytes[4][4];
} block;
```

Then you can access individual bytes with block.bytes[i][j] or words of four bytes with block.words[i]. Whether this is a good idea or not depends on context and goals.
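As a rough sketch of how such a union could be used for ShiftRows, each row can be rotated as one 32-bit word. The names aes_block, rotr32 and shift_rows_words are invented for this example. Which bit-rotation direction corresponds to "rotate the bytes left" depends on the byte order of the machine, so the sketch probes it at run time and assumes the target is either plain little-endian or plain big-endian.

```c
#include <stdint.h>

/* Same layout as the union above, with a type name (hypothetical). */
typedef union
{
    uint32_t words[4];
    uint8_t  bytes[4][4];
} aes_block;

/* Rotate a 32-bit word right by n bits; requires 0 < n < 32. */
static uint32_t rotr32(uint32_t w, unsigned n)
{
    return (w >> n) | (w << (32 - n));
}

/* Rotate row i left by i bytes, one word operation per row. */
static void shift_rows_words(aes_block *b)
{
    /* Byte-order probe: on a little-endian machine the first byte of
       the word 1 is 1. Mixed-endian machines are not handled. */
    const aes_block probe = { .words = { 1 } };
    const int little = (probe.bytes[0][0] == 1);

    for (unsigned i = 1; i < 4; i++) {
        /* Little-endian: bytes[i][0] is the low-order byte, so moving
           bytes toward index 0 is a right rotation by 8*i bits.
           Big-endian: it is a left rotation by 8*i bits, which equals
           a right rotation by 32 - 8*i bits. */
        unsigned n = little ? 8 * i : 32 - 8 * i;
        b->words[i] = rotr32(b->words[i], n);
    }
}
```

After filling block.bytes with the state, calling shift_rows_words(&block) leaves the rotated rows readable through block.bytes again. Whether this beats plain byte assignments is something to measure with your compiler rather than assume.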

Thanks for your answer. Rereading my post, I am sorry if I sounded pretentious; I am not claiming to be able to implement a fast version of AES, nor am I trying to. I am just going through an old project, trying to dust it off a bit and practice C, but the way I coded my ShiftRows seemed VERY inefficient to me. I'll take a look at unions; that's something I have never used before, so I am not familiar with them. — Balocre, Sep 22 '18 at 20:21
Also, I am not sure I understand what you mean by this: "Additionally, reinterpreting data through unions is allowed in C, whereas reinterpreting data through converted pointers is not proper C code." Would you have any resources on that point? — Balocre, Sep 22 '18 at 20:28
@SuperTotoGo: In C, `Type0 x = value; Type1 y = * (Type1 *) &x;` is generally improper. Avoid converting pointers of one type to pointers of another type. To reinterpret the bytes of one type as if they were another type, either put them in a union as I showed, or copy the bytes with `memcpy`, as in `Type1 y; memcpy(&y, &x, sizeof y);`. The sizes of `x` and `y` must be the same, and the result of reinterpreting depends on the C implementation. For further information search Stack Overflow for “[C] memcpy aliasing”. — Eric Postpischil, Sep 22 '18 at 20:42
Once again, thanks. I have been messing around with unions and it is so much easier to represent my state with those! — Balocre, Sep 22 '18 at 21:05
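
The `memcpy` route mentioned in the comments above can be sketched as follows. The helper names load_word and store_word are invented for this illustration. The value obtained for a given row depends on the byte order of the C implementation, as the comment notes.

```c
#include <stdint.h>
#include <string.h>

/* Read four bytes as a uint32_t without converting pointer types.
   This is valid whatever the alignment of row. */
static uint32_t load_word(const uint8_t row[4])
{
    uint32_t w;
    memcpy(&w, row, sizeof w);
    return w;
}

/* Write a uint32_t back out as four bytes, in the same byte order
   that load_word read them. */
static void store_word(uint8_t row[4], uint32_t w)
{
    memcpy(row, &w, sizeof w);
}
```

A row can then be loaded with load_word(state[i]), manipulated as a word, and written back with store_word(state[i], w). Compilers commonly turn such fixed-size copies into a single load or store, though that is worth checking for your target.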
