First, if your primary goal is a fast AES implementation (as opposed to practicing C, or writing a portable implementation in which portability comes first and efficiency second), then you would need to write in assembly language rather than C, or at least use compiler features for specific targets that let you write near-assembly code. For example, Intel processors have AES-assist instructions (AES-NI), and GCC has built-in functions for them.
Second, if you are going to do this in C, your primary job, ideally, is to express the desired operations clearly to the compiler. By this, I mean you want the operations to be transparent to the compiler so that its optimizer can work. Using various techniques to reinterpret data (from char to int, for example) can block the compiler’s ability to optimize. (Or they might not, depending on compiler quality and the specific code you write.)
If you are aiming for portable code, it is likely better to simply write the character motions you want (just write simple assignment statements that move array elements). Good compilers can translate these efficiently, even combining multiple byte-move operations into single word-move operations if the hardware supports it.
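As an illustration of "just write the assignments", here is a minimal sketch of a row-rotation step written as plain byte moves. The function name shift_rows and the layout (state[r][c] holding row r, column c) are assumptions made for this example, not something taken from your code.

```c
#include <stdint.h>

/* Hypothetical helper: rotate row r of a 4x4 byte state left by r positions.
   Assumes state[r][c] holds row r, column c. */
void shift_rows(uint8_t state[4][4])
{
    uint8_t t;

    /* Row 0 is left unchanged. */

    /* Row 1: rotate left by one position. */
    t = state[1][0];
    state[1][0] = state[1][1];
    state[1][1] = state[1][2];
    state[1][2] = state[1][3];
    state[1][3] = t;

    /* Row 2: rotate left by two positions (two swaps). */
    t = state[2][0]; state[2][0] = state[2][2]; state[2][2] = t;
    t = state[2][1]; state[2][1] = state[2][3]; state[2][3] = t;

    /* Row 3: rotate left by three positions, which equals right by one. */
    t = state[3][3];
    state[3][3] = state[3][2];
    state[3][2] = state[3][1];
    state[3][1] = state[3][0];
    state[3][0] = t;
}
```

Nothing here reinterprets memory, so the compiler sees exactly which bytes move where and is free to merge the moves if the target allows it.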
When you are writing “fancy” code to try to optimize, it is important to be aware of rules of standard C, properties of the compiler(s) you are working with, and the hardware you are targeting.
For example, you have char array[4][4]. This declares an array with no particular alignment. The compiler might put this array anywhere, with any alignment; it is not necessarily aligned to a multiple of four bytes, for example. If you then take a pointer to the first row of this array and convert it to a pointer to an int, then an instruction to load an int may fail on some processors because they require int objects to be aligned to multiples of four bytes. On other processors, the load may work but be slower than an aligned load.
One solution for this is not to declare a bare array and not to convert pointers. Instead, you would declare a union, one member of which might be an array of four uint32_t and the other of which might be an array of four arrays of four uint8_t. The presence of the uint32_t array in the union would compel the compiler to align it suitably for the hardware. Additionally, reinterpreting data through unions is allowed in C, whereas reinterpreting data through converted pointers is not proper C code. (Even if the alignment requirements are satisfied, reinterpreting data through pointers generally violates aliasing rules.)
On another note, it is generally preferable to use unsigned types when working with bits as is done in cryptographic code. Instead of char and int32_t, you may be better off with uint8_t and uint32_t.
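To make the point about unsigned types concrete, here is a small sketch. The helper names pack_be, widen_signed and widen_unsigned are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes into one word, most significant
   byte first. Each byte is widened to uint32_t before shifting, so no
   sign extension or signed overflow can occur. */
uint32_t pack_be(const uint8_t b[4])
{
    return (uint32_t)b[0] << 24
         | (uint32_t)b[1] << 16
         | (uint32_t)b[2] << 8
         | (uint32_t)b[3];
}

/* Widening a negative signed byte sets all the upper bits of the word... */
uint32_t widen_signed(signed char c)
{
    return (uint32_t)c;
}

/* ...whereas widening an unsigned byte leaves the upper bits zero. */
uint32_t widen_unsigned(uint8_t c)
{
    return (uint32_t)c;
}
```

With signed bytes, a value such as -16 (bit pattern 0xF0) widens to 0xFFFFFFF0, and those extra one bits would corrupt an OR-based packing like the one above. With uint8_t the same bit pattern widens to 0x000000F0.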
Regarding your specific code:
somevar = (int32_t)*array[0] >> 16;
array[0] is the first row of array. By the rules of C, it is automatically converted to a pointer to its first element, so it becomes &array[0][0]. Then *array[0] is *&array[0][0], which is array[0][0], which is the first char in the first row of the array. So the expression so far is just the value of the first char. Then the cast (int32_t) converts the type of the expression to int32_t. This does not change the value, so the result is simply the value of the first char in the first row. The >> 16 is applied only after the cast, to that single value. Since a char value normally fits in eight bits, shifting it right by 16 yields 0 for any non-negative value (and an implementation-defined result, typically -1, for a negative one).
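A short sketch makes this easy to check for yourself. The function names original_expression and first_char are hypothetical wrappers added for this example; the expressions inside them are the ones discussed above.

```c
#include <stdint.h>

/* Hypothetical wrapper: evaluates the expression from the question. */
int32_t original_expression(char array[4][4])
{
    return (int32_t)*array[0] >> 16;
}

/* Hypothetical wrapper: the same expression without the shift. */
int32_t first_char(char array[4][4])
{
    return (int32_t)*array[0];
}
```

For an array whose first row is {0x12, 0x34, 0x56, 0x78}, first_char gives 0x12 and original_expression gives 0. The other three bytes of the row never take part in the computation.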
What you were likely thinking of was either * (uint32_t *) &array[0] or * (uint32_t *) array[0]. These take either the address of the first row (the former expression) or the address of the first element of the first row (the latter expression) (these denote the same location but are different types) and convert it to a pointer to a uint32_t. Then the * is intended to fetch the uint32_t at that address. That violates C's aliasing rules (and possibly the alignment requirements described above) and should be avoided.
Instead, you can use:
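    #include <stdint.h>

    union
    {
        uint32_t words[4];     /* four 32-bit words, one per row */
        uint8_t  bytes[4][4];  /* the same 16 bytes, addressed individually */
    } block;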
Then you can access individual bytes with block.bytes[i][j] or words of four bytes with block.words[i]. Whether this is a good idea or not depends on context and goals.
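Here is a minimal usage sketch of such a union. The typedef name Block and the helpers fill_sequential and rotate_row_word are hypothetical, introduced only for this example. Note that which byte ends up in which part of a word depends on the byte order of the target, so code that mixes the two views is not portable across endiannesses.

```c
#include <stdint.h>

/* The union from above, given a typedef name for convenience (assumption). */
typedef union
{
    uint32_t words[4];
    uint8_t  bytes[4][4];
} Block;

/* Hypothetical helper: fill the block bytewise with 0, 1, ..., 15. */
void fill_sequential(Block *block)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            block->bytes[i][j] = (uint8_t)(4 * i + j);
}

/* Hypothetical helper: rotate row i by one byte using a single word
   operation. The direction in which the bytes appear to move depends on
   the byte order of the target. */
void rotate_row_word(Block *block, int i)
{
    uint32_t w = block->words[i];
    block->words[i] = w << 8 | w >> 24;
}
```

After fill_sequential, block.words[1] holds the bytes 4, 5, 6, 7 in whatever order the hardware stores a 32-bit word. After rotate_row_word on row 1, the row reads 7, 4, 5, 6 on a little-endian machine and 5, 6, 7, 4 on a big-endian one, which is exactly the kind of context dependence to weigh before choosing this approach.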