I'm trying to implement AES cryptography using the AES machine instructions (basing it on Intel's white paper) available on my Sandy Bridge. Unfortunately, I've come to a halt in the phase of generating the round keys for decryption. Specifically, the instruction aesimc (applying the Inverse Mix Columns operation) returns an incorrect result.

In their paper they have an example (image omitted).

So with input:

48 69 28 53 68 61 79 29 5B 47 75 65 72 6F 6E 5D 

I get the following using _mm_aesimc_si128():

2D BF F9 31 99 CD 3A 37 B7 C7 81 FD 7D E0 3D 8E

It should have returned:

62 7A 6F 66 44 B1 09 C8 2B 18 33 0A 81 C3 B3 E5

Not the same result. Why is this the case?

If you want to reproduce it, I tested it with the code below (remember the arguments -maes -msse4 when compiling):

#include <wmmintrin.h>
#include <iostream>
using namespace std;

void print_m128i(__m128i data) {
  unsigned char *ptr = (unsigned char*) &data;
  for (int i = 0; i < 16; i++) {
    int val = (int) ptr[i];
    if (val < 0x10) {  // pad every single-digit value (0x00..0x0F) with a leading zero
      cout << "0";
    }    
    cout << uppercase << hex << val << " ";
  }
  cout << endl;
}

int main() {
  unsigned char *data = (unsigned char*)
    "\x48\x69\x28\x53\x68\x61\x79\x29\x5B\x47\x75\x65\x72\x6F\x6E\x5D";
  __m128i num = _mm_loadu_si128((__m128i*) data);
  __m128i num2 = _mm_aesimc_si128(num);
  print_m128i(num2);
  return 0;
}

EDIT: The example in Intel's white paper was wrong. As Hans suggested, my chip is little-endian so byte-swapping is necessary - to and fro.

Morten Kristensen
  • Because that looks like a C-string, I assume the compiler includes a trailing `'\0'` ascii NUL in `data`. Does `_mm_loadu_si128()` read exactly the correct length? (I expect it does, but I've been wrong when making assumptions before. ;) Furthermore, the string pointed to by `data` may be 4-aligned or 8-aligned, but these instructions _feel_ like they require 16-aligned memory. – sarnold Mar 14 '11 at 01:36
  • Good point. :) `_mm_loadu_si128()` doesn't require a 16-aligned address (hence the 'u'), `_mm_load_si128` does however. It reads exactly 128 bits. But I tried aligning it anyway with `unsigned char *data __attribute__((aligned(16))) = ..` and it didn't change the result unfortunately. – Morten Kristensen Mar 14 '11 at 01:45
  • darn -- thanks for the quick feedback. :) – sarnold Mar 14 '11 at 01:51
  • Their example might be wrong, see if this works: InvMixColumns (8dcab9dc035006bc8f57161e00cafd8d) = d635a667928b5eaeeec9cc3bc55f5777 – Guy Sirton Mar 14 '11 at 04:24
  • Oh, if I take your input vector and reverse it (little endian): 8D FD CA 00 1E 16 57 8F BC 06 50 03 DC B9 CA 8D, it yields 77 57 5F C5 3B CC C9 EE AE 5E 8B 92 67 A6 35 D6. But reversing that actually gives D6 35 A6 67 92 8B 5E AE EE C9 CC 3B C5 5F 57 77. – Morten Kristensen Mar 14 '11 at 10:12

1 Answer

The bytes are backwards. You want 0x5d to be the least significant byte, so it has to come first in memory. This is a little-endian chip. In VS, use Debug + Windows + Registers, right-click + tick SSE to see the register values.

Hans Passant
  • Interesting point, I forgot about endianness. So you mean I should use `"\x5D\x6E\x6F\x72\x65\x75\x47\x5B\x29\x79\x61\x68\x53\x28\x69\x48"` as `data` instead? If so I get the result: B6 59 D8 19 A1 C5 9B F3 15 AA EF 09 F1 8D 7F 59. And that isn't the correct result either. – Morten Kristensen Mar 14 '11 at 02:02
  • Well, that's why they call it cryptography. Good luck. – Hans Passant Mar 14 '11 at 02:09
  • Thank you, it's quite weird. I've inspected the values of the registers `xmm0` and `xmm1` (in GDB) with `0x5D` both as the least and most significant byte (along with the rest of the data, of course). Well, both are wrong for some reason. – Morten Kristensen Mar 14 '11 at 02:31
  • Your answer is correct. I forgot to swap the byte order back after applying the operation (on the byte-swapped input). – Morten Kristensen Mar 14 '11 at 11:18