
I am trying to implement AES-256 in CTR mode using NVIDIA CUDA. I have working CPU code for key expansion, and now I need to implement the actual AES-256 algorithm. According to Wikipedia, some code I've seen, and particularly this PDF (page 9), AES rounds can be implemented as a series of table lookups. My question is: how do I generate these tables? I am aware that I need 4 KB to store them, and that is not a problem. I have spent a whole day trying to find these tables, with no success. The PDF I linked mentions lookup tables T0, T1, T2 and T3, but I do not know what these are. It also mentions round keys 4, 5, 6 and 7, but I do not understand what these indices refer to.

The closest I have come to figuring out how to generate these lookup tables is from this project. Inside the code, there is a comment that says:

Te0[x] = S [x].[02, 01, 01, 03];
Te1[x] = S [x].[03, 02, 01, 01];
Te2[x] = S [x].[01, 03, 02, 01];
Te3[x] = S [x].[01, 01, 03, 02];

However, I'm not entirely sure what that notation means (is it a matrix multiplication or something else?). The only things I recognize are the constant matrix from the MixColumns step and the S-box.

[Edit] Now that someone has pointed it out: how can a lookup-table implementation actually be slower? Would it be wise to implement AES without lookup tables here?

Brian Tompsett - 汤莱恩
Momonga
  • I think if you look hard, it is really only the S-boxes that can be implemented with a table lookup. – trumpetlicks Feb 26 '13 at 16:53
  • Are you sure? I opened up the aforementioned implementation and it relies entirely on lookup tables. There is even one project that uses **only one** lookup table for all operations. I don't really need that, I need one lookup table per operation. It's definitely possible, I just need to find out how. – Momonga Feb 26 '13 at 17:02
  • You do realize that what you are speaking about would be a table that is 2^(256+128+128)*128 bits large. 256 for the key, 128 for data, and another 128 for the CTR. This is full input to output lookup (and it is what makes AES so difficult to reverse). While not in a place I can read your reference doc at the moment, it can rely 100% on lookups and still only have the lookups for the SBOXes. An SBox stands for Substitution, which by definition would be a lookup table. – trumpetlicks Feb 26 '13 at 17:26
  • It seems like you do not understand my question. My apologies if I was confusing. What I want to make is not a table that works like t[message,key,iv] = encryption. I want to implement lookup tables to speed up round operations (definitely possible; please check the slide in the PDF file I linked once you can to get a clearer picture of my question). – Momonga Feb 26 '13 at 17:42
  • Interesting. Using tables may not be faster on the GPU though. – Roger Dahl Feb 27 '13 at 15:15

2 Answers


The T tables are a straightforward description of the AES round transformation in matrix form. To build them, see the original Rijndael NIST proposal, section 5.2.1.
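
For what it's worth, here is a minimal C sketch of that construction, meant to run on the CPU before the tables are copied to the GPU. It reads the notation from the question, `Te0[x] = S[x].[02, 01, 01, 03]`, as: take the S-box output for `x`, multiply it in GF(2^8) by each MixColumns coefficient, and pack the four products into one 32-bit word with the first coefficient in the most significant byte. That packing order and the names `gf_mul`, `make_sbox` and `make_te` are my assumptions, not something taken from the proposal or from any particular project.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial
 * x^8 + x^4 + x^3 + x + 1 (0x11b). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t v, int n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

static uint32_t rotr32(uint32_t v, int n) {
    return (v >> n) | (v << (32 - n));
}

/* S-box: multiplicative inverse in GF(2^8) (0 maps to 0), followed by the
 * AES affine transform. The inverse is found by brute force, which is slow
 * but easy to check by eye; it only runs once. */
static void make_sbox(uint8_t sbox[256]) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; y < 256; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* te[0..3] play the role of Te0..Te3 (assumed big-endian byte packing). */
static void make_te(const uint8_t sbox[256], uint32_t te[4][256]) {
    for (int x = 0; x < 256; x++) {
        uint8_t s = sbox[x];
        uint32_t w = ((uint32_t)gf_mul(s, 2) << 24) | ((uint32_t)s << 16) |
                     ((uint32_t)s << 8) | (uint32_t)gf_mul(s, 3);
        te[0][x] = w;              /* S[x].[02, 01, 01, 03] */
        te[1][x] = rotr32(w, 8);   /* S[x].[03, 02, 01, 01] */
        te[2][x] = rotr32(w, 16);  /* S[x].[01, 03, 02, 01] */
        te[3][x] = rotr32(w, 24);  /* S[x].[01, 01, 03, 02] */
    }
}
```

Calling `make_sbox` and then `make_te` fills 4 × 256 × 4 bytes = 4 KB, which matches the size mentioned in the question. Since the last three tables are just byte rotations of the first, a single 1 KB table plus rotations should also work, at the cost of extra instructions per lookup.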

Ben Buhrow

In case anyone is still interested, these lookup tables can be found in the standard library of the Go programming language - http://golang.org/src/crypto/aes/const.go#L80

There are also instructions on how to generate the tables in the test files of the same package.
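
As a sanity check in the spirit of those tests, the C sketch below (not the Go code; all names are hypothetical) computes one output column of a round in two ways: with four table lookups and XORs, and directly with SubBytes followed by the MixColumns matrix. The four input bytes are the ones ShiftRows would move into that column. The round key is passed as a single 32-bit word; if the expanded key is stored as an array of 32-bit words, words 4 to 7 would belong to round 1, which may be what the indices in the PDF refer to, though that is only a guess.

```c
#include <stdint.h>

static uint8_t  sbox[256];
static uint32_t Te[4][256];

/* GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* Fill sbox and Te[0..3]; call once before using the functions below. */
static void init_tables(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;                 /* the inverse of 0 is defined as 0 */
        for (int y = 1; y < 256; y++)
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        uint8_t s = 0x63;                /* affine transform constant */
        for (int r = 0; r < 5; r++)      /* inv ^ rotl(inv, 1) ^ ... ^ rotl(inv, 4) */
            s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
        sbox[x] = s;
        uint32_t w = ((uint32_t)gf_mul(s, 2) << 24) | ((uint32_t)s << 16) |
                     ((uint32_t)s << 8) | (uint32_t)gf_mul(s, 3);
        for (int k = 0; k < 4; k++)      /* Te[k] is Te[0] rotated right by 8*k bits */
            Te[k][x] = k ? (w >> (8 * k)) | (w << (32 - 8 * k)) : w;
    }
}

/* Table-driven version: SubBytes, MixColumns and AddRoundKey for one column
 * collapse into four lookups and four XORs. */
static uint32_t round_column_lookup(uint8_t a0, uint8_t a1, uint8_t a2,
                                    uint8_t a3, uint32_t rk) {
    return Te[0][a0] ^ Te[1][a1] ^ Te[2][a2] ^ Te[3][a3] ^ rk;
}

/* Reference version: apply the S-box, then the MixColumns matrix
 * [02 03 01 01; 01 02 03 01; 01 01 02 03; 03 01 01 02], then the round key. */
static uint32_t round_column_direct(uint8_t a0, uint8_t a1, uint8_t a2,
                                    uint8_t a3, uint32_t rk) {
    uint8_t s0 = sbox[a0], s1 = sbox[a1], s2 = sbox[a2], s3 = sbox[a3];
    uint8_t b0 = (uint8_t)(gf_mul(s0, 2) ^ gf_mul(s1, 3) ^ s2 ^ s3);
    uint8_t b1 = (uint8_t)(s0 ^ gf_mul(s1, 2) ^ gf_mul(s2, 3) ^ s3);
    uint8_t b2 = (uint8_t)(s0 ^ s1 ^ gf_mul(s2, 2) ^ gf_mul(s3, 3));
    uint8_t b3 = (uint8_t)(gf_mul(s0, 3) ^ s1 ^ s2 ^ gf_mul(s3, 2));
    return (((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) |
            ((uint32_t)b2 << 8) | (uint32_t)b3) ^ rk;
}
```

After `init_tables()`, both functions should agree for every input, which is a cheap way to catch a wrong byte order or rotation direction before moving the tables to the GPU.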

nindalf