
I am a CUDA developer assisting undergraduate students in implementing AES on a GPU. They don't know much about cryptography, and this is also the first time I am working on it. I have a few questions, if anyone could answer them.

  1. How should we set up the AES demonstration? That is, what is the best data to encrypt in order to show a speedup on the GPU? Should we encrypt a CD, or sensitive data such as credit card numbers? In short, what should our data source be?

  2. Which AES mode of operation is well suited to a GPU?

  3. The students were asked about the input bit-stream rate. Could anyone shed light on this? My thinking was that it should depend on the data size, and that the GPU and CPU can be compared on the basis of data size.

Thanks in advance.

Bilal

2 Answers


I'm not really familiar with GPUs, so I can't really answer question 2. However, for the other two:

Point 1. AES doesn't care what you encrypt; it's just bits. Just find a large file, so that you can show a statistically significant speedup. You won't be able to show much about the (possible) speedup of your implementation by encrypting a few bytes. Of course, that's for "functional" speed: depending on how fast your implementation is, you might actually be limited by RAM or disk transfer speeds. So you might as well just time how long it takes to encrypt data you generate on the fly (say, a counter), thereby ensuring that the data doesn't need to be read from disk or RAM.
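A minimal sketch of that timing approach, written in host-side Python rather than CUDA: it generates counter blocks in memory and times a callable over them. The names `counter_blocks`, `measure_throughput`, and `stand_in_cipher` are hypothetical, and the stand-in workload is a plain XOR, not AES; a real measurement would swap in the CPU or GPU encryption routine.

```python
import time

def counter_blocks(n_blocks):
    """Build n_blocks 16-byte big-endian counter values in memory (no disk I/O)."""
    return b"".join(i.to_bytes(16, "big") for i in range(n_blocks))

def measure_throughput(process, data, repeats=5):
    """Return the best observed throughput of process(data), in bytes per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / best

def stand_in_cipher(data):
    # Hypothetical placeholder workload (byte-wise XOR), NOT AES.
    return bytes(b ^ 0x5A for b in data)

data = counter_blocks(4096)  # 4096 blocks of 16 bytes = 64 KiB of generated input
rate = measure_throughput(stand_in_cipher, data)
print(f"{rate / 1e6:.1f} MB/s")
```

Taking the best of several repeats is one common way to reduce noise from other activity on the machine; averaging is an equally reasonable choice.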

Point 3. AES is a block cipher; the block size is fixed at 128 bits (16 bytes), and there is nothing you can do to change that. The input data rate is going to depend purely on how fast you can process data.

AVH
  • AES works on 16-byte chunks, true, but when encrypting something big you can either chain the blocks or not. When using a GPU, you want to parallelize, thus scramble many chunks at the same time, something you can't do on a PC. To be able to do that, the easiest way is to stick to ECB (each 16 bytes scrambled independently, no chaining) – Bruce Mar 09 '12 at 11:35
  • Thanks for your feedback. So can I use, for example, a movie or a DVD (a large one) and encrypt and decrypt it for the sake of demonstration? Would that be good enough? – Bilal Mar 09 '12 at 11:38
  • ECB is a completely insecure mode that will leak tons of information about your plaintext, and as such should NEVER be used. In general, when all you need is confidentiality, CBC mode should be good enough. However, CBC encryption can't be parallelized. Check out GCM mode, which can be parallelized and provides authentication as well as confidentiality. – AVH Mar 09 '12 at 11:41
  • @Bilal: No, if you use a DVD, probably 99% of the encryption time will be spent reading the DVD. From what I understand, you want to measure the "true" speed of your implementation, so encrypt data that you generate on the fly (such as a 128-bit counter in an SSE register, although there might be better options; the faster you can transfer the data to the GPU, the better). – AVH Mar 09 '12 at 11:43
  • Btw, here is why ECB is not any good: [ECB image on Wikipedia](https://en.wikipedia.org/wiki/ECB_mode#Electronic_codebook_.28ECB.29). – AVH Mar 09 '12 at 11:44
  • umm.. guys, please would you guide me which AES mode can be done on GPU? ECB is not good, then what is the next ideal one for it? – Bilal Mar 09 '12 at 11:56
  • Any mode that can be computed in parallel and is secure should be fine. The most straightforward mode that is very easy to do in parallel is probably CTR mode; you could also look at GCM. – AVH Mar 09 '12 at 12:53
  • I second Darhuuk's recommendation to look into Galois Counter Mode in addition to CTR. For a recent publication on AES implementation with CUDA (with useful references), see: http://ijnc.org/index.php/ijnc/article/view/38. Since AES is byte-oriented, you may be able to find good uses for the device intrinsic __byte_perm(), but I have no firsthand experience using it in that context. – njuffa Mar 09 '12 at 18:08
  • GCM is proposed for both XML encryption and TLS, if I'm not mistaken, so it would make a good target. CTR would be a good starter as it does not require the Galois part; maybe make GCM an encore. – Maarten Bodewes Mar 10 '12 at 00:32
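To make the ECB versus CTR discussion in these comments concrete, here is a small, unoptimized AES-128 sketch in host-side Python (not a CUDA kernel, and not meant for production use). It shows that identical plaintext blocks repeat under ECB but not under CTR, and that CTR blocks can be processed in any order. All function names are illustrative, and the counter-block layout (8-byte nonce followed by an 8-byte big-endian block index) is an assumption chosen for the example.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF  # rotate left by i
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key(key):
    """AES-128 key schedule: 16-byte key -> 11 round keys of 16 bytes each."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]

def encrypt_block(round_keys, block):
    """Encrypt one 16-byte block (state kept as a flat, column-major list)."""
    s = [b ^ k for b, k in zip(block, round_keys[0])]
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                  # SubBytes
        s = [s[(5 * i) % 16] for i in range(16)]  # ShiftRows
        if rnd < 10:                              # MixColumns (skipped in last round)
            mixed = []
            for c in range(0, 16, 4):
                col = s[c:c + 4]
                t = col[0] ^ col[1] ^ col[2] ^ col[3]
                mixed += [col[j] ^ t ^ xtime(col[j] ^ col[(j + 1) % 4])
                          for j in range(4)]
            s = mixed
        s = [b ^ k for b, k in zip(s, round_keys[rnd])]  # AddRoundKey
    return bytes(s)

def ecb_encrypt(key, data):
    """ECB: every block encrypted independently (len(data) must be a multiple of 16)."""
    rk = expand_key(key)
    return b"".join(encrypt_block(rk, data[i:i + 16])
                    for i in range(0, len(data), 16))

def ctr_crypt(key, nonce, data, order=None):
    """CTR: XOR data with E(nonce || index). Encrypts and decrypts.

    nonce must be 8 bytes; `order` optionally lists block indices in any
    order, to show that blocks do not depend on each other.
    """
    rk = expand_key(key)
    n_blocks = (len(data) + 15) // 16
    out = bytearray(len(data))
    for i in (order or range(n_blocks)):
        keystream = encrypt_block(rk, nonce + i.to_bytes(8, "big"))
        chunk = data[16 * i:16 * i + 16]
        out[16 * i:16 * i + len(chunk)] = bytes(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)

key = bytes(range(16))
nonce = bytes(8)
plain = b"SIXTEEN BYTE BLK" * 4      # four identical plaintext blocks
ecb = ecb_encrypt(key, plain)
ctr = ctr_crypt(key, nonce, plain)
print(ecb[:16] == ecb[16:32])        # repeated blocks are visible under ECB
print(ctr[:16] == ctr[16:32])        # but not under CTR
```

Because each CTR output block depends only on the key, the nonce, and the block index, one plausible GPU mapping is one thread per block index. A nonce must never be reused with the same key.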
  1. It doesn't matter: usually something big, like an ISO image or 600 MB of random data, is good.

  2. GPUs rely on parallelization, so you had best stick to ECB. Otherwise, the algorithm can't be parallelized.

  3. The rate is independent of the size of the data: it's size / processing time. In the case of CUDA, you have to take into account the transfer to the device, the processing, and the transfer back (memory copies are far from small in terms of processing time), so unless you can heavily parallelize, you lose time instead of gaining it.
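The accounting in point 3 can be sketched as a small calculation in Python. The function names and all timing figures below are made up for illustration; real values would come from measuring the host-to-device copy, the kernel, and the device-to-host copy separately.

```python
def effective_rate(size_bytes, t_h2d, t_kernel, t_d2h):
    """Bytes per second, counting both memory copies as well as the kernel."""
    return size_bytes / (t_h2d + t_kernel + t_d2h)

def speedup(cpu_time, t_h2d, t_kernel, t_d2h):
    """Speedup over the CPU when the transfers are included in the GPU time."""
    return cpu_time / (t_h2d + t_kernel + t_d2h)

# Hypothetical figures, in seconds, for a 600 MB input.
size = 600_000_000
cpu_time, t_h2d, t_kernel, t_d2h = 1.0, 0.25, 0.125, 0.125

print(effective_rate(size, t_h2d, t_kernel, t_d2h))  # end-to-end bytes per second
print(speedup(cpu_time, t_h2d, t_kernel, t_d2h))     # speedup with transfers
print(cpu_time / t_kernel)                           # speedup counting the kernel alone
```

With these invented numbers the kernel alone looks several times faster than the end-to-end figure, which is why it helps to report both.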

Bruce
  • Thank you for your feedback. Can't CTR work well here? So you mean that we shouldn't be concerned about the rate, right? I mean, the things we should focus on are the data size and the time taken to encrypt the data. Of course, as you said, if the data is large enough, the latency would be outweighed. – Bilal Mar 09 '12 at 11:35
  • CTR would work as well, as would anything that allows parallelization. I usually stick to ECB or CBC, so I never think about other modes. Wikipedia would help on the subject. You just need to pay attention to the fact that some modes can be parallelized for decryption only – Bruce Mar 09 '12 at 11:40
  • ECB is no good and no one should ever use it, so proving your implementation is fast at ECB isn't really helpful. There are other modes that are secure and can be parallelized, some of which also give you message authentication. See my answer for more info. – AVH Mar 09 '12 at 11:47
  • +1, 2/3 of the answer is good and really didn't deserve a downvote. – Behrooz Mar 25 '13 at 11:30