I am looking at re-implementing the SlowAES code (http://anh.cs.luc.edu/331/code/aes.py) to take advantage of numpy's native array support. I'm getting what is, to me, a counter-intuitive result: the pure-Python SlowAES code is much, much faster than the same functions implemented with numpy. Here is the clearest example I have.
One of the main operations in AES is ShiftRows, where each row of the 4x4 byte state array is rotated left by some number of positions (0 for row 0, 1 for row 1, and so on). The original Python code treats this 4x4 state as a one-dimensional 16-element list, then uses slicing to create virtual rows to rotate:
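```python
def rotate(word, n):
    # Rotate a list left by n positions.
    return word[n:] + word[:n]

def shiftRows(state):
    # state is a flat 16-element list; row i occupies state[i*4:i*4+4].
    # AES ShiftRows rotates row i left by i positions (a shift of -i here
    # would rotate right, which is InvShiftRows).
    for i in range(4):
        state[i*4:i*4+4] = rotate(state[i*4:i*4+4], i)
```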
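As a concrete check of the row rotation, the sketch below applies the same slicing approach to the state 0..15 laid out row by row. The snake_case names are illustrative, not taken from SlowAES.

```python
def rotate(word, n):
    # Rotate a list left by n positions.
    return word[n:] + word[:n]

def shift_rows(state):
    # Row i of the flat, row-major 16-element state is rotated left by i.
    for i in range(4):
        state[i*4:i*4+4] = rotate(state[i*4:i*4+4], i)

state = list(range(16))
shift_rows(state)
for r in range(4):
    print(state[r*4:r*4+4])
# [0, 1, 2, 3]
# [5, 6, 7, 4]
# [10, 11, 8, 9]
# [15, 12, 13, 14]
```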
Running timeit on shiftRows using a list of 16 integers results in a time of 3.47 microseconds.
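A minimal way to reproduce this kind of measurement with the standard-library `timeit` module is sketched below. The iteration count is an arbitrary choice, and absolute numbers will vary with machine and Python version.

```python
import timeit

def rotate(word, n):
    return word[n:] + word[:n]

def shift_rows(state):
    for i in range(4):
        state[i*4:i*4+4] = rotate(state[i*4:i*4+4], i)

state = list(range(16))
number = 100_000
# timeit.timeit returns the total time in seconds for `number` calls;
# passing globals avoids the extra call overhead of wrapping in a lambda.
total = timeit.timeit("shift_rows(state)",
                      globals={"shift_rows": shift_rows, "state": state},
                      number=number)
per_call_us = total / number * 1e6
print(f"{per_call_us:.2f} microseconds per call")
```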
Re-implementing this same function in numpy, assuming a 4x4 integer input array, would be simply:
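```python
import numpy as np

def shiftRows(state):
    # state is a 4x4 array; np.roll with a negative shift rotates row i left by i.
    for i in range(4):
        state[i] = np.roll(state[i], -i)
```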
However, timeit shows this to have an execution time of 16.3 microseconds.
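One plausible contributor, offered as a hedged explanation rather than a measurement: `np.roll` is a Python-level function that allocates a new array on every call, so its fixed overhead dwarfs the work of moving four elements, and the loop pays it four times plus four row assignments. The sketch below replaces the loop with a single precomputed fancy-indexing gather. `ROWS`, `COLS`, `shift_rows_gather` and `shift_rows_roll` are hypothetical names, and whether this beats the pure-Python list version for a single 16-byte block is not established here.

```python
import numpy as np

# result[r, c] = state[r, (c + r) % 4], i.e. row r rotated left by r.
ROWS = np.arange(4)[:, None]                # shape (4, 1), broadcast across columns
COLS = (np.arange(4)[None, :] + ROWS) % 4   # shape (4, 4), source column for each cell

def shift_rows_gather(state):
    """Return a new 4x4 array with ShiftRows applied, using one gather."""
    return state[ROWS, COLS]

def shift_rows_roll(state):
    """The loop-of-np.roll version, applied to a copy for comparison."""
    out = state.copy()
    for i in range(4):
        out[i] = np.roll(out[i], -i)
    return out

state = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(shift_rows_gather(state))
```

The gather returns a new array; `state[:] = state[ROWS, COLS]` would write the result back in place if that is needed.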
I was hoping numpy's optimized array operations might result in somewhat faster code. Where am I going wrong? And is there some approach that would result in a faster AES implementation than pure Python? There are some intermediate results that I want to get at, so pycrypto may not be applicable (though if this is going to be too slow, I may have to take a second look).
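Within pure Python, one option (a sketch under assumptions, not a benchmarked result) is to treat ShiftRows as the fixed permutation of 16 positions that it is and precompute that permutation once; `operator.itemgetter` can then perform the whole gather in a single call. `SHIFT_ROWS_PERM` and `shift_rows_perm` are hypothetical names.

```python
import operator

# Output position r*4 + c takes the byte at input position r*4 + (c + r) % 4
# (row-major state, row r rotated left by r).
SHIFT_ROWS_PERM = [r * 4 + (c + r) % 4 for r in range(4) for c in range(4)]
_gather = operator.itemgetter(*SHIFT_ROWS_PERM)

def shift_rows_perm(state):
    """Return a new 16-element list with ShiftRows applied."""
    return list(_gather(state))

print(shift_rows_perm(list(range(16))))
# [0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14]
```

This returns a new list rather than modifying its argument; `state[:] = shift_rows_perm(state)` keeps the in-place calling convention of the original `shiftRows`.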
Update, 07 Sep 2016: Thanks for the answers. To answer the question of "why": I'm looking at running hundreds of thousands, if not millions, of sample plaintext/ciphertext pairs. So while the time difference for any single encryption is negligible, any savings I can get could add up to a huge difference over the whole run.
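For a workload of that size, one direction worth considering (an assumption about where numpy tends to pay off, not a measured result) is to vectorize across blocks instead of within one: stack N states into an (N, 4, 4) array so that each numpy call does N times the work for roughly the same fixed overhead. The sketch applies ShiftRows to a whole batch with one gather; the batch size, the random test data and the function name are illustrative.

```python
import numpy as np

# Same index tables as a single-block gather: row r reads column (c + r) % 4.
ROWS = np.arange(4)[:, None]
COLS = (np.arange(4)[None, :] + ROWS) % 4

def shift_rows_batch(states):
    """Apply ShiftRows to every 4x4 state in an (N, 4, 4) array at once.

    result[n, r, c] = states[n, r, (c + r) % 4]
    """
    return states[:, ROWS, COLS]

rng = np.random.default_rng(0)
states = rng.integers(0, 256, size=(100_000, 4, 4), dtype=np.uint8)
shifted = shift_rows_batch(states)
print(shifted.shape)  # (100000, 4, 4)
```

The other per-block steps would need the same batch treatment for any speed-up to carry through a full encryption.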