To my surprise, I have discovered that reading from and writing to NumPy structured arrays appears to be linear in the size of the array.
As this seems very wrong, I would like to know whether I am doing something wrong here or whether this might be a bug.
Here is some example code:
import numpy as np

def test():
    # Two structured arrays whose 'b' field differs in size by a factor of 100
    A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 100))])
    B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 10000))])

    # The same data as plain lists of dicts, for comparison
    C = [{'a': 0, 'b': [0 for i in xrange(100)]}]
    D = [{'a': 0, 'b': [0 for i in xrange(10000)]}]

    for i in range(100):
        A[0]['a'] = 1
        B[0]['a'] = 1

        B['a'][0] = 1
        x = A[0]['a']
        x = B[0]['a']

        C[0]['a'] = 1
        D[0]['a'] = 1
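A self-contained way to check the scaling, without a line profiler, is to time the field access directly with timeit (a minimal sketch; Python 3 syntax, so range instead of xrange):

```python
import timeit
import numpy as np

# Structured arrays whose 'b' field differs in size by a factor of 100
A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 100))])
B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 10000))])

# Time scalar access to the small 'a' field of each array.
# If access were O(1), the two timings should be nearly identical.
t_small = timeit.timeit(lambda: A[0]['a'], number=1000)
t_large = timeit.timeit(lambda: B[0]['a'], number=1000)

print('small:', t_small, 'large:', t_large)
```

On an affected NumPy version, t_large comes out far larger than t_small, even though only the tiny 'a' field is touched.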
Line profiling gives the following results:
Total time: 5.28901 s, Timer unit: 1e-06 s
Function: test at line 454
Line # Hits Time Per Hit % Time Line Contents
==============================================================
454 @profile
455 def test():
456
457 1 10 10.0 0.0 A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1,100))])
458 1 13 13.0 0.0 B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1,10000))])
459
460 101 39 0.4 0.0 C = [{'a':0, 'b':[0 for i in xrange(100)]}]
461 10001 3496 0.3 0.1 D = [{'a':0, 'b':[0 for i in xrange(10000)]}]
462
463 101 54 0.5 0.0 for i in range(100):
464 100 20739 207.4 0.4 A[0]['a'] = 1
465 100 1741699 17417.0 32.9 B[0]['a'] = 1
466
467 100 1742374 17423.7 32.9 B['a'][0] = 1
468 100 20750 207.5 0.4 x = A[0]['a']
469 100 1759634 17596.3 33.3 x = B[0]['a']
470
471 100 123 1.2 0.0 C[0]['a'] = 1
472 100 76 0.8 0.0 D[0]['a'] = 1
As you can see, I don't even access the larger field 'b' (and a size of 10,000 is actually really tiny anyway). By the way, the behavior is the same for shape=(10000, 1) instead of (1, 10000).
For comparison, storing the same data as a list of dicts and using only built-in operations (see C and D) gives the expected cost, independent of size.
Any ideas?
NumPy version: 1.10.1.