When do numpy arrays become more efficient than python lists for access operations? (considering number of elements as variable)

Question

I've recently had to implement a simple bruteforce software in python, and I was getting terrible execution times (even for a O(n^2) time complexity), topping the 10 minutes of runtime for a total of 3700 * 11125 * 2 = 82325000 access operations on numpy arrays (intel i5 4300U).

I'm talking about access operations because I initialized all the arrays beforehand (out of the bruteforce loops) and then I just recycle them by just overwriting without reallocating anything.

I get that bruteforce algorithms are supposed to be very slow, but 82 million acesses on contiguous-memory float arrays should not take 10 minutes even on the oldest of cpus...

After some research I found this post: Why is numpy.array so slow?, which lead me to think that maybe numpy arrays with length of 3700 are too small to overcome some sort of overhead (which I do not know since, as I said, I recycle and not reallocate my arrays), therefore I tried substituting arrays with lists and... voilà, run times were down to the minute (55 seconds) for a total of 10x decrease!

But now I'm kinda baffled on why on earth a list can be faster than an array, but more importantly, where is the threshold? The cited post said that numpy arrays are very efficient on large quantities of data, but where does "a lot of data" becomes "enough data" to actually get advantages? Is there a safe way to determine whether to use arrays or lists? Should I get worried about writing my functions two times, one for each case?

For anyone wondering, the original script was something like (list version, mockup data):

X = [0] * 3700 #list of 3700 elements
y = [0] * 3700

combinations = [(0, 0)] * 11125

grg_results = [[0, 0, 0]] * len(combinations)
rgr_results = [[0, 0, 0]] * len(combinations)

grg_temp = [100] * (3700 + 1)
rgr_temp = [100] * (3700 + 1)

for comb in range(len(combinations)):
   pivot_a = combinations[comb][0]
   pivot_b = combinations[comb][1]
   for i in range(len(X)):
       _x = X[i][0]
       _y = y[i][0]
       if _x < pivot_a:
           grg_temp[i + 1] = _y * grg_temp[i]
           rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
       elif _x >= pivot_a and _x <= pivot_b:
           grg_temp[i + 1] = (2 - _y) * grg_temp[i]
           rgr_temp[i + 1] = _y * rgr_temp[i]
       else:
           grg_temp[i + 1] = _y * grg_temp[i]
           rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
   grg_results[comb][0] = pivot_a
   grg_results[comb][1] = pivot_b
   rgr_results[comb][0] = pivot_a
   rgr_results[comb][1] = pivot_b
   grg_results[comb][2] = metrics[0](grg_temp)
   rgr_results[comb][2] = metrics[0](rgr_temp)

`numpy` arrays are faster when using the compiled methods and functions - that is when the iteration and access is done in `c`. Python level iteration on arrays is slower. The elements of an array have to be "unboxed", returning a new object with value copied from the source array. Element access from a list just returns a pointer to the existing object. There are many SO questions about "vectorization" or "remove loops", where the basic answer is: "move the looping into existing numpy methods". — hpaulj, Aug 14 '22 at 02:57
Post your complete benchmark code to run to reproduce your observations. — Maxim Egorushkin, Aug 14 '22 at 02:58

score 3 · Accepted Answer · answered Aug 14 '22 at 03:55

Let's do some simple list and array comparisons.

Make a list of 0s (as you do):

In [108]: timeit [0]*1000
2.83 µs ± 0.399 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Make an array from that list - a lot more time:

In [109]: timeit np.array([0]*1000)
84.9 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Make the array of 0s in the correct numpy way:

In [110]: timeit np.zeros(1000)
735 ns ± 0.727 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Now sum all the elements of the list:

In [111]: %%timeit alist = [0]*1000
     ...: sum(alist)
5.74 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Sum of the array:

In [112]: %%timeit arr = np.zeros(1000)
     ...: arr.sum() 
6.41 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In this case the list is a bit faster; but make a much larger list/array:

In [113]: %%timeit alist = [0]*100000
     ...: sum(alist) 
545 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [114]: %%timeit arr = np.zeros(100000)
     ...: arr.sum()   
56.7 µs ± 37 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The list sum scales O(n); the array scales much better (it has a higher "setup", but lower O(n) dependency). The iteration is done in c code.

Add 1 to all elements of the list:

In [115]: %%timeit alist = [0]*100000
     ...: [i+1 for i in alist] 
4.64 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

A similar list comprehension on the array is slower:

In [116]: %%timeit arr = np.zeros(100000)
     ...: np.array([i+1 for i in arr]) 
24.2 ms ± 37.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But doing a whole-array operation (where it iteration and addition is done in c code):

In [117]: %%timeit arr = np.zeros(100000)
     ...: arr+1
64.1 µs ± 85.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

For smaller size the [115] comprehension will be faster; there's no clear threshold.

score 1 · Answer 2 · answered Aug 14 '22 at 02:26

These are the 4 main advantages of an ndarray as far as i know :

It uses less storage for the pointers (1 byte instead of 8) because its a raw python object and not an array. It also only allows homogeneous numeric data types which also lead to a increase in performance.
Slicing doesnt copy the array (which is a slow opeation) just returns a view on the same array
The same goes for joining and splitting arrays (doesnt copy) and other array operations
Some operations like adding the values of two arrays together are written in c and of course faster as a result

Conclusion :

Manually looping over a ndarray is not fast as there is no optimization implemented. A numpy array offers memory and execution efficiency, and not to forget that the array is serving as an interchange format for many existing libraries.

Sources: https://pythoninformer.com/python-libraries/numpy/advantages/ , https://numpy.org/devdocs/user/absolute_beginners.html

score 0 · Answer 3 · answered Jun 28 '23 at 06:08

Based on some tests I ran for accessing individual array elements one at a time, numpy is about five times slower than using python lists.

As noted in other answers, numpy functions are far faster than looping in python. However, in this case, we are not doing any complicated math operations that require hundreds of array accesses at once. While accessing an individual element in a C array is fast, keep in mind the line must first be interpreted by python. It seems the overhead is larger for numpy arrays than for lists.

However, suppose instead you want to find the sum of an array/list (or any somewhat complex operation). With lists, you would have to loop over them and access each element. Using a numpy one-liner, numpy will internally access the array hundreds of times in C, and the tiny overhead of calling a numpy function would be insignificant compared to the gains for using an function written in C.

In summary, for accessing individual element one at a time, numpy is slower. However, when comparing an algorithm that loops in python to a numpy function, numpy will be far faster.

When do numpy arrays become more efficient than python lists for access operations? (considering number of elements as variable)

3 Answers3