1

I hear all the time that Numpy arrays are quicker for dealing with large amounts of data. But how much data does a Numpy array need to hold before it outperforms a standard Python array (technically a list)?

Thanks.

Elliot Killick
  • This will probably depend on the machine you're running it on and the compiler used to compile things, as well as the specific Python version used to run the program and the exact program you're running. I.e., it's pretty much impossible to answer. Make a benchmark program and measure it yourself. – Clearer Jul 26 '17 at 13:36
  • You could use `timeit` and compare array operations on collections of `10**n` elements (see the benchmark sketch after these comments). `numpy` is not only fast, but it also uses less memory than corresponding lists and offers a more concise syntax for many operations. – Eric Duminil Jul 26 '17 at 13:38
  • Depends on what you're doing with them. – RemcoGerlich Jul 26 '17 at 13:45
  • @Clearer Compiler? Python is an interpreted language, you probably know this and just messed up but I'm just pointing it out. :) – Elliot Killick Jul 26 '17 at 13:47
  • @Halp: he means the compiler that was used to compile Python and Numpy. – RemcoGerlich Jul 26 '17 at 13:48
  • The interpreter is compiled by some compiler, as is numpy. – Clearer Jul 26 '17 at 13:48
  • Possibly [related](https://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists) – Alexander Ejbekov Jul 26 '17 at 13:48
  • @RemcoGerlich I plan to just perform operations on them, like finding the mean or looping through them to put them through a function. No appending to the array or anything. – Elliot Killick Jul 26 '17 at 13:49
  • @Halp one of the main advantages of `numpy` is vectorising operations, that is, instead of looping through a list to "put them through a function" as you say, you can just pass the entire `ndarray` into the function and process it all at once (assuming your function is vectorised). – Tom Wyllie Jul 26 '17 at 14:06
  • If starting with a list, then creating the array is the main overhead. Once that's done, most array operations are faster. If starting with an array, then resorting to list-like iteration (such as to use a scalar function) is a big slowdown. – hpaulj Jul 26 '17 at 15:57
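
Following the `timeit` suggestion above, here is a rough benchmark sketch. The doubling-and-summing operation and the `10**n` element counts are just illustrative choices, and the absolute numbers will depend on your machine, Python version, and numpy build:

import timeit

for n in range(1, 6):
    size = 10 ** n

    # Pure Python: loop over a list and apply the operation element by element.
    list_time = timeit.timeit(
        'sum(x * 2 for x in data)',
        setup='data = list(range(%d))' % size,
        number=100,
    )

    # numpy: the same work expressed as a single vectorised expression.
    numpy_time = timeit.timeit(
        '(data * 2).sum()',
        setup='import numpy; data = numpy.arange(%d)' % size,
        number=100,
    )

    print('%7d elements: list %.4fs, numpy %.4fs' % (size, list_time, numpy_time))

Running this on your own machine will show where the crossover happens for the operation you actually care about.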

1 Answer

0

For instance, if you want to set all elements to 1, then numpy is faster for me at 10 elements (maybe earlier, I didn't check):

>>> import timeit
>>> timeit.timeit('for i in r: a[i] = 1', setup='a = [0]*10; r=range(len(a))')
0.3777730464935303
>>> timeit.timeit('b[:] = 1', setup='import numpy; b=numpy.array([0]*10)')
0.3234739303588867

With 1000 elements, the first version is about 100 times slower, while the second is only about 2 times slower.
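
For reference, the 1000-element version of the same comparison can be reproduced like this (your absolute timings will of course differ from machine to machine):

import timeit

# Python loop assigning 1 to every element of a list.
loop_time = timeit.timeit(
    'for i in r: a[i] = 1',
    setup='a = [0] * 1000; r = range(len(a))',
)

# numpy slice assignment doing the same work in one vectorised operation.
vectorised_time = timeit.timeit(
    'b[:] = 1',
    setup='import numpy; b = numpy.array([0] * 1000)',
)

print(loop_time, vectorised_time)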

But it all depends on what you need to do. If you can avoid for loops by using numpy-isms (like assigning to b[:]), then numpy is blazing fast; if you have to use a for loop, then it won't help much.
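
To make that concrete, here is a small illustration (the formula applied to each element is arbitrary, chosen just for the example): the same computation written as a vectorised numpy expression and as a Python-level loop over the same array.

import numpy

b = numpy.arange(100000, dtype=float)

# numpy-ism: the whole computation runs in compiled code, no Python-level loop.
fast = b * 2.0 + 1.0

# Python loop over a numpy array: every element access goes through the
# interpreter, so this is typically much slower.
slow = numpy.empty_like(b)
for i in range(len(b)):
    slow[i] = b[i] * 2.0 + 1.0

assert numpy.allclose(fast, slow)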

RemcoGerlich