
I have been programming in pure Python for 2 years.

Now I am learning Numpy and I am confused.

The tutorials give examples showing that Numpy is far more efficient than pure Python. But when I try, for example, a simple iteration:

import numpy as np
import time
start = time.time()
list = range(1000000)
array = np.arange(1000000)

for element in list:
    pass
print('\n'+str((time.time() - start)*1000)+'\n')
start = time.time()
for element in np.nditer(array, order='F'):
    pass
print('\n'+str((time.time() - start)*1000)+'\n')

I got an output:

87.67843246459961

175.25482177734375

As can be seen above, iterating over a Numpy array is much slower than iterating in pure Python.

My question is: why should I use Numpy and, more importantly, when should I use it?

Tomasz Wójcik
  • You missed the part that it is way more efficient when vectorizing problems. Also, never iterate on arrays - better convert to a list before. – kabanus Nov 27 '18 at 18:25
  • In addition to what kabanus said, it's also worth pointing out that you really shouldn't be naming your variables 'list' and 'array'. – John Rouhana Nov 27 '18 at 18:26
  • If you iterate, you'll find numpy arrays are slower. Their strength lies in vectorization. – Paritosh Singh Nov 27 '18 at 18:30

2 Answers

Numpy is much faster with vector operations. If you change your code to:

array+=1

instead of:

for element in np.nditer(array, order='F'):
    pass

you can see that numpy vastly outperforms the regular Python code.
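
As a rough sketch of that comparison, the snippet below times an explicit Python loop against the single vectorized statement, using the standard `timeit` module instead of `time.time()`. The helper names `add_one_loop` and `add_one_vectorized`, the array size, and the repeat count are illustrative choices rather than part of the question, and the actual timings will depend on the machine.

```python
import timeit

import numpy as np

N = 1_000_000  # illustrative size, matching the question
RUNS = 5       # illustrative number of repetitions


def add_one_loop(values):
    """Add 1 to every element with an explicit Python-level loop."""
    return [v + 1 for v in values]


def add_one_vectorized(arr):
    """Add 1 to every element in a single vectorized operation."""
    return arr + 1


values = list(range(N))
arr = np.arange(N)

# Only the work itself is timed; building the data happens beforehand.
loop_time = timeit.timeit(lambda: add_one_loop(values), number=RUNS)
vec_time = timeit.timeit(lambda: add_one_vectorized(arr), number=RUNS)

print(f"Python loop: {loop_time / RUNS * 1000:.2f} ms per run")
print(f"numpy:       {vec_time / RUNS * 1000:.2f} ms per run")
```

Unlike the timing in the question, this measures only the operation, not the creation of the list and the array.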

Shay Lempert

The strength of numpy is that you don't need to iterate. There's no problem with iterating, and in fact it can be useful in some cases, but the vast majority of problems can be solved using functions in numpy.

Using your examples (with the %timeit command in ipython), if you do something simple like adding a number to every element, numpy is clearly much faster when it is used directly without iterating.

import numpy as np

dlist = range(1000000)
darray = np.arange(1000000)

# Pure python
%timeit [e + 2 for e in dlist]
59.8 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Iterating numpy
%timeit [e + 2 for e in darray]
193 ms ± 8.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Converting numpy to list before iterating
%timeit [e+2 for e in list(darray)]
198 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Numpy
%timeit darray + 2
847 µs ± 8.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The same holds for more complex operations, like finding the mean:

%timeit sum(dlist)/len(dlist)
16.5 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sum(darray)/len(darray)
66.6 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Converting to list then iterating
%timeit sum(list(darray))/len(darray) 
83.1 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Using numpy's methods
%timeit darray.mean()
1.26 ms ± 5.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Numpy is much faster once you understand how to use it. It requires a rather different way of looking at data and some familiarity with the functions it provides, but once you have both, it results in simpler and faster code.
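
To illustrate that different way of looking at data, here is a minimal sketch that computes the sum of the squares of the even values in two styles. The task and the variable names are made up for illustration. The first version visits each element in turn; the second describes the operation on the whole array using a boolean mask.

```python
import numpy as np

data = np.arange(10)

# Loop style: visit each element and decide what to do with it.
total_loop = 0
for x in data:
    if x % 2 == 0:
        total_loop += int(x) ** 2

# Array style: describe the operation on the whole array at once.
even_mask = data % 2 == 0  # boolean array marking the even values
total_vec = int((data[even_mask] ** 2).sum())

print(total_loop, total_vec)  # prints "120 120"
```

Both versions give the same result, but only the second one keeps the per-element work inside numpy.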

user2699
  • Is numpy also faster for string data, or only for numbers? – Tomasz Wójcik Nov 27 '18 at 20:14
  • One more thing: when I convert a Python list to an array, it always takes more time than pure Python. Example: I am getting data from a shelve file and want to convert it to numpy, so the conversion will take too much time. – Tomasz Wójcik Nov 27 '18 at 20:18
  • In general numpy isn't terribly fast with string data, although it can handle it. You have to balance the setup cost for getting data in a format numpy can use versus the number of computations you'll be doing on it. There are some options for loading data directly into numpy arrays from disk which work well. But, as always, look at your particular problem and find what works best. – user2699 Nov 27 '18 at 20:27
  • One last question: over the last few days I have also been learning pandas. Do you think there are cases where numpy is better than pandas? – Tomasz Wójcik Nov 27 '18 at 20:31
  • I'd say it's better in most cases. `pandas` is built on top of numpy, and only handles a very specific set of problems when you want to work with tabular or indexed data. And even then, using numpy directly is still simpler in many cases. – user2699 Nov 27 '18 at 21:03
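
The point in the comments about setup cost and loading data directly from disk can be sketched as follows. This is only one possible approach, assuming the data can be stored in numpy's own `.npy` format: the list is converted once with `np.asarray`, saved with `np.save`, and later read back as an array with `np.load`, so the list-to-array conversion is not repeated. The sample values and the file name `values.npy` are hypothetical.

```python
import os
import tempfile

import numpy as np

values = [1.5, 2.5, 3.5]  # hypothetical data, standing in for another source

# One-time conversion cost: build the array from the Python list.
arr = np.asarray(values, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "values.npy")  # hypothetical file name
    np.save(path, arr)      # write the array in numpy's binary .npy format
    loaded = np.load(path)  # read it back as an array, with no list step

print(loaded.mean())  # prints 2.5
```

Whether this pays off depends, as the comment says, on how much computation is done on the data after it is loaded.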