
Just a short question that I can't find the answer to before I head off for the day.

When i do something like this:

    v1 = float_list_python = ...  # <some list of floats>
    v2 = float_array_NumPy = ...  # <some numpy.ndarray of floats>
                                  # (they don't have to be floats -
                                  # just some type with a native C
                                  # representation, so that NumPy can
                                  # use it directly)

If I want to multiply these vectors by a scalar, my understanding has always been that the Python list is a list of object references, so looping through the list to do the multiplication must first fetch the location of each float and then fetch the float itself before multiplying - which is one of the reasons it's slow.
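
Roughly, this is the picture I have in mind (a sketch of the two cases, not the actual CPython/NumPy internals):

    import numpy as np

    v1 = [0.1, 0.2, 0.3]            # plain Python list of boxed float objects
    v2 = np.array([0.1, 0.2, 0.3])  # contiguous float64 buffer

    # List: the loop follows each reference, unboxes the float,
    # multiplies, and boxes a brand-new float object.
    scaled_list = [x * 2.5 for x in v1]

    # Array: one call into compiled code that walks the raw buffer.
    scaled_array = v2 * 2.5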

If I do the same thing in NumPy, then, well, I'm not sure what happens. There are a number of things I imagine could happen:

  1. It splits the multiplication up across the cores.
  2. It vectorises the multiplications (as well?)

The documentation I've found suggests that many of NumPy's primitives take advantage of the first option whenever they can (I don't have a computer on hand at the moment to test it on). And my intuition tells me that the second should happen whenever it's possible.

So my question is: if I create a NumPy array of Python objects, will it still at least perform operations on the array in parallel? I know that if you create an array of elements that have native C types, it will create a contiguous array in memory of the actual values, and that if you create a NumPy array of Python objects it will create an array of references. But I don't see why that would rule out parallel operations on said array, and I cannot find anywhere that explicitly states it.
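
To make the distinction I mean concrete, a small sketch (the identity check is just to show that the object array holds references):

    import numpy as np

    x = 3.14
    a = np.array([x, x, x], dtype=object)  # stores references, not raw floats
    print(a[0] is x)                       # True: the array holds pointers
                                           # back to the same Python object

    b = np.array([x, x, x])                # dtype=float64: values are copied
    print(b[0] is x)                       # False: b[0] is unboxed from the
                                           # raw buffer into a new scalar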

EDIT: I feel there's a bit of confusion over what I'm asking. I understand what vectorisation is; I understand that it is a compiler optimisation, not something you necessarily program in (though aligning the data so that it's contiguous in memory is important). On the grounds of vectorisation, all I wanted to know was whether or not NumPy uses it. If I do something like np_array1 * np_array2, does the underlying library call use vectorisation (presuming that the dtype is a compatible type)?

For the splitting up over the cores, all I mean is: if I again do something like np_array1 * np_array2, but this time with dtype=object, would it divide that work up amongst the cores?
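
Concretely, the two cases I'm asking about look like this (a sketch with made-up sizes):

    import numpy as np

    a = np.arange(10**6)     # contiguous buffer of native integers
    b = a.astype(object)     # same values, stored as object references

    a * 2  # compiled loop over the raw buffer (the SIMD-friendly case)
    b * 2  # loop that calls each element's __mul__ through the Python
           # C-API - is *this* the case that gets spread across cores?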

– will

2 Answers


numpy is fast because it performs numeric operations like this in fast compiled C code. In contrast, the list operation runs at the interpreted Python level (streamlined as much as possible with Python bytecodes etc.).

A numpy array of numeric type stores those numbers in a data buffer. At least in the simple cases this is just a block of bytes that C code can step through efficiently. The array also has shape and strides information that allows multidimensional access.
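
For example, the buffer, shape, and strides are all visible on the array itself:

    import numpy as np

    a = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(a.shape)            # (3, 4)
    print(a.strides)          # (32, 8): bytes to step one row / one column
    print(a.dtype.itemsize)   # 8 bytes per float64
    print(a.data)             # memoryview over the underlying byte buffer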

When you multiply the array by a scalar, it in effect calls a C function named something like 'multiply_array_by_scalar', which does the multiplication in fast compiled code. So this kind of numpy operation is fast (compared to Python list code) regardless of the number of cores or other multi-processing/threading enhancements.
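
A quick timing sketch of that difference (made-up sizes; the exact numbers will vary by machine):

    import timeit

    setup = "import numpy as np; lst = list(range(100000)); arr = np.arange(100000)"

    # Python-level loop over object references:
    print(timeit.timeit("[x * 3 for x in lst]", setup=setup, number=100))
    # One call into a compiled multiply loop:
    print(timeit.timeit("arr * 3", setup=setup, number=100))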

Arrays of objects do not have any special speed advantage (compared to lists), at least not at this time.

Look at my answer to a question about creating an array of arrays (https://stackoverflow.com/a/28284526/901925); I had to use iteration to initialize the values.

Have you done any timing experiments? For example, construct an array, say (1000,2). Use tolist() to create an equivalent list of lists. And make a similar array of objects, with each object being a (2,) array or list (how much work did that take?). Now do something simple like len(x) for each of those sublists.
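
One way to set that experiment up (a sketch following the steps above):

    import numpy as np

    a = np.arange(2000).reshape(1000, 2)  # the (1000, 2) numeric array
    alist = a.tolist()                    # equivalent list of lists

    # The object array takes noticeably more work to construct:
    obj = np.empty(1000, dtype=object)
    for i in range(1000):
        obj[i] = a[i]                     # each element is a (2,) array

    # Something simple on each container:
    lens_list = [len(x) for x in alist]
    lens_obj = [len(x) for x in obj]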

– hpaulj
  • I understand that numpy is fast because it uses compiled C code; that's not what I'm wondering about. I understand that when it uses C objects, they are contiguous in memory, so it also gains very large performance bonuses due to fewer cache misses and the prefetcher being able to do its job properly. – will Feb 02 '15 at 23:16
  • Also, your response to that question about having the impenetrable wall around the arrays, I think, just comes from not understanding how numpy deals with the slice objects you're sending it. The `ndarray`'s `__getitem__` is designed so that it will only accept up to N `slice` objects, where N is the dimension of the `ndarray`, which in your case was 2. That's just how it is. Have a look [here](http://stackoverflow.com/q/27229218/432913). – will Feb 02 '15 at 23:21
  • 1
    In reference to that other thread, `b` is a 2d array. I have to select a single element of `b` in order to do any indexing (or math operation) on the contents of that element. There's no provision for vectorized indexing or math across multiple elements of `b`. That's what I mean by the wall. It is possible to define operations that cross that boundary. `np.char` has a number of operations that work with string dtypes. – hpaulj Feb 02 '15 at 23:55
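
For instance, two of the `np.char` operations mentioned in the comment above:

    import numpy as np

    s = np.array(['alpha', 'beta', 'gamma'])
    print(np.char.upper(s))     # ['ALPHA' 'BETA' 'GAMMA']
    print(np.char.add(s, '!'))  # ['alpha!' 'beta!' 'gamma!']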

@hpaulj provided a good answer to your question. In general, from reading your question it occurred to me that you do not actually understand what "vectorization" does under the hood. This writeup is a pretty decent explanation of vectorization and how it enables faster computations: http://quantess.net/2013/09/30/vectorization-magic-for-your-computations/

With regards to point 1, distributing computations across multiple cores: this is not always the case with Numpy. However, there are libraries like numexpr that enable multithreaded, highly efficient Numpy array computations with support for several basic logical and arithmetic operators. Numexpr can be used to turbo-charge critical computations when used in conjunction with Numpy, as it avoids replicating large arrays in memory for vectorization routines (as is the case for Numpy) and can use all cores on your system to perform computations.
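
A minimal sketch of what that looks like (assuming numexpr is installed):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10**6)
    b = np.random.rand(10**6)

    # NumPy: each sub-expression materialises a full temporary array.
    c_np = 2 * a + b ** 2

    # numexpr: the expression is compiled and evaluated chunk by chunk
    # across threads, without the large intermediates.
    c_ne = ne.evaluate('2 * a + b ** 2')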

– HamsterHuey
  • I understand what vectorisation is, and that it obviously cannot be utilised for object arrays (I thought that was implied in the question by the "will it still at least perform operations on the list in parallel" part). As for answering the question (your second paragraph), "this is not always the case" - do you know when it specifically is? – will Feb 02 '15 at 23:15
  • It depends on whether numpy/scipy is compiled so as to link to libraries that implement parallelization. For example, Intel has the MKL libraries that enable a speedup in numpy computations if you are using Numpy compiled to run with them. This is a paid option with Anaconda (or free if you are a student). So there isn't a general, all-purpose answer. Whether or not individual numpy operations support multithreading/parallelization depends on your system architecture, what libraries are available, and what libraries your local Numpy install is compiled to use (one way to check is sketched after these comments). – HamsterHuey Feb 03 '15 at 05:32
  • More info that may be useful: http://wiki.scipy.org/ParallelProgramming – HamsterHuey Feb 03 '15 at 05:33
  • Okay, that makes more sense. Thanks. It's mentioned on the numpy installation page in the building-from-source section too, so it looks like anyone can get it for free if they want. – will Feb 03 '15 at 11:57
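
Following up on the MKL discussion in the comments above, one way to check which libraries your own NumPy build is linked against (the output format varies between versions):

    import numpy as np

    # Prints the BLAS/LAPACK libraries (e.g. MKL, OpenBLAS) that this
    # NumPy build was compiled against.
    np.show_config()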