
I would like to find the fastest way to write data into a 2D numpy array using an array of indexes.

I have a large 2D boolean numpy array `buffer`:

import numpy as np

n_rows = 100000
n_cols = 250
shape_buf = (n_rows, n_cols)

row_indexes = np.arange(n_rows, dtype=np.uint32)
# Offset of each row's first element in the flattened (C-order) buffer,
# needed for the flat-index solution below
flat_row_indexes = row_indexes * n_cols

buffer = np.full(shape=shape_buf,
                 fill_value=False,
                 dtype=np.bool_, order="C")

I want to write data into the buffer, one element per row, at the column positions given by the index array `w_idx`:

data = np.random.randint(0, 2, size=n_rows, dtype=np.bool_)
w_idx = np.random.randint(n_cols, size=n_rows, dtype=np.uint32)

One solution is to use standard fancy indexing:

%timeit buffer[row_indexes, w_idx] = data
2.07 ms ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

A faster solution is to flatten the indexes and use `np.put`:

%timeit buffer.put(flat_row_indexes + w_idx, data, "wrap")
1.76 ms ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

However, this last solution is still too slow for my application. Is it possible to go faster? Maybe by using another library, such as Numba?

Wang Lee

  • I don't think the two solutions you benchmarked do the same thing. – Mad Physicist Nov 13 '22 at 15:20
  • Check the result of the first timeit expression on a 5x5 array, or something similar that you can visualize in a manageable way. – Mad Physicist Nov 13 '22 at 15:21
  • `w_idx` can have duplicates. If so, I am wondering whether the result is well defined (two different values written to the same location). – Jérôme Richard Nov 13 '22 at 15:29
  • @JérômeRichard. `w_idx` is a column location. Each column is generated for a different row. The first index is a mistake. – Mad Physicist Nov 13 '22 at 15:31
  • @MadPhysicist Thank you, you're right, I made a mistake in the first solution. I will update the post. – Wang Lee Nov 13 '22 at 15:45
  • Simply accessing those elements isn't much faster than assigning them: `%timeit buffer[row_indexes, w_idx]`. – hpaulj Nov 13 '22 at 17:04
  • Ok. Regarding Numba, it is worth a try if the indices look random (at least hardly predictable for the processor) and you do that in parallel (which means the operation should be sufficiently slow in the first place). – Jérôme Richard Nov 13 '22 at 20:14

1 Answer


The timing result you got (with the first solution as originally written) makes sense: that assignment filled in all 800 rows for each column, while the second one places only the individual elements you want into the array. The reason the `put` version appeared to be only ~100x faster, instead of ~800x faster, is that the overhead of a call to `put` on such a small dataset overwhelms the timing result.

First lesson: always test numpy operations on a small array that you can see, usually no larger than 5x5, so you can compare against a hand-computed result if necessary.
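As an illustration, here is a quick sanity check on a 5x5 array (a minimal sketch; the shape and values are arbitrary):

# Sanity check on a 5x5 array: row i gets vals[i] written at column w[i],
# i.e. exactly one slot is touched per row.
small = np.zeros((5, 5), dtype=np.bool_)
rows = np.arange(5)
w = np.array([2, 0, 4, 1, 3])
vals = np.array([True, False, True, True, False])
small[rows, w] = vals
print(small.astype(int))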

Second lesson: benchmarks on small arrays are unreliable. An O(n) algorithm only achieves linear scaling asymptotically. Timing for small arrays is dominated by function calls and other (usually constant-time) bookkeeping, especially in Python.
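One way to see where the asymptotic regime starts is to time the same scatter-write at several sizes (a sketch only; the sizes are arbitrary and the numbers will vary by machine):

import numpy as np
from timeit import timeit

# Small-n timings are dominated by call overhead; large-n timings
# approach the linear scaling you would expect from an O(n) write.
for n in (1_000, 10_000, 100_000):
    rows = np.arange(n)
    cols = np.random.randint(250, size=n)
    vals = np.random.randint(0, 2, size=n, dtype=np.bool_)
    buf = np.zeros((n, 250), dtype=np.bool_)

    def assign():
        buf[rows, cols] = vals

    print(f"n={n:>7}: {timeit(assign, number=100) / 100 * 1e3:.3f} ms")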

The best thing I can think of is to avoid the overhead of calling `put`:

buffer[np.arange(n_rows), w_idx] = data

Another option, since you already have the offsets pre-computed and all your arrays are contiguous, is to assign through linear indices:

buffer.ravel()[flat_row_indexes + w_idx] = data
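To convince yourself that this writes into `buffer` itself rather than into a copy, you can check that `ravel` returns a view for a C-contiguous array (a small sketch):

# For a C-contiguous array, ravel() returns a view, so the assignment
# above lands in the original buffer, not in a temporary copy.
flat = buffer.ravel()
print(flat.base is buffer)  # True: flat shares memory with buffer

flat[flat_row_indexes + w_idx] = data
print(buffer[0, w_idx[0]] == data[0])  # True: row 0 received data[0]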
Mad Physicist
  • Thank you for your answer, I did indeed make a mistake. However, the solution with np.put is still faster. Are there any other alternatives? – Wang Lee Nov 13 '22 at 15:56
  • @WangLee. You can try the raveled assignment I added. Also, try playing with different modes besides "wrap"; that one sounds expensive. – Mad Physicist Nov 13 '22 at 16:04
  • But if there's no faster way, there's no faster way. You can use C, Cython, Numba, etc. at that point (a Numba sketch follows below). Remember that Python is for prototyping. – Mad Physicist Nov 13 '22 at 16:05
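Following up on the Numba suggestion from the comments, here is a minimal sketch of what a parallel scatter-write kernel could look like. This is hypothetical: it assumes `numba` is installed, and whether it actually beats `np.put` depends on your hardware and on the operation being slow enough to benefit from threads.

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def scatter_write(buffer, w_idx, data):
    # Each row receives exactly one independent write, so the loop
    # iterations can safely run in parallel.
    for i in prange(buffer.shape[0]):
        buffer[i, w_idx[i]] = data[i]

scatter_write(buffer, w_idx, data)  # first call pays the JIT compilation cost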