4

I have an 3D array that is a mask. Additionally, I have some indices that encode where (array position) some values should be saved.

Everything seems to be working fine, except that the output matrix is still empty after assigning the values to the desired positions.

I can not see what I am missing here. I have also tried numpy.put with no luck.

import numpy as np

# Initialize output matrix (here the results will be stored)
results = np.zeros((67, 67, 45))

# define the mask - where to put the calculated values in the results array
mask = np.random.randint(2, size=(67, 67, 45)).astype(bool)

# store the results only in these positions
index_keep = range(0, 13732)

values = np.ones((13732,))

results[mask][index_keep] = values.copy()

# the results array is still empty
print(results.sum())
#0
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
seralouk
  • 30,938
  • 9
  • 118
  • 133
  • 1
    `results[mask][index_keep]` is a copy of the array,which is why you're not seeing any changes in `result` itself – yatu Jun 26 '19 at 13:08
  • really? in other cases `results[mask] = values.copy()` has worked for me however, `results[mask][index_keep] = values.copy()` fails. How can I do this then? – seralouk Jun 26 '19 at 13:10

2 Answers2

7

When you index an array with a boolean mask, the elements are extracted and placed into a 1-D array. This pretty much had to be the case, since the selected elements of the mask are not evenly space across our within any dimension. The expression results[mask] = value is equivalent to results.__setitem__(mask, value): clearly an in-place modification on result. However results[mask][index_keep] = value is equivalent to result.__getitem__(mask).__setitem__(index_keep, value). The in-place operation happens on a temporary array that is completely discarded.

The solution is to play with the index to get a single call to __setitem__ on the object you want. One way to do that is to apply index_keep to mask. You would first have to convert mask to linear indices, e.g. with np.flatnonzero:

result.ravel()[np.flatnonzero(mask)[index_keep]] = value

This will work as long as ravel returns a view, which it should in most cases. If result is a contiguous array, this will work all the time. It wont work if result is already a subset of a larger array.

This approach has the advantage that it uses only a single index array, and it works for any number of dimensions. Using np.where could be adapted to do the same, but would require more temporary storage. The disadvantage is of course that this approach is limited to contiguous arrays.

P.S. You almost certainly don't need to copy value. Its elements won't be modified, and the assignment will already make the copy into the appropriate locations of result. Making a copy just creates a needless temporary array that will be discarded immediately.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • Amazing. Thank you. I am accepting the answer. I was aware of the copy that I return but could not find a way to use these double indexing approach to assign my values – seralouk Jun 26 '19 at 13:34
  • TIL that `ravel` returns a view, mostly (-: – piRSquared Jun 26 '19 at 13:37
  • Thanks for the detailed answer. It's the first time I come across suck a thing. – seralouk Jun 26 '19 at 13:40
  • 1
    @piRSquared. Yep. It makes a best effort to get a pointer to the buffer and just set the strides appropriately. I'm not sure how agressive it is with slices though. For example, `np.arange(12).reshape(4, 3)[:, ::2].ravel()` can't be a view, but `np.arange(12).reshape(3, 4)[:, ::2]` can. I'm just not sure it will be. – Mad Physicist Jun 26 '19 at 13:46
  • 2
    As far as I know `ravel` guarantees a contiguous result, so it will only give you a view if the array is contiguous to begin with. You can always check `arr.ravel().base is None` vs `arr.ravel().base is arr` for a copy vs a view. As far as the answer goes, a key component is that boolean index arrays always trigger fancy indexing, which creates a copy. The double indexing assignment would've worked if the first subscription had used slices, for instance. This might not be obvious to other readers. (cc @piRSquared) – Andras Deak -- Слава Україні Jun 26 '19 at 13:58
  • It's possible that [`result.flat`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flat.html#numpy.ndarray.flat) can be used for this subscription in the general case. – Andras Deak -- Слава Україні Jun 26 '19 at 14:06
  • For a toy example both posted solutions work. For high dimensional data it seems that the 2 approaches lead to different results. You have already answered my question but please have a look at this notebook here (data included): https://drive.google.com/open?id=1-e4uyy8aYHmPRLDZ95mWSorh9eqZQEiU – seralouk Jun 26 '19 at 14:36
  • @AndrasDeak sorry. try here: https://filebin.net/06arajbi21udluqs I appreciate any additional help. Have a look at this strange behaviour – seralouk Jun 26 '19 at 14:51
  • 1
    @serafeim. Does my last comment to the other answer help? – Mad Physicist Jun 26 '19 at 14:54
6

You can use numpy.where on your mask, which will allow you to get a view of your results array to index.

x, y, z = np.where(mask)

results[x[index_keep], y[index_keep], z[index_keep]] = values
user3483203
  • 50,081
  • 9
  • 65
  • 94