7

I have an issue with numpy that I can't solve. I have 3D arrays (x,y,z) filled with 0 and 1. For instance, one slice in the z axis :

array([[1, 0, 1, 0, 1, 1, 0, 0],
       [0, 0, 1, 1, 0, 1, 1, 0],
       [1, 0, 1, 1, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 1, 0, 1]])

And I want this result :

array([[1, 1, 1, 1, 1, 1, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 1]])

That is to say, what I want to do for each slice z is to scan line by line right to left and left to right (x axis) and the first time I have a 1 I want to fill the rest of the line with ones.

Is there an efficient way to compute that ?

Thanks a lot.

Nico !

Nico
  • 81
  • 2
  • 4

4 Answers4

4

Accessing NumPy array elements one by one is not very efficient. You may do better with just plain Python lists. They also have an index method which can search for the first entry of the value in the list.

from numpy import *

a = array([[1, 0, 1, 0, 1, 1, 0, 0],
       [0, 0, 1, 1, 0, 1, 1, 0],
       [1, 0, 1, 1, 0, 0, 0, 1],
       [0, 1, 0, 0, 1, 0, 1, 0],
       [1, 1, 1, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 1, 0, 1]])

def idx_front(ln):
    try:
        return list(ln).index(1)
    except ValueError:
        return len(ln) # an index beyond line end

def idx_back(ln):
    try:
        return len(ln) - list(reversed(ln)).index(1) - 1
    except ValueError:
        return len(ln) # an index beyond line end

ranges = [ (idx_front(ln), idx_back(ln)) for ln in a ]
for ln, (lo,hi) in zip(a, ranges):
    ln[lo:hi] = 1  # attention: destructive update in-place

print "ranges =", ranges
print a

Output:

ranges = [(0, 5), (2, 6), (0, 7), (1, 6), (0, 7), (0, 7), (4, 4), (8, 8), (2, 7)]
[[1 1 1 1 1 1 0 0]
 [0 0 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [0 0 0 0 1 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 1 1 1 1 1 1]]
sastanin
  • 40,473
  • 13
  • 103
  • 130
  • Hi Jetxee ! Thanks a lot for your help :) ! Just one detail : If there's a row filled with only zeros, it won't work ! Right ? – Nico Jul 08 '11 at 15:22
  • 2
    Converting Numpy arrays to python lists defeat the purpose of using Numpy! It is very inefficient to iterate over python lists compared to standard Numpy indexing and broadcasting methods. – Paul Jul 08 '11 at 16:41
  • @Paul Search for the first non-zero element is O(N) anyway; and Python loops with element access are faster _for lists_ rather than for NumPy arrays. To update the inner elements, I assigned `1` to entire slice (let's hope NumPy does this effectively). – sastanin Jul 08 '11 at 17:36
  • @Paul On element access: `a = arange(10000) ; b = range(10000);` Then `%timeit for i in xrange(len(a)): a[i]` gives 1.72 ms per loop, and `%timeit for i in xrange(len(b)): b[i]` gives 0.879 ms per loop with Python 2.6.5 and Numpy 1.6.0b2. – sastanin Jul 08 '11 at 17:37
  • @Nico Valid point. I updated the example; now it works also for rows filled with only zeros. – sastanin Jul 08 '11 at 17:46
  • Care to show also the result of the actual run of an array with a row where all its elements equals to 0. Thanks – eat Jul 08 '11 at 18:04
4

Actually, this is a basic binary image morphology operation.

You can do it in one step for the entire 3D array using scipy.ndimage.morphology.binary_fill_holes

You just need a slightly different structure element. In a nutshell, you want a structuring element that looks like this for the 2D case:

[[0, 0, 0],
 [1, 1, 1],
 [0, 0, 0]]

Here's a quick example:

import numpy as np
import scipy.ndimage as ndimage

a = np.array( [[1, 0, 1, 0, 1, 1, 0, 0],
               [0, 0, 1, 1, 0, 1, 1, 0],
               [1, 0, 1, 1, 0, 0, 0, 1],
               [0, 1, 0, 0, 1, 0, 1, 0],
               [1, 1, 1, 0, 1, 0, 0, 1],
               [1, 0, 0, 0, 0, 1, 0, 1],
               [0, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 0, 0, 0],
               [0, 0, 1, 0, 1, 1, 0, 1]])
structure = np.zeros((3,3), dtype=np.int)
structure[1,:] = 1

filled = ndimage.morphology.binary_fill_holes(a, structure)
print filled.astype(np.int)

This yields:

[[1 1 1 1 1 1 0 0]
 [0 0 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 1 1 1 1 1 1]]

The real advantage to this (Other than speed... It will be much faster and more memory efficient than using lists!) is that it will work just as well for 3D, 4D, 5D, etc arrays.

We just need to adjust the structuring element to match the number of dimensions.

import numpy as np
import scipy.ndimage as ndimage

# Generate some random 3D data to match what we want...
x = (np.random.random((10,10,20)) + 0.5).astype(np.int)

# Make the structure (I'm assuming that "z" is the _last_ dimension!)
structure = np.zeros((3,3,3))
structure[1,:,1] = 1

filled = ndimage.morphology.binary_fill_holes(x, structure)

print x[:,:,5]
print filled[:,:,5].astype(np.int)

Here's a slice from the random input 3D array:

[[1 0 1 0 1 1 0 1 0 0]
 [1 0 1 1 0 1 0 1 0 0]
 [1 0 0 1 0 1 1 1 1 0]
 [0 0 0 1 1 0 1 0 0 0]
 [1 0 1 0 1 0 0 1 1 0]
 [1 0 1 1 0 1 0 0 0 1]
 [0 1 0 1 0 0 1 0 1 0]
 [0 1 1 0 1 0 0 0 0 1]
 [0 0 0 1 1 1 1 1 0 1]
 [1 0 1 1 1 1 0 0 0 1]]

And here's the filled version:

[[1 1 1 1 1 1 1 1 0 0]
 [1 1 1 1 1 1 1 1 0 0]
 [1 1 1 1 1 1 1 1 1 0]
 [0 0 0 1 1 1 1 0 0 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 0]
 [0 1 1 1 1 1 1 1 1 1]
 [0 0 0 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1]]

The key difference here is that we did this for every slice of the entire 3D array in one step.

Joe Kington
  • 275,208
  • 71
  • 604
  • 463
2

After a moments thought, following your description and corner case with all zero rows, this will be still quite straightforward with numpylike:

In []: A
Out[]: 
array([[1, 0, 1, 0, 1, 1, 0, 0],
       [0, 0, 1, 1, 0, 1, 1, 0],
       [1, 0, 1, 1, 0, 0, 0, 1],
       [0, 1, 0, 0, 1, 0, 1, 0],
       [1, 1, 1, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 1, 0, 1]])

In []: v= 0< A.sum(1) # work only with rows at least one 1
In []: A_v= A[v, :]
In []: (r, s), a= A_v.nonzero(), arange(v.sum())
In []: se= c_[searchsorted(r, a), searchsorted(r, a, side= 'right')- 1]
In []: for k in a: A_v[k, s[se[k, 0]]: s[se[k, 1]]]= 1
   ..: 
In []: A[v, :]= A_v

In []: A
Out[]: 
array([[1, 1, 1, 1, 1, 1, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 1]])

Update:
After some more tinkering, here is a more 'pythonic' implementation and way much simpler, than the above one. So, the following lines:

for k in xrange(A.shape[0]):
    m= A[k].nonzero()[0]
    try: A[k, m[0]: m[-1]]= 1
    except IndexError: continue

are quite straightforward ones. And they'll perform very well, indeed.

eat
  • 7,440
  • 1
  • 19
  • 27
  • Hey eat ! I would also like to thank you, your method works fine too. But still, I'll have the same problem when there will be a row filled with zeros (which shouldn't change after the transformation !) – Nico Jul 08 '11 at 15:30
  • Updated my answer, to take care of that corner case. But how relevant that case is? If it's really typically this could still be fine tuned. BTW, you may like to describe more on your real problem you are trying to solve. There may exists even more straightforward ways to handle it. Thanks – eat Jul 08 '11 at 16:10
  • Good point to use `.nonzero()`, it is slightly faster than `.index()`. `lst = 9999*[0] + [1]; npa=zeros(10000) ; npa[-1] = 1` then `%timeit lst.index(1)` gives 275 μs, and `%timeit npa.nonzero()[0]` gives 153 μs per loop. – sastanin Jul 08 '11 at 17:53
  • On the other hand, `.nonzero()` is forced to find _all_ non-zero elements, `.index()` looks only for the first one. In some situations the latter is faster than the former. Example: `import numpy as np; npa = np.ones(1000); lst = list(npa);`, then `npa.nonzero()[0]` takes 15.9 μs, while `lst.index(1)` takes only 1.27 μs. – sastanin Jul 11 '11 at 09:34
  • @jetxee: On the other hand ... I think @Joe Kingnstons solution is much more suitable for OP. Thanks – eat Jul 11 '11 at 14:25
0

I can't think of a more efficient way than what you describe:

For every line

  1. Scan line from the left until you find a 1.

  2. If no 1 is find continue with next line.

  3. Otherwise scan from the right to find the last 1 in the line.

  4. Fill everything in the current line between the positions from 1. and 3. with 1s.

Howard
  • 38,639
  • 9
  • 64
  • 83