I am trying to save every connected component of a big 3D array into its own array, and to write each of these arrays to a separate ".npy" file. For the connected components I use cc3d.connected_components() (https://github.com/seung-lab/connected-components-3d/). The output is a 3D array containing integers from 0 to approx. 1500, one for every component, so I need to save ~1500 arrays. (Since I can't share the original data, I'll use an array of random integers with the same shape as the output of the connected components analysis.)

My approach was to define a function bound_box(i) that extracts the bounding box around a single connected component and saves it using np.save(), and then to call this function in a loop over all connected components in the primary 3D array.

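import numpy as np

def bound_box(i):
   # Indices of all voxels belonging to component i (one index array per axis)
   component = np.nonzero(arr == i)
   # Extent of the component along each axis
   z = slice(component[0].min(), component[0].max() + 1)
   x = slice(component[1].min(), component[1].max() + 1)
   y = slice(component[2].min(), component[2].max() + 1)

   # Save the sub-array inside the bounding box
   np.save('path.../arr' + str(i), arr[z, x, y])

# Random stand-in for the labelled output of cc3d.connected_components()
arr = np.random.randint(0, 1500, (721, 1285, 1285))

for i in np.unique(arr):
   bound_box(i)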

I tested the bound_box() function by saving single arrays, which worked perfectly well. With the real data the arrays were about 500-700 KB each. With the random data they are bigger (up to 2 GB), which is expected because the values no longer form compact connected components. Still, the function itself seems to work as expected.

BUT when I run the loop, Python allocates a lot of memory, and when it eventually runs out of memory the program crashes without having written a single ".npy" file. So I'm quite sure the problem is the for loop, but I wasn't able to figure out a solution by myself. I'd really appreciate it if someone could help me out and tell me what I'm doing wrong here!
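To put rough numbers on the memory use, the sketch below estimates the size of a dense array of this shape for a few dtypes without allocating anything. It assumes np.random.randint() returns 64-bit integers here, which is the default on most 64-bit Linux and macOS setups but is not stated in the post; array_gib is a hypothetical helper name.

```python
import math
import numpy as np

def array_gib(shape, dtype):
    """Size in GiB of a dense array with this shape and dtype (nothing is allocated)."""
    return math.prod(shape) * np.dtype(dtype).itemsize / 2**30

shape = (721, 1285, 1285)

# int64: assumed dtype of the random test array
# uint16: smallest unsigned dtype that can hold ~1500 labels
# bool: dtype of the temporary mask created by `arr == i`
for dtype in (np.int64, np.uint16, np.bool_):
    print(f"{np.dtype(dtype).name:>6}: {array_gib(shape, dtype):.2f} GiB")
```

Under that assumption the test array alone takes close to 9 GiB, so any operation that makes a full copy of it is likely to exhaust memory on a typical workstation. Each `arr == i` comparison also creates a temporary boolean mask of the same shape.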

What I've already tried:

  • Using del to delete the variables used in my function, in the hope of freeing some memory. This had no noticeable effect.
def bound_box(i):
   component = np.nonzero(arr == i)
   z = slice(component[0].min(), component[0].max() + 1)
   x = slice(component[1].min(), component[1].max() + 1)
   y = slice(component[2].min(), component[2].max() + 1)

   np.save('path.../arr' + str(i), arr[z, x, y])

   # Explicitly drop the local references (they would go out of scope on return anyway)
   del x
   del y
   del z
   del component
  • I wonder if the problem is in the `np.unique(arr)`. It might not even get to the `bound_box` call. I'd include a `print(i)` or something similar in the function to verify that it is indeed being called. – hpaulj Dec 27 '21 at 19:07
  • You were right! I added a solution using your input above. – tschermak Dec 27 '21 at 22:31
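Following the suspicion in the comment about `np.unique(arr)`: np.unique() flattens and sorts its input, which means at least one full copy of the array before the loop body ever runs. One possible way around it, sketched below, is to loop over `range(arr.max() + 1)` instead. This is only a sketch under assumptions, not the solution the asker refers to: it assumes the labels are consecutive integers starting at 0, and the names bounding_box, save_components, out_dir and first_label are hypothetical. It also derives the box from per-axis projections of the mask instead of np.nonzero().

```python
import os
import tempfile
import numpy as np

def bounding_box(mask):
    """Tuple of slices enclosing all True voxels of a boolean mask, or None if it is empty."""
    slices = []
    for axis in range(mask.ndim):
        # Project the mask onto this axis: True where any voxel along the other axes is set
        other_axes = tuple(a for a in range(mask.ndim) if a != axis)
        hits = np.flatnonzero(mask.any(axis=other_axes))
        if hits.size == 0:
            return None
        slices.append(slice(int(hits[0]), int(hits[-1]) + 1))
    return tuple(slices)

def save_components(arr, out_dir, first_label=0):
    """Save the bounding box of every label to out_dir; return the labels that were saved."""
    saved = []
    # Assumption: labels are consecutive, so range() can replace np.unique(arr)
    for label in range(first_label, int(arr.max()) + 1):
        box = bounding_box(arr == label)  # temporary mask, released on the next iteration
        if box is None:
            continue  # label not present in the array
        np.save(os.path.join(out_dir, f"arr{label}.npy"), arr[box])
        saved.append(label)
    return saved

# Small demo with two hand-made components, written to a temporary directory
demo = np.zeros((4, 5, 6), dtype=np.uint16)
demo[1:3, 2:4, 0:2] = 1
demo[3, 0, 5] = 2
with tempfile.TemporaryDirectory() as tmp:
    saved = save_components(demo, tmp, first_label=1)
    shapes = {label: np.load(os.path.join(tmp, f"arr{label}.npy")).shape for label in saved}
print(saved, shapes)
```

Passing first_label=1 skips label 0, which is useful if 0 marks the background. The `arr == label` mask is still as large as the array in voxels (one byte each), so peak memory stays at roughly the array plus one mask.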
