
I am working on 3D image segmentation with deep learning. Basically, I need to 1) pad a NumPy array, 2) process the array, 3) unpad the array.

dataArray = np.pad(dataArray, 25, mode='constant', constant_values=0) # pad
processedArray = my_process(dataArray) # process
processedArray = processedArray[25:-25, 25:-25, 25:-25, :] # unpad

Problem is, processedArray is very large (shape (464, 928, 928, 928, 10)) and I run out of memory when executing the unpadding. I imagine that the unpadding allocates new memory. Am I right? How could I proceed so that no new memory is allocated, in other words so that the indexing points to the unpadded elements without copying them?

Information that might help: above lines are executed in a function, and processedArray is returned
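For what it's worth, here is how one could check on a small stand-in array whether this kind of slicing actually copies, using np.shares_memory (the shapes here are illustrative, not the real ones):

```python
import numpy as np

# Small stand-in for the real array, to test copying behaviour
a = np.zeros((10, 10, 10, 3), dtype=np.float32)
padded = np.pad(a, ((2, 2), (2, 2), (2, 2), (0, 0)), mode='constant')
unpadded = padded[2:-2, 2:-2, 2:-2, :]  # same slicing pattern as above

# Basic slicing returns a view into the padded buffer
print(np.shares_memory(padded, unpadded))  # True
```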

Manu

2 Answers


Maybe you're running out of memory because my_process creates a copy of the array internally, which doubles your memory usage. So, just don't create extra arrays.

You can keep a single global array and apply all the operations on it without creating extra copies.

import gc
import numpy as np

def my_process():
    global processedArray
    # do all operations in place on processedArray

processedArray = np.pad(dataArray, 25, mode='constant', constant_values=0) # pad
my_process()

del dataArray # delete arrays that are no longer needed to free space
gc.collect()

processedArray = processedArray[25:-25, 25:-25, 25:-25, :] # unpad

But you'll still run out of memory if my_process calls library functions that make copies of processedArray. Try to apply every operation on the global array in place, without making any copies.
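As a sketch of what operating in place could look like, assuming my_process is built from ufunc-style steps (the operations here are illustrative, not the asker's actual processing):

```python
import numpy as np

def my_process(a):
    # ufuncs write into the existing buffer via `out`,
    # so no temporary array of the full size is allocated
    np.multiply(a, 2.0, out=a)
    np.add(a, 1.0, out=a)
    return a

a = np.ones((4, 4), dtype=np.float32)
result = my_process(a)
print(result is a)  # True: the same buffer was modified in place
```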

Zabir Al Nazi

A possible way to reduce memory usage is to use a smaller dtype, such as np.short (int16) instead of float, for your NumPy array. You can try this:

dataArray = np.pad(dataArray, 25, mode='constant', constant_values=0) # pad
processedArray = my_process(dataArray).astype(np.short) # process
processedArray = processedArray[25:-25, 25:-25, 25:-25, :] # unpad
processedArray = processedArray.astype(np.float32) #Converting to float type again
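For scale, halving the itemsize halves the footprint. Note, though, that astype allocates a new array, so the peak usage still briefly includes both the float output of my_process and the int16 copy. A small sketch of the per-dtype sizes:

```python
import numpy as np

a = np.ones((100, 100), dtype=np.float32)
print(a.nbytes)  # 40000 bytes: 4 bytes per element

b = a.astype(np.short)  # np.short is int16: 2 bytes per element
print(b.nbytes)  # 20000 bytes
```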

Alternatively, you can delete dataArray once my_process is done with it, to free memory for processedArray.

dataArray = np.pad(dataArray, 25, mode='constant', constant_values=0) # pad
processedArray = my_process(dataArray) # process
del dataArray # deleting dataArray to reclaim memory
processedArray = processedArray[25:-25, 25:-25, 25:-25, :] # unpad
Hamza Khurshid