0

I have some memory intensive image filters that I want to call block by block on large images/arrays (because they compute the filter for an entire array, they run out of memory when trying to compute the whole array).

def block_process(Ic, blocksize):
    B = numpy.empty(Ic.shape)

    colstart = 0
    while colstart < Ic.shape[1]:
        BlockWidth = blocksize
        if (colstart + blocksize) > Ic.shape[1]:
            BlockWidth = Ic.shape[1] - colstart
        rowstart = 0
        while rowstart < Ic.shape[0]:
            BlockHeight = blocksize
            if (rowstart + blocksize) > Ic.shape[0]:
                BlockHeight = Ic.shape[0] - rowstart

            B[colstart:colstart+BlockWidth, rowstart:rowstart+BlockHeight] = filter1(params) # One of many available filters

            rowstart += BlockHeight
        colstart += BlockWidth
    return B # The complete filtered array

My filters are computed in other functions i.e. def filter1(A, filtsize), def filter2(A, filtsize, otherparam), which have an A parameter (the input array, given by the block function), and other parameters such as filter size. Some filters have more parameters than others. They return the filtered array.

Two questions

  1. How do I go about calling one of my filter functions through the block_process function? I don't want to copy the block processing code into each function. In other words, is there a way of specifying which filter to call (and with what parameters) as a parameter of the block_process() call?
  2. Is there a better way of coding this?
Benjamin
  • 11,560
  • 13
  • 70
  • 119

1 Answers1

2

You can do it like this:

def block_process(a, blocksize, filt, args):
    b = numpy.empty(a.shape)
    for row in xrange(0, a.shape[0], blocksize):
        for col in xrange(0, a.shape[1], blocksize):
            b[row:row + blocksize, col:col + blocksize] = (
                filt(a[row:row + blocksize, col:col + blocksize], *args))
    return b

There is no need to correct for incomplete blocks at the right and lower edge of the image -- this will happen automatically. You can simply pass in the filter function and the tuple of arguments. To call the filter filter1(a, filtsize), use

block_process(a, blocksize, filter1, (filtsize,))

The code above assumes that the first parameter to a filter is the array to be filtered and that a filter returns a filtered array of the same shape.

It is also possible that your filters can be rewritten in a way that they don't use as much memory, so the blockprocessing would become unnecessary.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • Thanks, this is great. My filters are running on large (5000x5000) arrays and require keeping 3-4 additional arrays of the same size in memory, which is why they are running out of memory but that is for another question... – Benjamin Feb 10 '11 at 15:25
  • @Benjamin: often the number of additional arrays needed can be reduced -- this may indeed be for another question :) – Sven Marnach Feb 10 '11 at 15:30
  • @Benjamin - On a side note, another advantage to doing things in chunks is that it makes easy to spread the calculation across multiple processors using `multiprocessing`. So, even if the memory problems are fixed later, it can be useful to chunk things like this. – Joe Kington Feb 10 '11 at 15:33
  • 1
    Simple threading often works as well, since most of the compuationally intensive NumPy stuff runs outside the GIL. – Sven Marnach Feb 10 '11 at 15:40
  • @Joe Kington: I posted my filter http://stackoverflow.com/questions/4959171/improving-memory-usage-in-an-array-wide-filter-to-avoid-block-processing if you are interested. – Benjamin Feb 10 '11 at 15:41
  • @SvenMarnach: Just came back to this answer... I think you reversed `row` and `col` in the line where you assign filtered values to `b`... – Benjamin Jul 20 '16 at 18:33
  • @Benjamin: You are right. I probably tested this with a square array, in which case the code actually works correctly. Looking at my answer again, I realize that it would probably be better to reshape the two-dimensional array to a four-dimensional block array (a two-dimensional array of two-dimensional blocks), since this would make indexing less error-prone. – Sven Marnach Jul 21 '16 at 11:19