0

I'm trying to optimize a method in numpy, evaluating a gaussian function at many positions in an image. Right now I'm calculating boolean masks around the center positions of every gaussian and using them for indexing the data array, in order to save computation time. For a small amount of data points, boolean indexing greatly accellerated the script in comparison to calculating the indices via np.argwhere.

Unfortunately, it seems the boolean indexing seems to cause a memory leak. The following example causes my RAM to max out. How do I prevent this? Is this a numpy bug, or am I doing something wrong? I think this might be because the boolean indexing creates copies of the data, as someone wrote in this answer. Shouldn't the memory be freed after the copy is not needed anymore? Is there a way to manually free the memory?

import numpy as np
import matplotlib.pyplot as plt


class Image():
    def __init__(self, nx, ny, px):
        x = np.linspace(0, nx * px, nx)
        y = np.linspace(0, ny * px, ny)
        self.x, self.y = np.meshgrid(x, y)
        self.data = np.zeros((nx, ny), dtype=np.uint8)
        self.nx = nx  # number of pixels
        self.ny = ny
        self.px = px  # pixel size [mm]

    def gaussian(self, x, y, x0, y0, amplitude=None, sigma=0.01):
        """ 2d gaussian function

        Parameters:
            x, y: coordinates of evaluation position
            x0, y0: center coordinates of Gaussian
            amplitude: amplitude of the Gaussian
            sigma: width of Gaussian
        """
        if amplitude is None:
            amplitude = np.iinfo(self.data.dtype).max
        result = amplitude * np.exp(
            -((x - x0)**2 + (y - y0)**2) / (2 * sigma ** 2))
        return result.astype(self.data.dtype)

    def mask_from_pos(self, x0, y0, radius_px):
        """ returns a square mask around the position (x0, y0)
        """
        masks = np.zeros((x0.size, self.ny, self.nx), dtype=bool)
        i_center_x = (x0 // self.px).astype(int)
        i_center_y = (y0 // self.px).astype(int)
        for ix, iy, mask in zip(i_center_x, i_center_y, masks):
            y_start = max(0, min(iy - radius_px, self.ny - 1))
            y_end = max(0, min(iy + radius_px + 1, self.ny))
            x_start = max(0, min(ix - radius_px, self.nx - 1))
            x_end = max(0, min((ix + radius_px + 1), self.nx))

            mask[y_start: y_end, x_start: x_end] = True
        return masks

    def add_gaussians(self, posx, posy, radius_px=9):
        masks = self.mask_from_pos(posx, posy, radius_px)
        for x0, y0, mask in zip(posx, posy, masks):
            self.data[mask] += self.gaussian(
                self.x[mask], self.y[mask], x0, y0).astype(self.data.dtype)


if __name__ == '__main__':
    image = Image(2000, 2000, 0.005)
    xy_max = 2000 * 0.005
    x0 = np.random.rand(3000) * xy_max
    y0 = np.random.rand(3000) * xy_max
    image.add_gaussians(x0, y0)

    plt.figure(dpi=300, figsize=(8, 8))
    plt.imshow(image.data,
            cmap=plt.get_cmap('gray'),
            extent=[0, xy_max, 0, xy_max])

I already tried debugging this and none of the python objects seems to grow, so I assume this is a problem in numpy, but I'm not 100% certain. Any help with this is greatly appreciated! Thanks in advance.

Edit 1: Workaround

I found a solution that gives the same result but without flooding the memory. Simply replacing the calculation of a mask-array by a mask-generator did the trick:

    def mask_from_pos_gen(self, x0, y0, radius_px):
        """ yields a square mask around the position (x0, y0)
        """
        i_center_x = (x0 // self.px).astype(int)
        i_center_y = (y0 // self.px).astype(int)
        for ix, iy in zip(i_center_x, i_center_y):
            mask = np.zeros(self.data.shape, dtype=bool)
            y_start = max(0, min(iy - radius_px, self.ny - 1))
            y_end = max(0, min(iy + radius_px + 1, self.ny))
            x_start = max(0, min(ix - radius_px, self.nx - 1))
            x_end = max(0, min((ix + radius_px + 1), self.nx))

            mask[y_start: y_end, x_start: x_end] = True
            yield mask

This method can then be used exactly the same way as mask_from_pos before, but the memory is freed this way. Leaving this here in case someone else has the same problem and hoping some numpy Guru will explain this behavior.

Richard
  • 709
  • 1
  • 6
  • 15

1 Answers1

0

It turns out boolan indexing is not the best solution for this problem. Using a numpy slice gives a massive speed boost in comparison to the boolean masks.

    def slice_from_pos_gen(self, x0, y0, radius_px):
        """ yields a square mask around the position (x0, y0)
        """
        i_center_x = (x0 // self.px).astype(int)
        i_center_y = (y0 // self.px).astype(int)
        for ix, iy in zip(i_center_x, i_center_y):
            y_start = max(0, min(iy - radius_px, self.ny - 1))
            y_end = max(0, min(iy + radius_px + 1, self.ny))
            x_start = max(0, min(ix - radius_px, self.nx - 1))
            x_end = max(0, min((ix + radius_px + 1), self.nx))

            yield np.s_[y_start: y_end, x_start: x_end]

This method can then be used exactly the same way as mask_from_pos before, but is much faster and does not fill up the memory.

Richard
  • 709
  • 1
  • 6
  • 15