
I have a large collection of large images (e.g. 15000x15000 pixels) that I would like to blur. I need to blur the images using a distance function, so the further away I move from certain areas in the image, the heavier the blurring should be. I have a distance map describing how far a given pixel is from those areas.

Due to the large number of images, I have to consider performance. I have looked at NumPy/SciPy; they have some great functions, but they seem to use a fixed kernel size, and I need to reduce or increase the kernel size depending on the distance to the previously mentioned areas.
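
For illustration, a fixed-kernel blur in SciPy looks something like this, with a single sigma applied to every pixel:

import numpy as np
from scipy import ndimage

image = np.random.rand(512, 512)
# one fixed sigma for the whole image; what I need instead is a blur
# whose strength varies per pixel according to the distance map
blurred = ndimage.gaussian_filter(image, sigma=5)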

How can I solve this problem in Python?


UPDATE: My solution so far based on the answer by rth:

# cython: boundscheck=False
# cython: cdivision=True
# cython: wraparound=False

import numpy as np
cimport numpy as np

def variable_average(int [:, ::1] data, int [:, ::1] kernel_size):
    cdef int width, height, i, j, ii, jj
    height = data.shape[0]   # number of rows
    width = data.shape[1]    # number of columns
    # output has the same (rows, columns) shape as the input
    cdef double [:, ::1] data_blurred = np.empty([height, width])
    cdef double res
    cdef int sigma, weight

    for i in range(height):
        for j in range(width):
            weight = 0
            res = 0
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    if ii < 0 or ii >= height or jj < 0 or jj >= width:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight

    return data_blurred

Test:

data = np.random.randint(256, size=(1024, 1024)).astype(np.intc)      # np.intc matches the C 'int' memoryview
kernel = (np.random.randint(256, size=(1024, 1024)) + 1).astype(np.intc)
result = np.asarray(variable_average(data, kernel))

The method using the above settings takes around 186 seconds to run. Is that what I can expect to ultimately squeeze out of the method, or are there optimizations I can use to further increase the performance (still using Python)?

Chau
  • This does not answer your question but if you really are concerned about performance you should try C++ and OpenCV. – coincoin Jun 24 '15 at 08:37
  • What kind of blur do you want? Gaussian? Box blur? Does the blur have to be mathematically accurate or is it for artistic effect and "if it looks good enough, it is good enough"? Also, is the blur amount related to a depth value in a complex 3D scene? Do you have to worry about not bleeding color values from near in-focus objects onto far out-of-focus surfaces etc? – samgak Jun 24 '15 at 08:40
  • @coincoin: I would like to stay out of C++ due to my lack of C++ knowledge, but I have thought exactly the same :) @samgak: I need the image to have seamless blurring and it should just look good enough, it is for visual purposes. An average of the nearest `X` pixels around the selected pixel is all I need, no fancy stuff required. The images are 2D photographs. – Chau Jun 24 '15 at 09:22
  • As a side note, you can `return data_blurred.base` to avoid the need to call `np.asarray`. Also, using `cpdef` instead of `def` is probably preferable (no impact on performance). Finally, as for any high-performance C code, check your compilation flags. For instance, on my system (Linux Gentoo) I was surprised to find that setting `extra_compile_args=['-O3', '-march=native']`, `extra_link_args=['-O3', '-march=native']` in `setup.py` resulted in a speedup by a factor of ~2 for some code, while on the same machine on Windows using Anaconda the speedup was negligible. – rth Jul 16 '15 at 09:01
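
For reference, a minimal setup.py along the lines of rth's comment could look like the sketch below. The names variable_average / variable_average.pyx are placeholders for the actual module and file; -O3 and -march=native apply to GCC/Clang, and -march=native ties the binary to the build machine:

from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy

extensions = [
    Extension(
        "variable_average",                      # placeholder module name
        ["variable_average.pyx"],                # placeholder source file
        include_dirs=[numpy.get_include()],      # needed for 'cimport numpy'
        extra_compile_args=["-O3", "-march=native"],
        extra_link_args=["-O3", "-march=native"],
    )
]

setup(ext_modules=cythonize(extensions))

Building in place is then done with python setup.py build_ext --inplace.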

2 Answers


As you have noted, the related SciPy functions do not support variable-size blurring. You could implement this in pure Python with for loops, then use Cython, Numba, or PyPy to get C-like performance.

Here is a low-level Python implementation, using NumPy only for data storage,

import numpy as np

def variable_blur(data, kernel_size):
    """ Blur with a variable window size
    Parameters:
      - data: 2D ndarray of floats or integers
      - kernel_size: 2D ndarray of integers, same shape as data
    Returns:
      2D ndarray
    """
    data_blurred = np.empty(data.shape)
    Ni, Nj = data.shape
    for i in range(Ni):
        for j in range(Nj):
            res = 0.0
            weight = 0
            sigma =  kernel_size[i, j]
            for ii in range(i - sigma, i+sigma+1):
                for jj in range(j - sigma, j+sigma+1):
                    if ii<0 or ii>=Ni or jj < 0 or jj >= Nj:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight
    return data_blurred

data = np.random.rand(50, 20)
kernel_size = 3*np.ones((50, 20), dtype=int)
variable_blur(data, kernel_size)

that calculates an arithmetic average of pixels with a variable kernel size. It is a poor implementation from a NumPy point of view, in the sense that it is not vectorized. However, this makes it convenient to port to other high-performance solutions:

  • Cython: simply statically typing the variables and compiling should give you C-like performance,

    def variable_blur(double [:, ::1] data, long [:, ::1] kernel_size):
        cdef Py_ssize_t Ni, Nj
        Ni = data.shape[0]
        Nj = data.shape[1]
        # build the output shape from Ni and Nj: on a typed memoryview,
        # .shape is a C array rather than a Python tuple
        cdef double [:, ::1] data_blurred = np.empty((Ni, Nj))
        for i in range(Ni):
            # [...] etc.
    

    see this post for a complete example, as well as the compilation notes.

  • Numba: wrapping the above function with the @jit decorator should be mostly sufficient (a minimal sketch follows after this list).

  • PyPy: installing PyPy together with its experimental numpy branch could be another alternative worth trying, although you would then have to use PyPy for all your code, which might not be possible at present.
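
For the Numba route mentioned in the list above, a minimal sketch (assuming the pure-Python variable_blur defined earlier is available in the same module) could be:

from numba import njit

# compile the pure-Python loop version above in nopython mode;
# the function body itself does not need to change
variable_blur_jit = njit(variable_blur)

data = np.random.rand(50, 20)
kernel_size = 3 * np.ones((50, 20), dtype=np.int64)
result = variable_blur_jit(data, kernel_size)

The first call includes the compilation time; subsequent calls run the compiled machine code.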

Once you have a fast implementation, you can then use multiprocessing, etc. to process different images in parallel, if need be, or even parallelize the outer for loop with OpenMP in Cython, as in the sketch below.
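
A rough, untested sketch of that OpenMP variant follows; it assumes the same .pyx module as above (so numpy is imported and cimported), that kernel_size is passed with a dtype matching the long memoryview, and that the extension is compiled and linked with -fopenmp:

cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def variable_blur_parallel(double [:, ::1] data, long [:, ::1] kernel_size):
    cdef Py_ssize_t Ni = data.shape[0]
    cdef Py_ssize_t Nj = data.shape[1]
    cdef double [:, ::1] data_blurred = np.empty((Ni, Nj))
    cdef Py_ssize_t i, j, ii, jj
    cdef long sigma, weight
    cdef double res
    # the outer loop is split across OpenMP threads; the body runs
    # without the GIL, so only C-level operations are used inside
    for i in prange(Ni, nogil=True):
        for j in range(Nj):
            res = 0.0
            weight = 0
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    if ii < 0 or ii >= Ni or jj < 0 or jj >= Nj:
                        continue
                    # plain assignments (rather than +=) keep res and weight
                    # thread-private instead of becoming prange reductions
                    res = res + data[ii, jj]
                    weight = weight + 1
            data_blurred[i, j] = res / weight
    return np.asarray(data_blurred)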

rth
  • Thank you, it's very elaborate. I will try getting Cython into my repertoire and see how this goes. – Chau Jun 24 '15 at 11:35
  • I have finally found time to investigate Cython and your answer. I have updated my question with what I have done. If you have any additional ideas I would like to hear them :) – Chau Jul 13 '15 at 12:36
  • @Chau I can't see obvious optimisations of this. You can try to thread the outer loop replacing `range` by a `prange` (see [Cython documentation](http://docs.cython.org/src/userguide/parallelism.html)), but overall I think that's pretty much as fast as you will get: You do process rather large arrays. – rth Jul 15 '15 at 17:37
  • I do work with huge amounts of data hence my need for fast computations. I really appreciate your help, I'll definitely consider using Cython more in the future! – Chau Jul 16 '15 at 06:47

I came across this while googling and thought I would share my own solution which is mostly vectorized and doesn't include any for loops on pixels. You can approximate a Gaussian blur by running a box blur multiple times in a row. So the approach I decided to use is to iteratively box blur the image, but to vary the number of iterations per pixel using a weighting function.

If you need a large blur radius, the number of box-blur iterations grows quadratically with sigma, so consider increasing ksize. For example, with ksize=3 each box-blur pass contributes a variance of 4/3, so sigma = 10 needs 2*10**2/(4/3) = 150 passes, whereas ksize=7 (variance 8) needs only 25.

Here is the implementation:

import cv2
import numpy as np

def variable_blur(im, sigma, ksize=3):
    """Blur an image with a variable Gaussian kernel.
    
    Parameters
    ----------
    im: numpy array, (h, w)
    
    sigma: numpy array, (h, w)
    
    ksize: int
        The box blur kernel size. Should be an odd number >= 3.
        
    Returns
    -------
    im_blurred: numpy array, (h, w)
    
    """
    variance = box_blur_variance(ksize)
    # Number of times to blur per-pixel
    num_box_blurs = 2 * sigma**2 / variance
    # Number of rounds of blurring
    max_blurs = int(np.ceil(np.max(num_box_blurs))) * 3
    # Approximate blurring a variable number of times
    blur_weight = num_box_blurs / max_blurs

    current_im = im
    for i in range(max_blurs):
        next_im = cv2.blur(current_im, (ksize, ksize))
        current_im = next_im * blur_weight + current_im * (1 - blur_weight)
    return current_im

def box_blur_variance(ksize):
    x = np.arange(ksize) - ksize // 2
    x, y = np.meshgrid(x, x)
    return np.mean(x**2 + y**2)

And here is an example

im = np.random.rand(300, 300)
sigma = 3

# Variable
x = np.linspace(0, 1, im.shape[1])
y = np.linspace(0, 1, im.shape[0])
x, y = np.meshgrid(x, y)
sigma_arr = sigma * (x + y)
im_variable = variable_blur(im, sigma_arr)

# Gaussian
ksize = sigma * 8 + 1
im_gauss = cv2.GaussianBlur(im, (ksize, ksize), sigma)

# Gaussian replica
sigma_arr = np.full_like(im, sigma)
im_approx = variable_blur(im, sigma_arr)

The resulting comparison plot (not reproduced here) shows:

  • Top left: Source image
  • Top right: Variable blurring
  • Bottom left: Gaussian blurring
  • Bottom right: Approximated Gaussian blurring
Matt
  • thanks for the solution. This is a pretty old question and if I get back to a similar problem, I will definitely try your solution. – Chau May 03 '21 at 10:10