How can I make this image processing function faster? Already tried Cython

Question

I'm currently trying to implement a function in Python that finds occurrences of a certain color value in an image in order to determine the bounding box of a color region. It works, but very slowly: iterating over a single 1920x1080 image takes around 30 seconds. I tried converting it to Cython, which saved only about 2 seconds per image, still far from what I'm looking for. Since I'm new to Cython, I was hoping you could give me some hints on improving this. You can see my code below. Thanks a lot!

cimport cython

import numpy as np
cimport numpy as np

@cython.wraparound(False)
@cython.boundscheck(False)
cdef _cget_bboxes_(img):

    cdef int y_lim = img.shape[0]
    cdef int x_lim = img.shape[1]

    cdef np.ndarray img_array = img

    color_dict = {}


    cdef int y, x

    for y in range(y_lim):
        for x in range(x_lim):

            pix = img_array[y][x]
            pix = tuple(pix)

            # tuple >= tuple compares lexicographically, so test each channel instead
            if any(channel >= 10 for channel in pix):
                if pix not in color_dict:

                    color_dict[pix] = {"min_x": x, "max_x": x, "min_y": y, "max_y": y, "count": 1}

                else:

                    if color_dict[pix]["min_x"] >= x:
                        color_dict[pix]["min_x"] = x

                    if color_dict[pix]["max_x"] <= x:
                        color_dict[pix]["max_x"] = x

                    if color_dict[pix]["min_y"] >= y:
                        color_dict[pix]["min_y"] = y

                    if color_dict[pix]["max_y"] <= y:
                        color_dict[pix]["max_y"] = y

                    color_dict[pix]["count"] += 1

    return color_dict

score 4 · Answer 1 · answered Feb 10 '18 at 15:21

It's really a bad idea to use a dictionary to look up color triplets. You have a fixed range for the triplet values (I assume 0..255). Replacing your dictionary with a 3D array of size 256x256x256 would speed up your code tremendously (the lookup becomes a trivial index operation).

Note that what you are doing is computing a color histogram. I'd be surprised if this didn't already exist somewhere and weren't available in Python.

Also, color histograms are often computed on more coarsely quantized color values, for example using 64 bins in each dimension. That reduces memory usage and increases speed, and the loss of color resolution is unlikely to matter in most applications.
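
To make the quantization idea concrete, here is a minimal pure-Python sketch (not tuned for speed) of how a triplet can be mapped to a slot in a flat table with 64 bins per channel. The names BITS, bin_index and count_colors are made up for this illustration, and the array module only stands in for whatever typed array you would use in Cython.

```python
from array import array

BITS = 6            # assumed choice: 64 bins per channel
SHIFT = 8 - BITS    # number of low bits dropped from each 8-bit channel
BINS = 1 << BITS


def bin_index(r, g, b):
    """Map an 8-bit RGB triplet to a position in a flat BINS**3 table."""
    return ((r >> SHIFT) * BINS + (g >> SHIFT)) * BINS + (b >> SHIFT)


def count_colors(pixels):
    """Count pixels per quantized color; pixels is an iterable of (r, g, b)."""
    counts = array("I", [0]) * BINS**3  # contiguous table, no dict lookups
    for r, g, b in pixels:
        counts[bin_index(r, g, b)] += 1
    return counts


# (255, 0, 0) and (254, 1, 2) differ only in their low bits, so they share a bin.
pixels = [(255, 0, 0), (254, 1, 2), (0, 0, 255)]
counts = count_colors(pixels)
```

With 64 bins per channel the table has 262,144 entries instead of roughly 16.7 million, at the cost of merging colors that differ only in their two lowest bits.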

answered Feb 10 '18 at 15:21

Cris Luengo

Yes, I suspected the dictionaries were the cause of this, but they were just so convenient to work with that I barely had the heart to part ways with them. Anyway, thanks to your hints I managed to make the method run in only 1.3 seconds. A great success! Thank you! – dawg_91 Feb 10 '18 at 17:34

1.3 s in Cython??? It doesn't seem to do a lot of compiling then. Compiled code would do this in a small fraction of a second! – Cris Luengo Feb 10 '18 at 17:41

CodeSurgeon · Answer 2 · 2018-02-12T23:47:29.320

I see that you were able to get your code to run in about 1 second and are happy with the performance. However, you can make your code even faster with the power of numpy structured arrays!

Taking the advice of @chrisb and @CrisLuengo, you not only want to add type information to your variables, but you also want to choose the appropriate data structures. I would suggest you take a look at this blog post, but in short, Python containers like dict do not store data contiguously in memory; instead, they require "un-boxing" pointers to Python objects whenever you access a particular element. This is slow and hurts CPU cache performance.

Here is what my version of your _cget_bboxes_ function looks like:

cimport cython
from libc.stdint cimport uint8_t
import numpy as np
cimport numpy as np

cdef packed struct ColorData:
    np.uint16_t min_x, max_x, min_y, max_y
    np.uint32_t count

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef get_histogram(np.uint8_t[:, :, :] img):
    cdef int y_lim = img.shape[0]
    cdef int x_lim = img.shape[1]
    cdef int y, x
    cdef uint8_t r, g, b

    # You can define a numpy structured array dtype by hand using tuples...
    # cdef np.dtype color_dtype = np.dtype([
    #     ("min_x", np.uint16),
    #     ("max_x", np.uint16),
    #     ("min_y", np.uint16),
    #     ("max_y", np.uint16),
    #     ("count", np.uint32)])

    # Or, instead of rewriting the struct's definition as a numpy dtype,
    # you can use this generic approach:
    # 1- making a temp object
    # 2- getting its pointer
    # 3- converting to memoryview
    # 4- converting to numpy array
    # 5- then getting that numpy array's dtype
    cdef ColorData _color
    cdef np.dtype color_dtype = np.asarray(<ColorData[:1]>(&_color)).dtype


    #cdef ColorData[:, :, :] out#this alternatively works
    cdef np.ndarray[ColorData, ndim=3] out
    out = np.zeros(shape=(256, 256, 256), dtype=color_dtype)

    for y in range(y_lim):
        for x in range(x_lim):
            r = img[y, x, 0]
            g = img[y, x, 1]
            b = img[y, x, 2]
            if r >= 10 or g >= 10 or b >= 10:
                if out[r, g, b].count == 0:
                    out[r, g, b] = [x, x, y, y, 1]
                    # Field-by-field alternative:
                    # out[r, g, b].min_x = x
                    # out[r, g, b].max_x = x
                    # out[r, g, b].min_y = y
                    # out[r, g, b].max_y = y
                    # out[r, g, b].count = 1
                else:
                    if out[r, g, b].min_x >= x:
                        out[r, g, b].min_x = x
                    if out[r, g, b].max_x <= x:
                        out[r, g, b].max_x = x
                    if out[r, g, b].min_y >= y:
                        out[r, g, b].min_y = y
                    if out[r, g, b].max_y <= y:
                        out[r, g, b].max_y = y
                    out[r, g, b].count += 1
    return out

To "type" a numpy structured array, I have to include a struct definition that corresponds to the array's dtype. I also take care in my loop to avoid generating tuples to index into the out array. For comparison, this code runs in about 0.02 seconds for a 1920x1080 image on my laptop. Hope this helps demonstrate how you can take full advantage of Cython's compiled nature!
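
Once you have the returned array, pulling the boxes out from Python could look roughly like this. It is only a sketch and not part of the code above: summarize is a hypothetical helper, and demo is a small hand-filled stand-in for the real 256x256x256 result so that the snippet runs without the compiled module.

```python
import numpy as np

# Same field layout as the ColorData struct in the answer.
color_dtype = np.dtype([
    ("min_x", np.uint16),
    ("max_x", np.uint16),
    ("min_y", np.uint16),
    ("max_y", np.uint16),
    ("count", np.uint32),
])


def summarize(hist):
    """Return {(r, g, b): (min_x, min_y, max_x, max_y, count)} for used colors."""
    boxes = {}
    # np.argwhere yields one [r, g, b] index row per entry with a nonzero count.
    for r, g, b in np.argwhere(hist["count"] > 0):
        entry = hist[r, g, b]
        boxes[(int(r), int(g), int(b))] = (
            int(entry["min_x"]), int(entry["min_y"]),
            int(entry["max_x"]), int(entry["max_y"]),
            int(entry["count"]),
        )
    return boxes


# Hypothetical stand-in for the function's output (the real array is 256x256x256).
demo = np.zeros((16, 16, 16), dtype=color_dtype)
demo[12, 0, 3] = (5, 40, 7, 90, 321)  # min_x, max_x, min_y, max_y, count
result = summarize(demo)
```

Filtering on the count field first means the Python-level loop only touches colors that actually occur in the image.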

I like this solution a lot. Yet there are two minor bugs: min_x and min_y will always be 0. — ead, Feb 11 '18 at 10:46
@ead You are absolutely right! Added the appropriate if statement to catch the initial case when `count=0`. — CodeSurgeon, Feb 11 '18 at 14:39
Great solution, the structured arrays are a great feature and exactly what I was looking for. One more little bug though: min_x, max_x, min_y, max_y should be declared at least as uint16 otherwise there will be overflow. Thank you very much! — dawg_91, Feb 12 '18 at 19:41
@dawg_91 Corrected this as well now. `uint16` should do the trick since I can't imagine typical images being much larger than that. Thanks for catching that! — CodeSurgeon, Feb 12 '18 at 23:49

score 2 · Answer 3 · answered Feb 10 '18 at 15:07

Running Cython with the --annotate flag highlights the sections that interact heavily with Python, which will give you good direction on what to change. Several things immediately jump out:

1) Just cleanup, but img should be typed directly in the function signature; the assignment to img_array is unnecessary.

2) np.ndarray isn't a specific enough type; you also need the underlying dtype. I like the memoryview syntax, so your function signature could be

def _cget_boxes(np.uint8_t[:, :, :] img)

3) Anything that can be typed should be.

4) Tuples and dicts are slow compared to arrays and C-typed scalars. It may (or may not!) be better to try to refactor color_dict into a set of arrays.
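
For point 4, one possible shape of the "set of arrays" refactoring is sketched below in plain Python so the bookkeeping is easy to follow; in Cython the same tables would be typed memoryviews. The function names, the flat list-of-tuples input, and the bits parameter (coarse color bins that keep the tables small) are assumptions made for this example, not something taken from the question.

```python
from array import array


def color_slot(r, g, b, bits):
    """Pack the top `bits` bits of each 8-bit channel into one table index."""
    shift = 8 - bits
    return ((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)


def bounding_boxes(pixels, width, bits=4):
    """pixels: flat, row-major list of (r, g, b); width: image width in pixels."""
    size = 1 << (3 * bits)
    # One flat typed array per field replaces the dict of dicts.
    min_x = array("H", [0]) * size
    max_x = array("H", [0]) * size
    min_y = array("H", [0]) * size
    max_y = array("H", [0]) * size
    count = array("I", [0]) * size

    for i, (r, g, b) in enumerate(pixels):
        if r < 10 and g < 10 and b < 10:
            continue  # skip near-black pixels
        y, x = divmod(i, width)
        k = color_slot(r, g, b, bits)
        if count[k] == 0:
            # First sighting of this color: the box is a single pixel.
            min_x[k] = max_x[k] = x
            min_y[k] = max_y[k] = y
        else:
            min_x[k] = min(min_x[k], x)
            max_x[k] = max(max_x[k], x)
            min_y[k] = min(min_y[k], y)
            max_y[k] = max(max_y[k], y)
        count[k] += 1
    return {"min_x": min_x, "max_x": max_x,
            "min_y": min_y, "max_y": max_y, "count": count}


# 3x2 image: the red pixels sit at (x=1, y=0), (x=0, y=1) and (x=2, y=1).
red, black = (200, 0, 0), (0, 0, 0)
boxes = bounding_boxes([black, red, black, red, black, red], width=3)
k = color_slot(*red, bits=4)
```

Checking count[k] == 0 before updating is what keeps the minimum fields from staying stuck at their zero-initialized value.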

Thank you for your great input! I managed to make the method run in around 1.3 seconds, almost a x30 improvement. Great! — dawg_91, Feb 10 '18 at 17:35
