Speeding-up "for-loop" in image analysis when iterations are up to 40,000

Question

The details of the prerequisites of this code are quite long so I'll try my best to summarize. WB/RG/BYColor is the base image, FIDO is an overlay of this base image which is applied to it. S_wb/rg/by are the final output images. WB/RG/BYColor are the same size as FIDO.

For each unique element in FIDO, we want to calculate the average color of that region within the base images. The below code does this, but as numFIDOs is very large (up to 40,000), this takes a long time.

The averages are computed for the three separate RGB channels.

sX=200
sY=200
S_wb = np.zeros((sX, sY))
S_rg = np.zeros((sX, sY))
S_by = np.zeros((sX, sY))
uniqueFIDOs, unique_counts = np.unique(FIDO, return_counts=True) 
numFIDOs = uniqueFIDOs.shape  
for i in np.arange(0,numFIDOs[0]):
    Lookup = FIDO==uniqueFIDOs[i]
    # Get average of color signals for this FIDO
    S_wb[Lookup] = np.sum(WBColor[Lookup])/unique_counts[i]
    S_rg[Lookup] = np.sum(RGColor[Lookup])/unique_counts[i]
    S_by[Lookup] = np.sum(BYColor[Lookup])/unique_counts[i]

This takes about 7.89 seconds to run, no so long, but this will be included in another loop, so it builds up!

I have tried vectorization (shown below) but I couldn't do it

FIDOsize = unique_counts[0:numFIDOs[0]:1]
Lookup = FIDO ==uniqueFIDOs[0:numFIDOs[0]:1]
S_wb[Lookup] = np.sum(WBColor[Lookup])/FIDOsize
S_rg[Lookup] = np.sum(RGColor[Lookup])/FIDOsize
S_by[Lookup] = np.sum(BYColor[Lookup])/FIDOsize

error in array size matching

You may want to make your code self-contained and add runtime measurements. — cel, Nov 09 '15 at 15:45
is it possible to attach variables? To make it self-contained I would have to copy many many lines of code into here. — William Baker Morrison, Nov 09 '15 at 16:26
One thing that is apparent is that you could turn 7 slow lookups into one: `indexarr = FIDO==uniqueFIDOs[i]`. Then just do `S[indexarr] = np.sum(Color[indexarr])/FIDOsize`. You could also probably simplify the FIDOsize computation by using `uniqueFIDOs, unique_counts = np.unique(FIDO, return_counts=True)` and then just `FIDOsize = unique_counts[i]`. — wflynny, Nov 09 '15 at 16:36
When it comes to (time-critical) image analysis, something like [pyCUDA](http://mathema.tician.de/software/pycuda/) could be worth a look - at least, for some application cases. — Marco13, Nov 11 '15 at 10:03

chthonicdaemon · Answer 1 · 2015-11-10T08:54:12.157

5

By my timing, this is about 10 times faster than your original method. I tested with these arrays:

import numpy as np

sX=200
sY=200

FIDO = np.random.randint(0, sX*sY, (sX, sY))
WBColor = np.random.randint(0, sX*sY, (sX, sY))
RGColor = np.random.randint(0, sX*sY, (sX, sY))
BYColor = np.random.randint(0, sX*sY, (sX, sY))

This is the part I timed:

import collections

colors = {'wb': WBColor, 'rg': RGColor, 'by': BYColor}
planes = colors.keys()
S = {plane: np.zeros((sX, sY)) for plane in planes}

for plane in planes:
    counts = collections.defaultdict(int)
    sums = collections.defaultdict(int)
    for (i, j), f in np.ndenumerate(FIDO):
        counts[f] += 1
        sums[f] += colors[plane][i, j]
    for (i, j), f in np.ndenumerate(FIDO):
        S[plane][i, j] = sums[f]/counts[f]

Probably because even though looping in Python is slow, this traverses the data less.

Note, the original version gets faster if there are a small number of unique values in FIDO. This takes roughly the same time for most cases.

edited Nov 10 '15 at 08:54

answered Nov 09 '15 at 18:59

chthonicdaemon

19,180
2
52
66

instead of `collections.defaultdict(int)` you could use `collections.Counter`. – Bakuriu Nov 10 '15 at 08:41
[defaultdict is faster](http://stackoverflow.com/questions/27801945/surprising-results-with-python-timeit-counter-vs-defaultdict-vs-dict). Also, for this application, most of the time is actually spent traversing the data, so I wanted to be able to update the sum rather than just count. – chthonicdaemon Nov 10 '15 at 08:52
In python3 `Counter` is like twice as fast as `defaultdict(int)`. – Bakuriu Nov 10 '15 at 09:08
Python 3.5 on my MacBook Air: `Counter`: 530 ms, `defaultdict(int)`: 420 ms. – chthonicdaemon Nov 10 '15 at 09:27

B. M. · Answer 2 · 2015-11-11T07:53:17.757

4

Your code is not optimal, because you scan the all image for each region in FIDO. Better approach is to group the pixels of each region and calculate the means first. pandasgive nice tools for such computations (just on one canal here). Then you span the means on the regions :

import numpy as np
import pandas as pd     
sX=200
sY=200
Nreg=sX*sY
WBColor=np.random.randint(0,256,(sX,sY))
FIDO=np.random.randint(0,Nreg,(sX,sY))


def oldloop():
    S_wb = np.zeros((sX, sY))
    uniqueFIDOs, unique_counts = np.unique(FIDO, return_counts=True) 
    numFIDOs = uniqueFIDOs.shape 
    for i in np.arange(0,numFIDOs[0]):
        Lookup = FIDO==uniqueFIDOs[i]
        S_wb[Lookup] = np.sum(WBColor[Lookup])/unique_counts[i]
    return S_wb

def newloop():
    index=pd.Index(FIDO.flatten(),name='region')
    means= pd.DataFrame(WBColor.flatten(),index).groupby(level='region').mean()
    lookup=np.zeros(Nreg)
    lookup[means.index]=means.values
    return lookup[FIDO]

in this case, this is about 200 times faster :

In [32]: np.allclose(oldloop(),newloop())
Out[32]: True

In [33]: %timeit -n1 oldloop()
1 loops, best of 3: 3.92 s per loop

In [34]: %timeit -n100 newloop()
100 loops, best of 3: 20.5 ms per loop

EDIT

An other cool modern approach is to use numba. You write (very) basic python code running at nearly C speed :

from numba import jit

@jit
def numbaloops():
    counts=np.zeros(Nreg)
    sums=np.zeros(Nreg)
    S = np.empty((sX, sY))
    for x in range(sX):
        for y in range(sY):
            region=FIDO[x,y]
            value=WBColor[x,y]
            counts[region]+=1
            sums[region]+=value
    for x in range(sX):
        for y in range(sY):
            region=FIDO[x,y]
            S[x,y]=sums[region]/counts[region]
    return S

You are now about 4000 times faster :

In [45]: np.allclose(oldloop(),numbaloops())
Out[45]: True

In [46]: %timeit -n1000 numbaloops()
1000 loops, best of 3: 1.06 ms per loop

edited Nov 11 '15 at 07:53

answered Nov 10 '15 at 07:55

B. M.

18,243
2
35
54

You can merge the last two sets of loops in the numba solution – chthonicdaemon Nov 11 '15 at 02:22
@chthonicdaemon : Thanks. I take it into account, there are only two loops now, for an another 2 factor improvement. – B. M. Nov 11 '15 at 07:49
@B. M. I receive this error with Pandas method: `>>> means=pd.DataFrame(WBColor.flatten(),index).groupby(level='region').mean() .....code cut out..... File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 1055, in _get_grouper raise ValueError('level > 0 only valid with MultiIndex') ValueError: level > 0 only valid with MultiIndex` – William Baker Morrison Nov 11 '15 at 10:30
It seems to be a version bug (https://github.com/pydata/pandas/issues/4014). `level=0` instead is probably a good patch here. – B. M. Nov 11 '15 at 10:46
@B. M. How can I convert the pandas object `means.index` into an array of integers? `lookup[means.index]=means.values IndexError: arrays used as indices must be of integer (or boolean) type` – William Baker Morrison Nov 12 '15 at 08:54
the best is to update your installation : pip install pandas --upgrade. pdmeans.index.values return this array in pandas 0.15.2 . – B. M. Nov 12 '15 at 19:13
@B. M. now working on pandas 0.17.0, and I still receive this error – William Baker Morrison Nov 17 '15 at 10:08
have you tried `lookup[pdmeans.index.values]=pdmeans.values` ? – B. M. Nov 17 '15 at 15:27

Imanol Luengo · Answer 3 · 2015-11-10T16:31:46.433

As @lejlot suggested before, the code is quite hard to vectorize. It cannot be run in parallel, unless you know which pixels belong to each FIDO in advance. I don't know if you call FIDO to superpixels, but I do usually work with this kind of problems, and the best solution I've found so far is as follows:

Flatten the data:
```
data = data.reshape(-1, 3)
labels = FIDO.copy()
```
Here data is your (Width, Height, 3) image, rather than the separate 3 vectors that you have. It gets flattened to (Width * Height, 3).
Relabel FIDO to 0..N-1 range, where N=num unique FIDO:
```
from skimage.segmentation import relabel_sequential

labels = relabel_sequential(labels)[0]
labels -= labels.min()
```
The above, from scikit-image, transforms your FIDO array to the [0, N-1] range, which is much easier to work with later.

Last, code in cython a simple function to calculate the mean for each of the FIDO;s (as they are ordered from 0 to N, you can do it in a 1D array with length N):

def fmeans(double[:, ::1] data, long[::1] labels, long nsp):
    cdef long n,  N = labels.shape[0]
    cdef int K = data.shape[1]
    cdef double[:, ::1] F = np.zeros((nsp, K), np.float64)
    cdef int[::1] sizes = np.zeros(nsp, np.int32)
    cdef long l, b
    cdef double t

    for n in range(N):
        l = labels[n]
        sizes[l] += 1

        for z in range(K):
            t = data[n, z]
            F[l, z] += t

    for n in range(nsp):
        for z in range(K):
            F[n, z] /= sizes[n]

return np.asarray(F)

You can call that function later (once compiled with cython), as simple as:

mean_colors = fmeans(data, labels.flatten(), labels.max()+1) # labels.max()+1 == N

The image of mean colors can then be recovered as:

mean_img = mean_colors[labels]

If you do not want to code in cython, scikit-image also provides bindings for this by using a graph structure and networkx, however is much slower:

http://scikit-image.org/docs/dev/auto_examples/plot_rag_mean_color.html

The above example contains the function calls you need to get an image with the average color of each superpixel as labels1 (your FIDO).

NOTE: The cython approach is much faster, as instead of iterating the number of unique FIDO N and for each of them scan the image (size M = Width x Height) this only iterates the image ONCE. Thus, computational cost is in the order of O(M+N) rather than O(M*N) of your original approach.

Example test:

import numpy as np
from skimage.segmentation import relabel_sequential

sX=200
sY=200

FIDO = np.random.randint(0, sX*sY, (sX, sY))
data = np.random.rand(sX, sY, 3) # Your image

Flatten and relabel:

data = data.reshape(-1, 3)
labels = relabel_sequential(FIDO)[0]
labels -= labels.min()

Obtain mean:

>>> %timeit color_means = fmeans(data, labels.flatten(), labels.max()+1)
1000 loops, best of 3: 520 µs per loop

It takes 0.5ms (half a millisecond) for a 200x200 image for:

print labels.max()+1 # --> 25787 unique FIDO
print color_means.shape # --> (25287, 3), the mean color of each FIDO

You can restore the image of mean colors using smart indexing:

mean_image = color_means[labels]
print mean_image.shape # --> (200, 200, 3)

I doubt you can get that speed with raw python approaches (or at least, I didn't find how).

Skimage 0.10 with "relabel_sequential" doesn't seem to be available for my system (Linux7.0), unfortunately — William Baker Morrison, Nov 11 '15 at 09:58
`relabel_sequential` should be available in [scikit-image 0.10.x](http://scikit-image.org/docs/0.10.x/api/api.html). To install it you can type `pip install scikit-image` in a linux terminal, or `conda install scikit-image` if you use anaconda. However, if you still need a pure python approach, @B.M.'s `pandas` option seems to be an appropiate choice. — Imanol Luengo, Nov 11 '15 at 10:40
I receive this error with pip installation `Traceback (most recent call last): File "/usr/local/bin/pip", line 5, in from pkg_resources import load_entry_point File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2711, in parse_requirements(__requires__), Environment() File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 584, in resolve raise DistributionNotFound(req) pkg_resources.DistributionNotFound: pip==1.5.5` — William Baker Morrison, Nov 11 '15 at 10:44
@WBM Write `easy_install --upgrade pip` to upgrade `pip` (you might need to put a `sudo` before). Afterwards, try the command above to install `scikit-image` again. — Imanol Luengo, Nov 11 '15 at 10:48
@WBM Ups, that might be a type from copy paste. Try `sudo pip install scikit-image` (with or without sudo, depends on your configuration). — Imanol Luengo, Nov 11 '15 at 10:57
@imalunego `Requirement already satisfied (use --upgrade to upgrade): scikit-image in /usr/lib/pymodules/python2.7 /usr/local/lib/python2.7/dist-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning. InsecurePlatformWarning` — William Baker Morrison, Nov 11 '15 at 11:03
Try to upgrade it as the command suggests `pip install --upgrade scikit-image`. — Imanol Luengo, Nov 11 '15 at 11:31
no luck: `--------------------------------------------------------------------------- ImportError Traceback (most recent call last) in () ----> 1 from skimage.segmentation import relabel_sequential ImportError: cannot import name relabel_sequential` — William Baker Morrison, Nov 12 '15 at 08:00

score 3 · Answer 4 · answered Nov 09 '15 at 17:02

3

In short: loops in python are slow. You should do one of the following:

vectorize (you tried it, but you claim "it does not work"), what do you mean but not work? Vectorization (if possible) always works
switch to Cython, and declare iterator value to be int

Both above approaches base on the conversion of the bottle-neck loop into C-loop.

answered Nov 09 '15 at 17:02

lejlot

64,777
8
131
164

the function looks quite hard to vectorize, so you are probably left with the second option. – lejlot Nov 09 '15 at 17:18

score 3 · Accepted Answer · answered Nov 12 '15 at 16:12

3

This is already implemented in Scipy, so you could do:

from scipy.ndimage.measurements import mean as labeled_mean

labels = np.arange(FIDO.max()+1, dtype=int)
S_wb = labeled_mean(WBColor, FIDO, labels)[FIDO]
S_rg = labeled_mean(RGColor, FIDO, labels)[FIDO]
S_by = labeled_mean(BYColor, FIDO, labels)[FIDO]

This is assuming that FIDO contains relatively small integers. If this is not the case you can transform it through np.unique(FIDO, return_inverse=True).

This simple code is about 1000x faster than the original, for 200x200 images and FIDO containing random integers from zero to 40,000.

answered Nov 12 '15 at 16:12

Any thoughts on how I can avoid this error? `ValueError: operands could not be broadcast together with shapes (38890) (40000) ` – William Baker Morrison Nov 16 '15 at 08:39
@WBM - It's hard for me to reproduce this error. Could you give some details, like on which line this error occurs and what are the shapes of the arrays involved? For testing I used myself: `FIDO = np.random.randint(40000, size=(200, 200))` and `WBColor = np.random.rand(200, 200)`. – Nov 16 '15 at 18:14
FIDO is 200x200 , WBColor is 200x200 (-1 to 1). This is my current error `S_wb = labeled_mean(WBColor, FIDO, labels)[FIDO] IndexError: arrays used as indices must be of integer (or boolean) type` – William Baker Morrison Nov 16 '15 at 19:02
@WBM - If `FIDO` already contains integers you can do a `FIDO = FIDO.astype(int)` and then it should work. If the elements in `FIDO` are anything other than integers you could first do `notused, FIDO_int = np.unique(FIDO, return_inverse=True)` followed by `FIDO_int.shape = (200, 200)` and use `FIDO_int` in place of `FIDO`. – Nov 17 '15 at 09:45

Speeding-up "for-loop" in image analysis when iterations are up to 40,000

5 Answers5