As @lejlot suggested, the code is quite hard to vectorize as it is. It cannot be run in parallel unless you know in advance which pixels belong to each FIDO. I don't know whether what you call FIDOs are superpixels, but I do usually work with this kind of problem, and the best solution I've found so far is as follows:
Flatten the data:
data = data.reshape(-1, 3)
labels = FIDO.copy()
Here data is your (Width, Height, 3) image, rather than the three separate vectors that you have. It gets flattened to (Width * Height, 3).
Relabel FIDO to the 0..N-1 range, where N is the number of unique FIDOs:
from skimage.segmentation import relabel_sequential
labels = relabel_sequential(labels)[0]
labels -= labels.min()
The above, from scikit-image, transforms your FIDO array to the [0, N-1] range, which is much easier to work with later.
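If scikit-image is not at hand, the same remapping can be sketched with plain NumPy, using np.unique's return_inverse, which directly assigns each pixel the 0..N-1 index of its label (this is my own alternative, not part of the original code):

```python
import numpy as np

# Toy FIDO map with sparse, non-contiguous label values.
FIDO = np.array([[10, 10, 57],
                 [10, 57, 99]])

# return_inverse gives, for every pixel, the index of its value in the
# sorted unique array, i.e. a label in 0..N-1.
labels = np.unique(FIDO.ravel(), return_inverse=True)[1].reshape(FIDO.shape)
print(labels)
# [[0 0 1]
#  [0 1 2]]
```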
Last, write a simple Cython function to calculate the mean for each of the FIDOs (as they are now ordered from 0 to N-1, you can accumulate them in arrays of length N):
# In a .pyx file, compiled with Cython:
import numpy as np

def fmeans(double[:, ::1] data, long[::1] labels, long nsp):
    cdef long n, l, N = labels.shape[0]
    cdef int z, K = data.shape[1]
    cdef double[:, ::1] F = np.zeros((nsp, K), np.float64)
    cdef int[::1] sizes = np.zeros(nsp, np.int32)
    # Single pass over the image: accumulate sums and counts per label.
    for n in range(N):
        l = labels[n]
        sizes[l] += 1
        for z in range(K):
            F[l, z] += data[n, z]
    # Divide each accumulated sum by the label size to get the mean.
    for n in range(nsp):
        for z in range(K):
            F[n, z] /= sizes[n]
    return np.asarray(F)
You can call that function (once compiled with Cython) as simply as:
mean_colors = fmeans(data, labels.flatten(), labels.max()+1) # labels.max()+1 == N
The image of mean colors can then be recovered as:
mean_img = mean_colors[labels]
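If you want to sanity-check the Cython result without compiling anything, a NumPy-only version of the same per-label mean can be written with np.bincount (fmeans_numpy is my own sketch, not part of the original code; it loops only over the 3 channels, so it is reasonably fast too):

```python
import numpy as np

def fmeans_numpy(data, labels, nsp):
    """NumPy reference for fmeans: per-label mean of data, shape (nsp, K).

    Assumes every label 0..nsp-1 occurs at least once (true after
    relabel_sequential), so no division by zero happens.
    """
    sizes = np.bincount(labels, minlength=nsp).astype(np.float64)
    F = np.empty((nsp, data.shape[1]))
    for z in range(data.shape[1]):
        # Sum channel z per label, then divide by the label sizes.
        F[:, z] = np.bincount(labels, weights=data[:, z], minlength=nsp) / sizes
    return F
```

fmeans_numpy(data, labels.ravel(), labels.max() + 1) should match the Cython output up to floating-point rounding.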
If you do not want to code in Cython, scikit-image also provides bindings for this by using a graph structure and networkx; however, it is much slower:
http://scikit-image.org/docs/dev/auto_examples/plot_rag_mean_color.html
That example contains the function calls you need to get an image with the average color of each superpixel as labels1 (your FIDO).
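For completeness, scikit-image also exposes this as a one-liner via skimage.color.label2rgb with kind='avg' (this is my own suggestion, not from the linked example; I pass bg_label=-1 so that label 0 is not treated as background). It is the generic implementation, so expect it to be slower than the Cython version:

```python
import numpy as np
from skimage.color import label2rgb

img = np.random.rand(200, 200, 3)
labels = np.random.randint(0, 50, (200, 200))

# Each pixel is replaced by the mean color of its label.
mean_img = label2rgb(labels, img, kind='avg', bg_label=-1)
```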
NOTE: The Cython approach is much faster: instead of iterating over the N unique FIDOs and, for each of them, scanning the whole image (of size M = Width x Height), it iterates over the image only ONCE. Thus, the computational cost is on the order of O(M + N) rather than the O(M * N) of your original approach.
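To make the comparison concrete, the O(M * N) pattern being replaced looks roughly like this (a hypothetical reconstruction — I'm guessing at the shape of your original loop):

```python
import numpy as np

def fmeans_naive(data, labels, nsp):
    """O(M * N): one full boolean scan of all M pixels per label."""
    return np.array([data[labels == l].mean(axis=0) for l in range(nsp)])
```

Each `labels == l` comparison touches all M pixels, hence N full scans, versus the single pass of the Cython version.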
Example test:
import numpy as np
from skimage.segmentation import relabel_sequential
sX=200
sY=200
FIDO = np.random.randint(0, sX*sY, (sX, sY))
data = np.random.rand(sX, sY, 3) # Your image
Flatten and relabel:
data = data.reshape(-1, 3)
labels = relabel_sequential(FIDO)[0]
labels -= labels.min()
Obtain mean:
>>> %timeit color_means = fmeans(data, labels.flatten(), labels.max()+1)
1000 loops, best of 3: 520 µs per loop
It takes 0.5 ms (half a millisecond) for a 200x200 image:
print labels.max()+1    # --> 25287 unique FIDOs
print color_means.shape # --> (25287, 3), the mean color of each FIDO
You can restore the image of mean colors using smart indexing:
mean_image = color_means[labels]
print mean_image.shape # --> (200, 200, 3)
I doubt you can get that speed with pure Python approaches (or at least, I haven't found how).