
I wrote a k-means clustering algorithm and a color quantization algorithm. Both produce the expected results, but I want to make them faster. Both implementations have to solve the same problem: given two arrays of points in 3D space, for each point of the first array find the closest point in the second array. I do it like this:

size_t closest_cluster_index;
double x_dif, y_dif, z_dif;
double old_distance;
double new_distance;

for (auto point = points.begin(); point != points.end(); point++)
{
    //FIX
    //as suggested by juvian
    //K = 1: reuse the previous point's closest cluster when the triangle
    //inequality guarantees it is still the closest one for this point
    if (point != points.begin())
    {
        auto cluster = &(clusters[closest_cluster_index]);

        x_dif = cluster->x - point->x;
        y_dif = cluster->y - point->y;
        z_dif = cluster->z - point->z;

        new_distance = x_dif * x_dif + y_dif * y_dif + z_dif * z_dif;

        //compare actual distances on both sides, not a squared distance
        //against a linear one
        if (std::sqrt(new_distance) <= std::sqrt(old_distance) - ColorU8::differenceRGB(*(point - 1), *point))
        {
            old_distance = new_distance;
            //do sth with closest_cluster_index;
            continue;
        }
    }
    //END OF FIX

    old_distance = std::numeric_limits<double>::infinity();

    for (auto cluster = clusters.begin(); cluster != clusters.end(); cluster++)
    {
        x_dif = cluster->x - point->x;
        y_dif = cluster->y - point->y;
        z_dif = cluster->z - point->z;

        new_distance = x_dif * x_dif + y_dif * y_dif + z_dif * z_dif;

        if (new_distance < old_distance)
        {
            old_distance = new_distance;
            closest_cluster_index = cluster - clusters.begin();
        }
    }
    //do sth with: closest_cluster_index
}

How can I improve it? (I don't want to make it multithreaded or compute it on the GPU.)

John Smith
  • use an appropriate data structure like a kd-tree to hold all the points from clusters, then for each point in the first array ask the structure for its closest point – juvian Nov 12 '18 at 14:29
  • @juvian Your comment really looks like an answer ;) – YSC Nov 12 '18 at 14:35
  • dup?: https://stackoverflow.com/q/43372780/5470596 – YSC Nov 12 '18 at 14:37
  • @YSC not used to short answers, but gave it a try ^^ – juvian Nov 12 '18 at 14:45
  • "I don't want to make it multithreaded or computed by GPU" – That's unfortunate, because GPGPU in particular will likely give you a massive performance *and* energy efficiency improvement. Your problem doesn't lend itself well to using more efficient data structures, as was discussed below, and is perfectly suited for GPU (simple FP math + extremely map-reduce friendly, i.e. massively and easily parallelizable). Ruling out options a priori can be a mistake. – Arne Vogel Nov 12 '18 at 17:15
  • @ArneVogel Yeah I know. It's very simple to split this algorithm across multiple threads. However, it's part of my work for my studies. I'm going to measure its execution time, rate the results, and compare with other algorithms like median-cut. To make a fair judgment I do everything in one thread and use only the CPU. – John Smith Nov 12 '18 at 18:41
  • @JohnSmith what happened :( – juvian Nov 20 '18 at 16:36
  • @juvian I didn't have time to work on it just yet. But I guess your answer is the best so I marked this question as solved. – John Smith Nov 22 '18 at 10:56

1 Answer


There are several data structures for efficient nearest-neighbour queries. For 3D, a kd-tree works really well: each query takes O(log n) on average, which improves on your current O(n) linear scan.

So with this structure you add all the points from clusters to it, and then for each point in points you query the structure for its closest point. For your particular case a static kd-tree is enough, as you don't need to update the points.
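
As a rough illustration, here is a minimal sketch of that approach using the header-only KDTree from https://github.com/crvs/KDTree (the library tried in the comments below). The point_t/pointVec aliases and the nearest_index call are my reading of that library's interface, so verify them against its README:

#include <cstddef>
#include <vector>
#include "KDTree.hpp" // from https://github.com/crvs/KDTree

int main()
{
    // Build the tree once from the cluster centers (3D coordinates).
    // In this library, point_t is std::vector<double> and pointVec is
    // std::vector<point_t>.
    pointVec cluster_centers = { {255.0, 0.0, 0.0},
                                 {0.0, 255.0, 0.0},
                                 {0.0, 0.0, 255.0} };
    KDTree clusters_tree(cluster_centers);

    // Query the closest cluster for every point of the first array.
    std::vector<point_t> points = { {250.0, 10.0, 5.0}, {20.0, 30.0, 200.0} };
    for (const auto& point : points)
    {
        std::size_t closest_cluster_index = clusters_tree.nearest_index(point);
        // ...do sth with closest_cluster_index...
        (void)closest_cluster_index;
    }
}

The tree is built once and then queried points.size() times, so its construction cost is amortized over all the queries.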

Another approach:

We can risk doing extra computations on some points in exchange for fewer on others. This method should work well under the following assumptions:

  • Clusters are far apart from one another
  • Adjacent points are close to each other

I think both hold in your case: your clusters are a small set of colors, and your points come from a real image, where adjacent pixels tend to have similar colors.

Instead of storing only the closest cluster for each point, store the closest k clusters in a max-heap. When you move to the next point we can reuse this information. Call the point at which the heap was built P and its kth closest cluster C.

Now for a new point P2, before comparing against all clusters we check whether the closest cluster to P2 is guaranteed to be in our heap. By the triangle inequality, any cluster C' outside the heap satisfies distance(P2, C') >= distance(P, C') - distance(P, P2) >= distance(P, C) - distance(P, P2). So if some cluster in the heap is within distance(P, C) - distance(P, P2) of P2, it beats every cluster outside the heap and we only need to search the heap. When the check fails, we compare against all clusters, rebuild the heap, and P becomes P2.

You will need to try different values of k to see what helps. For the case of k = 2, it might be worth avoiding the added complexity of a heap and just using plain variables.
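
Here is a self-contained sketch of this idea, assuming plain structs with x, y, z members and a non-empty clusters array (the names are mine, and the cache is rebuilt with partial_sort instead of an explicit heap, for brevity):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Point3 { double x, y, z; };

static double distanceBetween(const Point3& a, const Point3& b)
{
    const double x_dif = a.x - b.x;
    const double y_dif = a.y - b.y;
    const double z_dif = a.z - b.z;
    // Actual distance (not squared): the triangle inequality below needs it.
    return std::sqrt(x_dif * x_dif + y_dif * y_dif + z_dif * z_dif);
}

// For each point, find the index of its closest cluster, reusing the
// k closest clusters of the last fully scanned point when possible.
std::vector<std::size_t> closestClusterIndices(const std::vector<Point3>& points,
                                               const std::vector<Point3>& clusters,
                                               std::size_t k)
{
    std::vector<std::size_t> result(points.size());
    // (distance to the reference point P, cluster index) for the k clusters
    // closest to P; kth_distance is the largest of those distances.
    std::vector<std::pair<double, std::size_t>> cache;
    double kth_distance = 0.0;
    double walked = 0.0; // upper bound on distance(P, current point)

    for (std::size_t i = 0; i < points.size(); ++i)
    {
        if (i > 0)
            walked += distanceBetween(points[i - 1], points[i]);

        if (!cache.empty())
        {
            // Closest cached cluster to the current point.
            double best = std::numeric_limits<double>::infinity();
            std::size_t best_index = 0;
            for (const auto& entry : cache)
            {
                const double d = distanceBetween(points[i], clusters[entry.second]);
                if (d < best) { best = d; best_index = entry.second; }
            }
            // Any cluster outside the cache is at least kth_distance - walked
            // away from the current point, so the cached winner is global.
            if (best <= kth_distance - walked)
            {
                result[i] = best_index;
                continue;
            }
        }

        // Cache miss: full scan over all clusters, then rebuild the cache
        // around the current point, which becomes the new reference point P.
        std::vector<std::pair<double, std::size_t>> all(clusters.size());
        for (std::size_t c = 0; c < clusters.size(); ++c)
            all[c] = { distanceBetween(points[i], clusters[c]), c };
        const std::size_t m = std::min(k, all.size());
        std::partial_sort(all.begin(), all.begin() + m, all.end());
        cache.assign(all.begin(), all.begin() + m);
        kth_distance = cache.back().first;
        walked = 0.0;
        result[i] = cache.front().second;
    }
    return result;
}

Note that walked over-estimates distance(P, P2) by summing the consecutive hops since the last full scan, which keeps the check conservative: it can miss a shortcut, but never returns a wrong index.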

juvian
  • Thank you for your answer. I tried this implementation of a KDTree: https://github.com/crvs/KDTree, and changed my code like this: KDTree clusters_tree(colorsToPointVec(clusters)); auto pixel_cluster = pixels_clusters.begin(); for (auto pixel = pixels.begin(); pixel != pixels.end(); pixel++) { /*do sth with index*/ = clusters_tree.nearest_index(colorToPoint_t(*pixel)); } To my surprise it actually runs a few times slower now. I don't know if it's the implementation's fault or something else. – John Smith Nov 12 '18 at 15:31
  • @John Smith What is the size n of your problem? Also, you could check whether the issue is construction of the tree or use of it. – Damien Nov 12 '18 at 15:45
  • @Damien The problem is in the use of it. I have at least 1920 * 1080 3D points (or more, depending on the source image) and 2^n 3D cluster points (common case: n = 8). – John Smith Nov 12 '18 at 15:49
  • @JohnSmith okay, so in your case N is much bigger than the number of clusters, so this won't help you. You would notice a difference if you had many clusters, but with so few of them it is not worth it. How long is it currently taking? – juvian Nov 12 '18 at 15:57
  • @juvian When I have 2073600 points and 256 clusters and use my algorithm instead of the KDTree: finding the closest points takes ~0.89s and the other k-means work ~0.008s, giving a total iteration time of ~0.9s. To reach an accuracy of 1.0 in this particular case it does 16 iterations: ~15s total (most of it spent finding closest points). – John Smith Nov 12 '18 at 16:09
  • @JohnSmith you could also try an octree. The main problem is that even if a structure gives you the nearest neighbour in O(log clusters) steps, each step has a higher constant factor than the simple for loop, so 8 of those steps can actually be more expensive than 256 simple operations – juvian Nov 12 '18 at 16:14
  • @JohnSmith you might want to compare your current performance to other color quantization tools, and if they are better, look into how they do it. [leptonica](http://www.leptonica.com/color-quantization.html) is a well-known tool and has good explanations of its approaches. – juvian Nov 12 '18 at 16:30
  • @JohnSmith 16 iterations looks good. A crucial point is the selection of the clusters for the first iteration. I guess you already tried to optimize it. One possibility is to use a sub-optimal but faster algorithm for this first pass, for example running your algorithm with fewer points first and then increasing the number of points. I don't know if it makes sense in your case. – Damien Nov 12 '18 at 17:04
  • @Damien Yeah, I can try that. At the moment I just take random points as the starting point. I also thought of running median-cut on all points and then taking its result as the clusters for the first iteration. Other methods I'm going to try are 'Diagonal of Cube' and 'Standard Deviation'. My main concern is the 'finding closest point' algorithm, but judging from the answers I won't be able to fix it with more efficient data structures. – John Smith Nov 12 '18 at 18:34
  • @juvian I made a quick implementation with k=1. Time without it was ~14s, and with it ~9s. That's a huge improvement, and there's probably still room for more with higher k's. I'm not sure about that condition though: 'between any cluster from the heap and P2 is <= distance(P, C) - distance(P, P2)' - when k=1 it's just the previous cluster. In both cases the RMSE between the original and processed image was different (it was worse in the optimised version). That means the condition isn't good enough, because brute force finds a better match. – John Smith Nov 12 '18 at 22:55
  • @JohnSmith can you update with the relevant code of that? – juvian Nov 12 '18 at 22:58
  • @juvian Ok, I found two bugs (I updated the code again). Now the RMSE is the same in both cases. The performance is not as good as before, though: ~14.1s vs ~13.8s. I'll try more k's, test more images tomorrow, and keep you updated. Thanks for your suggestion. – John Smith Nov 12 '18 at 23:28
  • @JohnSmith alright, keep me updated. Not sure if your code was updated with the bug fixes, but atm it has several bugs :P. But yeah, k = 1 won't help much; maybe k = 2 will – juvian Nov 13 '18 at 00:55