Fastest way to perform thousands of comparisons

Question

I'm working on an image processing tool which translates RGB colors to human-readable keywords. I need to:

1. Initialize/declare a set of 1000 static, known elements.
2. Perform 10 comparisons between a dynamic element and all 1000 elements (10 x 1000 comparisons in total).
3. Repeat steps 1 and 2 for thousands of images.

Each element is a Struct like this one:

struct KnownColor{
    cv::Scalar rgb;
    std::string name;
    std::vector<std::string> strings;

    KnownColor ( cv::Scalar rgb, std::string name, std::vector<std::string> strings) :
        rgb(rgb),
        name(name),
        strings(strings) {};
};

Currently I'm using a static std::vector<> to store them. To init this vector I'm currently using a function called once, and I push my 1000 elements into the vector. Every struct is initialised using the KnownColor constructor:

static std::vector<KnownColor> Vector;

void vectorInit() //I call this only once per image:
{
    ...
    Vector.push_back(KnownColor(Scalar(188,198,204),"Metallic Silver",std::vector<std::string>{"gray"}));
    Vector.push_back(KnownColor(Scalar(152,175,199),"Blue Gray",std::vector<std::string>{"gray","blue"}));
    ... (1000 push_back in total)
}

The comparison function is a custom Euclidean distance between the Scalars; it returns the closest KnownColor and its human-readable keywords:

double min = std::numeric_limits<double>::max();
std::size_t index = 0;
for (std::size_t i = 0; i < KnownColors.size(); i++)
{
    double tmp = scalarLABDistance(rgb, KnownColors[i].rgb);
    if (tmp < distanceThreshold)
    {
        // Close enough: accept this color and stop searching.
        min = tmp;
        index = i;
        break;
    }
    if (tmp < min)
    {
        // Best match so far.
        min = tmp;
        index = i;
    }
}
return KnownColors[index].strings;

Sadly for now, the program is called from PHP and needs to be executed once per image (client request).

I wonder if a static init would be better (init is faster? iteration is faster?), or if there is a better way to compare something to a collection of static elements without using std::vector.

Is your performance concern initialization time or compare time? You have shown the initialization. Presumably the initialization happens only once. — Dale Wilson, Jun 23 '14 at 16:10
If it's the comparison time you're worried about you should consider parallelizing your problem. GPUs are often used in image processing, OpenMP can have very low coding overhead for good speedup as well. — maxywb, Jun 23 '14 at 16:11
You can speed up your initialization code a bit by calling Vector.reserve(1000) before the first push_back. — Sjlver, Jun 23 '14 at 16:12
What kind of comparison? Are you comparing for absolute equality (such that you can short-circuit after the first `false` comparison) or doing something that absolutely requires all 1000 comparisons each time? — 0xbe5077ed, Jun 23 '14 at 16:13
Rather than making us guess what your comparison code looks like, show the code -- unless you haven't written it yet, in which case, go write it (and measure its performance to see if it's good enough). — Dale Wilson, Jun 23 '14 at 16:15
I need to execute the whole program for each image being processed (client specification). For each image, I initialize the vector once and perform 10 comparisons (I optimised the comparison so that when the distance is less than a threshold I accept that result and break the loop). The comparison is a Euclidean distance. — GuillermoMP, Jun 23 '14 at 16:20
@GuillermoMP why do you need to initialize this array for each image? — maxywb, Jun 23 '14 at 16:21
we don't know what exactly you do, so there's no way of telling what's the fastest way. — Karoly Horvath, Jun 23 '14 at 16:25
I have expanded the question with detailed information. @maxywb The code is being called from PHP, and currently that is a client spec. I know that processing sets of images would be much better, but for now I have to stick to single-image execution. — GuillermoMP, Jun 23 '14 at 16:43
@GuillermoMP: Speed suggestion: see if you can run it as a daemon and communicate via local Unix socket. — Zan Lynx, Jun 23 '14 at 16:52
@ZanLynx Oh thank you, it sounds like a great solution to fix the multiple initialization problem. — GuillermoMP, Jun 23 '14 at 16:54
http://stackoverflow.com/questions/1678457/best-algorithm-for-matching-colours/1678498#1678498 — Zan Lynx, Jun 23 '14 at 16:55
http://stackoverflow.com/questions/1720528/what-is-the-best-algorithm-for-finding-the-closest-color-in-an-array-to-another — Zan Lynx, Jun 23 '14 at 16:56

Zan Lynx · Answer 1 · 2014-06-23T17:02:16.233

3

What I have done in similar situations is write a program to build a C file for me with the data structure that I need already sorted out. Usually I write this in Perl or Python. The data might be generated or it might be read from data files.

You can build a Boost array at compile time which would not require time for initialization. However, writing 1000 elements into a vector should be super-fast on any desktop-type CPU.
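As a rough illustration of a table that needs no run-time initialization, here is a minimal sketch. It uses std::array instead of a Boost array, and a plain struct instead of the question's KnownColor (cv::Scalar, std::string and std::vector would need constructor calls at startup). The names KnownColorEntry, kKnownColors, squaredDistance and closestIndex are made up for this example, the three-keyword limit is an arbitrary assumption, and the distance is plain squared RGB distance rather than the question's scalarLABDistance.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the question's KnownColor. It holds only plain
// types, so the whole table can be constant-initialized by the compiler.
struct KnownColorEntry {
    unsigned char r, g, b;
    const char* name;
    const char* keywords[3];  // assumption: at most 3 keywords; unused slots are nullptr
};

// Two sample rows taken from the question. A generator script (Perl, Python, ...)
// would emit all ~1000 rows in this form.
constexpr std::array<KnownColorEntry, 2> kKnownColors{{
    {188, 198, 204, "Metallic Silver", {"gray"}},
    {152, 175, 199, "Blue Gray", {"gray", "blue"}},
}};

// Squared Euclidean distance in RGB space (not the question's LAB-based distance).
constexpr int squaredDistance(const KnownColorEntry& c, int r, int g, int b) {
    return (c.r - r) * (c.r - r) + (c.g - g) * (c.g - g) + (c.b - b) * (c.b - b);
}

// Linear scan over the table; returns the index of the closest entry.
inline std::size_t closestIndex(int r, int g, int b) {
    assert(!kKnownColors.empty());
    std::size_t best = 0;
    int bestDist = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < kKnownColors.size(); ++i) {
        const int d = squaredDistance(kKnownColors[i], r, g, b);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

With a table like this, the per-process cost is only the scan itself; whether that matters compared to process startup from PHP is something to measure.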

Searching your 1000 elements is more of a problem.

You said you are looking for the Euclidean distance. What game developers have used to solve this is space partitioning, for example Binary Space Partitioning. That may help you. The way it works, very roughly, is that a 2D space is divided into 4 squares and a 3D space is divided into 8 cubes (this regular subdivision is usually called a quadtree or an octree). Then each of those spaces is further divided, down to the smallest useful unit. This gives a tree that is quick to search.

It gets complicated when searching for areas that don't fit into one partition, though. You might decide to search multiple trees, or put duplicates of an object in every partition it touches. For circles or spheres you might fall back to an exact distance calculation, or you might just subdivide into squares/cubes down to the smallest unit and call it close enough. There are a lot of examples and literature.

And, I am almost certain that you CAN hash this because a point in a 2D BSP is a string of 2-bit location values which is just an integer of various sizes depending on your BSP resolution.
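To make the "string of location values is just an integer" idea concrete, here is a small sketch for the 3D (RGB) case, where each level contributes 3 bits instead of 2. The function name octreeKey and the choice of 8-bit channels are assumptions for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: integer key of the octree cell containing an 8-bit RGB colour.
// At each level, one bit per channel says which half of the current cell the
// colour falls in; those three bits are appended to the key.
inline std::uint32_t octreeKey(std::uint8_t r, std::uint8_t g, std::uint8_t b, int levels) {
    assert(levels >= 0 && levels <= 8);  // 8-bit channels allow at most 8 levels
    std::uint32_t key = 0;
    for (int level = 0; level < levels; ++level) {
        const int shift = 7 - level;  // most significant bit first
        const std::uint32_t rBit = (r >> shift) & 1u;
        const std::uint32_t gBit = (g >> shift) & 1u;
        const std::uint32_t bBit = (b >> shift) & 1u;
        key = (key << 3) | (rBit << 2) | (gBit << 1) | bBit;
    }
    return key;
}
```

Colours with the same key at a given depth share a cell, so the key can index a hash table of buckets. Two colours that are close but sit on opposite sides of a cell boundary still get different keys, so this alone does not give an exact nearest match.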

It looks like another good option for this might be a kd-tree. That's a "k" dimensional tree. I haven't used one but from what I just read about them they look rather useful for this.
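For illustration, a nearest-neighbour lookup with a kd-tree could look roughly like the sketch below. It is a minimal version, not a tuned implementation: the names KdTree and Point are made up for this example, and it uses squared Euclidean distance over three doubles rather than the question's scalarLABDistance.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;  // one colour, e.g. in RGB or LAB

// Minimal k-d tree (k = 3) stored implicitly in a vector: the median of each
// sub-range is the node, and the two halves around it are its subtrees.
class KdTree {
public:
    explicit KdTree(std::vector<Point> pts) : pts_(std::move(pts)) {
        build(0, pts_.size(), 0);
    }

    // Returns the stored point closest to q.
    Point nearest(const Point& q) const {
        assert(!pts_.empty());
        std::size_t best = 0;
        double bestDist = std::numeric_limits<double>::max();
        search(0, pts_.size(), 0, q, best, bestDist);
        return pts_[best];
    }

private:
    std::vector<Point> pts_;

    static double dist2(const Point& a, const Point& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Partition [lo, hi) around its median along the current axis, then recurse.
    void build(std::size_t lo, std::size_t hi, std::size_t axis) {
        if (hi - lo <= 1) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(pts_.begin() + lo, pts_.begin() + mid, pts_.begin() + hi,
                         [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
        build(lo, mid, (axis + 1) % 3);
        build(mid + 1, hi, (axis + 1) % 3);
    }

    void search(std::size_t lo, std::size_t hi, std::size_t axis, const Point& q,
                std::size_t& best, double& bestDist) const {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        const double d = dist2(pts_[mid], q);
        if (d < bestDist) {
            bestDist = d;
            best = mid;
        }
        const double delta = q[axis] - pts_[mid][axis];
        const std::size_t next = (axis + 1) % 3;
        // Visit the side that contains q first; visit the other side only if the
        // splitting plane is closer than the best match found so far.
        if (delta < 0) {
            search(lo, mid, next, q, best, bestDist);
            if (delta * delta < bestDist) search(mid + 1, hi, next, q, best, bestDist);
        } else {
            search(mid + 1, hi, next, q, best, bestDist);
            if (delta * delta < bestDist) search(lo, mid, next, q, best, bestDist);
        }
    }
};
```

The tree would be built once from the known colours and then queried once per dynamic colour. With only about 1000 points, a plain linear scan may well be just as fast, so it is worth timing both.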

edited Jun 23 '14 at 17:02

answered Jun 23 '14 at 16:16

Zan Lynx


-1 for the premature post. wait till the question is properly specified. – Karoly Horvath Jun 23 '14 at 16:21
@KarolyHorvath: It's right there in the question title and his first paragraph. – Zan Lynx Jun 23 '14 at 16:22
no it isn't. you have a bunch of unjustified presumptions, e.g. that hashing gives meaningful results for evaluating the comparison. – Karoly Horvath Jun 23 '14 at 16:27
@KarolyHorvath: Looks like he added a comment saying Euclidean Distance, so you're right. I assumed equality. I'll update the answer to remove the stuff about hash tables, the rest may still be useful. – Zan Lynx Jun 23 '14 at 16:39
Also Zan, I'm not searching but comparing elements. I extended the question with detailed information. Thank you anyway. – GuillermoMP Jun 23 '14 at 16:41
If you need nearest neighbour search, k-d tree has that. http://en.wikipedia.org/wiki/K-d_tree#Nearest_neighbour_search – tsnorri Jun 23 '14 at 17:17
Thanks again. I have considered the option of classifying colors in ranges. However, the problem with this approach is the HARD WORK of classifying and defining the color ranges in a 3D space, with the extra handicap that colours don't fit easily into "cubes" (which would make this approach easier). I've found it WAY easier and more effective to manually classify a known set (~1000 colours) and then search for the nearest classified color. Then I started testing, and I just keep adding to and upgrading my set of colours when I find errors. – GuillermoMP Jun 23 '14 at 17:31
