I'm working on a project where each feature in an image is described by a set of X and Y coordinates (5-10 points per feature) that is unique to that feature. I also have a database of thousands of features, each with the same kind of descriptor. The data looks like this:

myFeature: (x1,y1), (x2,y2), (x3,y3)...

myDatabase: Feature1: (x1,y1), (x2,y2), (x3,y3)...
            Feature2: (x1,y1), (x2,y2), (x3,y3)...
            Feature3: (x1,y1), (x2,y2), (x3,y3)...
            ...

I want to find the best match for myFeature among the features in myDatabase.

What is the fastest way to match these features? Currently I am stepping through each feature in the database and comparing each individual point:

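import math

# Assumes myDatabase maps a feature name to its list of (x, y) points
# and threshold is the maximum distance that still counts as a match.
bestScore, bestMatch = 0, None
for name, feature in myDatabase.items():
    score = 0
    for px, py in myFeature:
        # Minimum distance from the current point to the points
        # describing the current feature in the database
        nearest = min(math.hypot(px - fx, py - fy) for fx, fy in feature)
        if nearest < threshold:
            score += 1  # the current point has a match in the target feature

    if score > bestScore:
        bestScore, bestMatch = score, name  # save feature as new best match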

This search works, but it gets painfully slow on large databases. Is there a faster way to do this type of search, or at least a way to quickly rule out features that clearly won't match the descriptor?

Mikael

2 Answers

Create a bitset (an array of 1s and 0s) from each feature.

Create the same kind of bitmask for your search criteria, then use a bitwise AND to compare the search mask against each feature's mask.

With this approach, you can shift most of the work to the routines that save the features. Creating the bitmasks should not be computationally expensive either.

If you just want to rule out features that can't possibly match, have your mask-creation algorithm make the bitmasks slightly fuzzy, so that near misses still overlap.

The easiest way to create such a mask is probably to build a matrix as big as the coordinate grid of your features, put a 1 at every coordinate that is set for the feature and a 0 at every coordinate that isn't, then flatten the matrix into a single one-dimensional row. Compare that feature row against the search mask bitwise.
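
As a rough illustration of the mask idea, here is a minimal Python sketch. It assumes integer pixel coordinates and packs each feature into one Python integer over a coarse grid; the names feature_mask and shared_cells, the image width of 256, and the cell size of 4 are all made up for the example and do not come from the question.

```python
def feature_mask(points, width, cell=4):
    """Pack a feature's points into one integer bitmask over a coarse grid.

    width (image width in pixels) and cell (grid cell size in pixels)
    are illustrative assumptions, not values from the question.
    """
    cols = (width + cell - 1) // cell  # number of grid columns
    mask = 0
    for x, y in points:
        bit = (y // cell) * cols + (x // cell)  # row-major cell index
        mask |= 1 << bit
    return mask


def shared_cells(mask_a, mask_b):
    """Count grid cells occupied in both masks (bitwise AND, then popcount)."""
    return bin(mask_a & mask_b).count("1")


query = feature_mask([(120, 30), (40, 80), (200, 150)], width=256)
database = {
    "Feature1": feature_mask([(121, 28), (41, 82), (10, 10)], width=256),
    "Feature2": feature_mask([(5, 5), (60, 200), (250, 250)], width=256),
}
scores = {name: shared_cells(query, mask) for name, mask in database.items()}
best = max(scores, key=scores.get)
```

Two nearby points that fall on opposite sides of a cell boundary end up in different bits, so this is best treated as a quick pre-filter ahead of the exact distance check rather than a replacement for it.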

This is similar to the way bitmap indexes work in large databases (e.g. Oracle), but with a different intention and without keeping a full bitmap image of all database rows in memory.

The power of this is in the bitwise comparisons.

On a 32-bit machine, a single bitwise instruction compares 32 positions at once, whereas a point-by-point comparison of integer coordinates handles only one per instruction. The gain is even larger compared with floating-point operations, depending on the architecture.

Falcon
  • This assumes a bit per feature correct and storing the bitmask in a database column? What's the practical limit of how many features can be represented in a bitmask and what datatype would you use for storage? – orangepips Nov 05 '10 at 13:16
  • The issue here is that the features won't be exact, in my search feature a coordinate might be (120, 30) and in its corresponding match in the database it is (121, 28). To get around that I need an approximate comparison, which is why the threshold is used. – Mikael Nov 05 '10 at 13:30
  • Then tighten your matrix, for example take 4 points and make them one position in the bitset. That'll make it fuzzy (works like lowering the dpi of an image file) – Falcon Nov 05 '10 at 13:32
  • @orangepips I don't really know. My knowledge is purely theoretical at that point. I'd write a custom comparison function (not a procedure) and start with a raw type on oracle, so I can perform fast selects on the data. Note that there is no limitation to length, as you can process the bitsets in chunks of powers of 2. It'll always be faster than point to point comparisons of equal length. – Falcon Nov 05 '10 at 13:38
  • @Falcon So you're saying do the actual bit comparison with a programming language as opposed to using SQL? – orangepips Nov 05 '10 at 13:54
  • No, I'd first try using a function for SQL! A procedure can't be written into a SQL-Statement. select id from features where comparison_function(searchmask, featuremask) = true; But if that is too slow, I'd try other methods/concepts. – Falcon Nov 05 '10 at 13:58

In general, this looks like a spatial index problem. It's not my field, but you'll probably need to build some kind of tree index, such as a quadtree, that lets you search for features efficiently. This Wikipedia article has some links: http://en.wikipedia.org/wiki/Spatial_index

It might be a problem that you can easily implement in an existing spatial database. It's very GIS-like in its description.
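
To make the indexing idea concrete, here is a small Python sketch that uses a uniform grid (a spatial hash) rather than a quadtree, since it is shorter to write; a quadtree or a spatial database would play the same role. The function names, the cell size, and the sample data are assumptions for illustration only. Each query point looks only at its own grid cell and the eight neighbours, so the scoring matches the brute-force loop in the question as long as the threshold is no larger than the cell size.

```python
from collections import defaultdict
from math import hypot


def build_grid(database, cell):
    """Map each grid cell to the (feature name, point) pairs that fall in it."""
    grid = defaultdict(list)
    for name, points in database.items():
        for x, y in points:
            grid[(x // cell, y // cell)].append((name, (x, y)))
    return grid


def best_match(query, grid, cell, threshold):
    """Score features by how many query points have a point within threshold."""
    if threshold > cell:
        raise ValueError("threshold must not exceed cell size")
    scores = defaultdict(int)
    for qx, qy in query:
        cx, cy = qx // cell, qy // cell
        matched = set()  # count each feature at most once per query point
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for name, (x, y) in grid.get((cx + dx, cy + dy), ()):
                    if hypot(qx - x, qy - y) < threshold:
                        matched.add(name)
        for name in matched:
            scores[name] += 1
    winner = max(scores, key=scores.get) if scores else None
    return winner, dict(scores)


database = {
    "Feature1": [(121, 28), (41, 82), (10, 10)],
    "Feature2": [(5, 5), (60, 200), (250, 250)],
}
grid = build_grid(database, cell=8)
winner, scores = best_match([(120, 30), (40, 80), (200, 150)], grid, cell=8, threshold=5)
```

The grid is built once for the whole database, so each query costs roughly the number of points near the query points instead of the number of points in the database.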

One thing you can do is calculate the center of gravity (centroid) of every feature and use it to whittle down the search space a bit, since a single point per feature is much easier to index than a whole point set. The downside is that this is only a heuristic: depending on the shapes of your features, the center of gravity may end up in odd places.
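
A minimal Python sketch of that pre-filter, assuming features are stored in absolute image coordinates so that matching features have nearby centroids. The function names, the radius, and the sample data are hypothetical.

```python
from math import hypot


def centroid(points):
    """Mean x and mean y of a feature's points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def candidates(query, database, radius):
    """Names of features whose centroid lies within radius of the query's."""
    qx, qy = centroid(query)
    keep = []
    for name, points in database.items():
        cx, cy = centroid(points)  # in practice, precompute and store these
        if hypot(qx - cx, qy - cy) <= radius:
            keep.append(name)
    return keep


database = {
    "Feature1": [(121, 28), (41, 82), (200, 152)],
    "Feature2": [(5, 5), (60, 200), (250, 250)],
}
shortlist = candidates([(120, 30), (40, 80), (200, 150)], database, radius=10)
```

Only the features on the shortlist then need the full point-by-point comparison. A feature that matches on only some of its points can have a shifted centroid, so the radius has to be generous enough not to discard it.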

wds