0

The situation and the goal

Imagine a user search system that provides a proximity search from a user’s own position, which is specified by a decimal latitude/longitude combination. An Atlanta resident’s position, for instance, would be represented by 33.756944,-84.390278 and a perimeter search by this user should yield other users in his area from a radius of 10 mi, 50 mi, and so on.

A table-valued function calculates distances and provides users accordingly, ordered by ascending distance to the user that started the search. It’s always a live query, and it’s a tough and frequent one. Now, we want to built some sort of caching to reduce load.

On the way to solutions

So far, all users were grouped by the integer portion of their lat/long. The idea is to create cache files with all users from a grid square, so accessing the relevant cache file would be easy. If a grid square contains more users than a cache file should, the square is quartered or further divided into eight pieces and so on. To fully utilize a square and its cache file, multiple overlaying squares are contemplated. One deficiency of this approach is that gridding and quartering high density metropolitan areas and spacious countrysides into overlaying cache files may not be optimal.

Reading on, I stumbled upon topics like nearest neighbor searches, the Manhattan distance and tree-esque space partitioning techniques like a k-d tree, a quadtree or binary space partitioning. Also, SQL Server provides its own geographical datatypes and functions (though I’d guess the pure-mathematical FLOAT way has an adequate performance). And of course, the crux is making user-centric proximity searches cacheable.

Question!

I haven’t found much resources on this, but I’m sure I’m not the first one with this plan. Remember, it’s not about the search, but about caching.

  • Can I scrap my approach? ;-)
  • Are there ways of an advantageous partitioning of users into geographical divisions of equal size?
  • Is there a best practice for storing spatial user information for efficient proximity searches?
  • What do you think of the techniques mentioned above (quadtrees, etc.) and how would you pair them with caching?
  • Do you know an example of successfully caching user-specific proximity search?
Community
  • 1
  • 1
dakab
  • 5,379
  • 9
  • 43
  • 67

1 Answers1

1

Can I scrap my approach?

You can adapt your appoach, because as you already noted, a quadtree uses this technic. Or you use a geo spatial extension. That is available for MySql, too.

Are there ways of an advantageous partitioning of users into geographical divisions of equal size

A simple fixed grid of equal size is fine when locations are equally distributed or if the area is very small. Geo locations are hardly equal distributed. Usually a geo spatial structure is used. see next answer:

Is there a best practice for storing spatial user information for efficient proximity searches Quadtree, k-dTree or R-Tree.

What do you think of the techniques mentioned above (quadtrees, etc.) and how would you pair them with caching?

There is some work from Hannan Samet, which describes Quadtrees and caching.

AlexWien
  • 28,470
  • 6
  • 53
  • 83