
I'm looking into storing geospatial information using a geohash-like index, perhaps based on Hilbert curves. My question concerns how best to split up area queries on such an index.

This article, for example, shows how one might want to split an area query into multiple queries to avoid querying a range exhibiting poor locality (see this image). If you wanted to search the circular area with a single query using a Z curve, as a normal geohash does, you would have to query the entire lower-left quadrant, of which only a tiny fraction is the area we're concerned with.

In this case it would be better to split the search into a few queries; however, I haven't been able to find any information on how best to do this. Are there algorithms for splitting a range query like this into smaller ranges that cover the original area?

dylan.scott
  • You might try asking on gis.stackexchange.com. It is a sister site to Stack Overflow and focuses purely on GIS. – Jordan Parmer Jun 26 '11 at 19:54
  • Why aren't you just using a geospatial DB? There are a lot of them out there, and at least two of them are open source: PostGIS (on PostgreSQL) and SpatiaLite (on SQLite). – TheSteve0 Jun 27 '11 at 02:56

1 Answer


Once you've identified a hash prefix that covers your query bounds, you can begin splitting that prefix into its constituent prefixes, keeping each one only if it intersects your query bounds. For example, say you've identified the prefix 0100 as covering your query area. The prefix 0100 comprises the prefixes 01000 and 01001; in turn, 01000 comprises 010000 and 010001, and 01001 comprises 010010 and 010011, and so on. As you rewrite your prefix as a collection of longer prefixes (each corresponding to a smaller area), you filter out those prefixes that do not intersect your query bounds.

You'll have to stop the splitting process at some point, since each iteration potentially doubles the size of your prefix collection. You might, for instance, set a maximum prefix collection size and declare yourself satisfied with the filtering once splitting would exceed it; of course, there are other metrics that you could use to find a stopping point.

As a final step, you can recombine "adjacent" prefixes in order to reduce the number of searches that you are performing. If, for instance, you're left with the prefixes 01000 and 01001, you can combine these into 0100 to avoid a search for 01000 followed by a search for 01001 (a benefit under the assumption that the search process has overhead beyond sequential reads).

You'll need a routine for calculating the bounding box of a hash prefix in order to test for intersection with your query bounds. This will depend upon the hashing scheme that you use.
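For concreteness, here's a minimal sketch of that split/filter/merge loop in Python. It assumes a binary Z-order scheme in which even-numbered bits halve the longitude range and odd-numbered bits halve the latitude range, and a rectangular query box; the function names (prefix_bounds, intersects, split_prefixes, merge_adjacent) and parameters (max_prefixes, max_depth) are illustrative, not from any particular library. A Hilbert-curve index would need a different prefix_bounds, but the surrounding loop would be unchanged.

```python
def prefix_bounds(prefix, world=(-180.0, 180.0, -90.0, 90.0)):
    """Bounding box (min_lon, max_lon, min_lat, max_lat) of a bit-string prefix.

    Even-numbered bits halve the longitude range, odd-numbered bits the
    latitude range (a plain binary Z-order / geohash-style scheme).
    """
    min_lon, max_lon, min_lat, max_lat = world
    for i, bit in enumerate(prefix):
        if i % 2 == 0:  # even bit: split longitude
            mid = (min_lon + max_lon) / 2.0
            if bit == '0':
                max_lon = mid
            else:
                min_lon = mid
        else:           # odd bit: split latitude
            mid = (min_lat + max_lat) / 2.0
            if bit == '0':
                max_lat = mid
            else:
                min_lat = mid
    return (min_lon, max_lon, min_lat, max_lat)

def intersects(box, query):
    """Axis-aligned intersection test for two (min_lon, max_lon, min_lat, max_lat) boxes."""
    return not (box[1] < query[0] or box[0] > query[1] or
                box[3] < query[2] or box[2] > query[3])

def split_prefixes(start, query, max_prefixes=16, max_depth=24):
    """Split a covering prefix into longer prefixes, filtering out those that
    miss the query box, until the next split would exceed max_prefixes."""
    prefixes = [start]
    for _ in range(max_depth):  # depth cap guards against endless refinement
        candidates = [p + b for p in prefixes for b in '01']
        candidates = [p for p in candidates
                      if intersects(prefix_bounds(p), query)]
        if len(candidates) > max_prefixes:
            break  # stop before the collection grows past the cap
        prefixes = candidates
    return merge_adjacent(prefixes)

def merge_adjacent(prefixes):
    """Recombine sibling prefixes (p0 and p1) into their parent p, since a
    scan of p covers exactly the same key range as scans of p0 and p1."""
    prefixes = set(prefixes)
    merged = True
    while merged:
        merged = False
        for p in list(prefixes):
            if not p:
                continue
            sibling = p[:-1] + ('1' if p[-1] == '0' else '0')
            if sibling in prefixes:
                prefixes.discard(p)
                prefixes.discard(sibling)
                prefixes.add(p[:-1])
                merged = True
                break
    return sorted(prefixes)

# Hypothetical usage: cover a 20-degree by 10-degree box centred on the
# origin, starting from the empty prefix (the whole world).
query_box = (-10.0, 10.0, -5.0, 5.0)
for p in split_prefixes('', query_box, max_prefixes=8):
    print(p, prefix_bounds(p))
```

The max_prefixes cap plays the role of the "maximum prefix collection size" stopping metric described above, and the final merge_adjacent pass implements the recombination step; both thresholds are tuning knobs you'd adjust against your store's seek-versus-scan costs.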

Kevin L. Stern