I think I understand how (x, y) or (lat, lon) can be stored so that all points within some spatial range can be retrieved with range queries on sorted data. Geohash accomplishes this by alternating lat (odd bits) and lon (even bits), so that as the resolution of x increases, the resolution of y increases as well. Other methods, like Hilbert curves, rely on the same basic principle of increasing resolution in both dimensions together. However, I can't understand what adding an extra temporal dimension accomplishes. For example, GeoMesa uses an index that looks like "YXTTYXTTYX", according to their site. What I don't get is how the TT bits make it possible to ask questions like "get all points within an X, Y range and within the interval min < t < max". Am I misunderstanding the purpose of those bits and of extra dimensions in geospatial indexing?

The way I understand it, putting those bits in there makes temporal resolution increase as spatial resolution increases. Let's say that we have bits like [10] [00] where the Ts are. The first bit divides the range into two halves, 0 meaning the lower half and 1 the upper half, so, for an imaginary span of four thousand years, we get 0 = year < 2000 and 1 = year > 2000. This [10] leads to 2000 < t < 3000, the next bit 0 leads to 2000 < t < 2500, and the following 0 to 2000 < t < 2250. Using this approach, I don't see how to retrieve all events within a certain time range and a certain spatial range, and it's not clear what else this can be used for. All the geospatial papers and presentations I've seen so far focus mostly on spatial hashing and don't discuss the use of extra dimensions in detail.
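
To make the halving in the paragraph above concrete, here is a minimal sketch. It assumes that each T bit simply bisects whatever time interval remains (0 keeps the lower half, 1 the upper half); `bits_to_interval` is a hypothetical helper name, not something from GeoMesa.

```python
def bits_to_interval(bits, lo, hi):
    """Narrow [lo, hi) by halving once per bit.

    Assumption: '0' keeps the lower half, '1' keeps the upper half.
    """
    for b in bits:
        mid = (lo + hi) / 2
        if b == "1":
            lo = mid
        else:
            hi = mid
    return lo, hi


# Imaginary span of four thousand years, as in the example above.
print(bits_to_interval("10", 0, 4000))    # (2000.0, 3000.0)
print(bits_to_interval("1000", 0, 4000))  # (2000.0, 2250.0)
```

Each extra bit only ever shrinks the interval that shares the same prefix, which is exactly why a single prefix cannot describe an arbitrary min < t < max interval.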

CoolCodeBro

1 Answer

The GeoMesa website is wrong to say that a space-filling curve is a geohash. The geohash was invented by G. Niemeyer! You can, however, flatten 3D in much the same way as 2D. An SFC also preserves locality information in 3D, but getting better quality takes more effort. I wouldn't recommend a 3D Hilbert curve. A Z-curve is much easier to understand!
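
As a rough illustration of flattening 3D the same way as 2D, here is a minimal Z-curve (Morton) sketch. It assumes plain one-bit-per-dimension interleaving of small non-negative integer coordinates, which is not necessarily the bit layout GeoMesa uses; `interleave` and `deinterleave` are hypothetical helper names.

```python
def interleave(coords, bits):
    """Build a Z-order key: take one bit from each coordinate per level,
    most significant level first, in the order the coordinates are given."""
    z = 0
    for i in range(bits - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> i) & 1)
    return z


def deinterleave(z, dims, bits):
    """Inverse of interleave: recover the original coordinates."""
    coords = [0] * dims
    for i in range(bits):
        for d in range(dims):
            bit = (z >> (i * dims + (dims - 1 - d))) & 1
            coords[d] |= bit << i
    return tuple(coords)


# x = 5 (101), y = 3 (011), t = 6 (110), three bits each.
key = interleave((5, 3, 6), 3)
print(format(key, "09b"))         # 101011110
print(deinterleave(key, 3, 3))    # (5, 3, 6)
```

The same function handles 2D by passing two coordinates, so moving to 3D changes nothing except that every level of the key now carries three bits instead of two.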

Micromega
  • But if you flatten 3D into 1D, doesn't that mean that resolution increases in all 3 dimensions simultaneously? In other words, doesn't it mean that the temporal range becomes bound to the spatial range and vice versa? The way I see it now, if you want to search for points far apart in time, they will also have to be far apart in space, and I don't understand how that can be useful for indexing. – CoolCodeBro Apr 18 '16 at 12:06
  • With x, y, z there isn't any bound that I know of, except that, for example, an SFC in 3D is a power of 3. In 2D you have the triangle inequality! Also, an SFC traverses a point (x, y, z) only once, whereas in an octree you can have many points in a leaf! – Micromega Apr 18 '16 at 12:58
  • I don't understand how there can be no bound. E.g., in geohashing, if latitude is "-90 to +90", then the first bit says whether lat is -90 to 0 or 0 to +90. The second bit says whether longitude is -180 to 0 or 0 to +180. Assuming the first two bits are 1, the third bit then says whether latitude is 0-45 or 45-90, and similarly the fourth bit says whether lon is 0-90 or 90-180. Now, if I want to query a range, then – CoolCodeBro Apr 18 '16 at 14:05
  • (let's say I want to query lon) I can either query `1111*` (all lon within 90-180 and lat 45-90) or `1101*` (all lon 90-180 and lat 0-45). But I can't get all lon 90-180 for any latitude. Lat and lon are bound by the resolution. As I get more precise with the range of lon I want, I also have to get more precise lat answers. – CoolCodeBro Apr 18 '16 at 14:18
  • Normally the bits are represented as integers. How do you want to do a range search with integers? – Micromega Apr 18 '16 at 15:59
  • This is just pseudo-code, sorry for the confusion. Using the box from https://en.wikipedia.org/wiki/Z-order_curve#Coordinate_values, to get all points within the 1st small box (x = 0 to 1, y = 0 to 1) I can do 000000 <= p <= 000011. But I can't select all values from the 1st and 5th boxes (x = 0 to 1, y = 0 to 3) without also getting extra X values from the range 2 to 3, even though for Y that space is continuous. This is what I mean by x being bound to y and both resolutions increasing at the same time. – CoolCodeBro Apr 18 '16 at 17:11
  • To me it seems that adding time in there would mean that time resolution is bound to spatial resolution. Let's say the maximum value of the 3rd, temporal dimension is one year; then getting values for the whole year (Z) would also mean getting all values within the X and Y range (all values inside the whole cube). It seems impossible to get all values within a broad temporal range while keeping a small spatial range and vice versa, just like it's impossible to query one small X range but a very broad range of Y. – CoolCodeBro Apr 18 '16 at 17:17
  • I don't think you have understood the SFC. If you want boxes 1 and 5, then there is one query that gives you the 4 boxes in the upper left and another query that gives you the 4 boxes in the lower left. I am not sure how to query boxes 1, 2, 3, 5. I find SFCs fun and useful but, like I said, they are a very abstract and mathematical thing. If my answer is helpful, please consider accepting it! – Micromega Apr 18 '16 at 19:29
  • To get started with SFCs, you should read this article: https://aws.amazon.com/blogs/database/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-1/ – amirouche Dec 11 '18 at 20:09
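
The range question raised in the comments above can be sketched as follows. This is a brute-force illustration, assuming plain Z-order bit interleaving on a small integer grid: it encodes every cell of the query box and merges consecutive keys into contiguous ranges. `interleave` and `box_to_ranges` are hypothetical helper names, and a real index would compute the ranges without enumerating cells.

```python
from itertools import product


def interleave(coords, bits):
    """Z-order key: one bit per coordinate per level, most significant first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> i) & 1)
    return z


def box_to_ranges(lows, highs, bits):
    """Return the contiguous key ranges that exactly cover an inclusive box."""
    keys = sorted(
        interleave(cell, bits)
        for cell in product(*(range(lo, hi + 1) for lo, hi in zip(lows, highs)))
    )
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k          # extend the current run
        else:
            ranges.append([k, k])      # start a new run
    return [tuple(r) for r in ranges]


# 2D, coordinates given as (y, x) to match the Wikipedia figure:
# x = 0 to 1, y = 0 to 3 on an 8x8 grid.
print(box_to_ranges((0, 0), (3, 1), 3))        # [(0, 3), (8, 11)]

# 3D, coordinates given as (x, y, t): one spatial cell, the whole time axis.
print(box_to_ranges((2, 2, 0), (2, 2, 7), 3))
# [(48, 49), (56, 57), (112, 113), (120, 121)]
```

Under these assumptions, a box that is narrow in one dimension and wide in another is still queryable: it simply becomes several key ranges instead of one, so the dimensions are not bound to a shared resolution. The cost is more range scans, or, if fewer and wider ranges are used, extra rows that have to be filtered out afterwards.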