
Periodically, the application will receive a huge number of moving objects (approximately 1,000,000 [1 million] per second), each with a latitude and longitude. The requirement is to detect any objects that are within 400 meters of each other, and the detection must be done within 400 ms (milliseconds).

So whenever the application receives a new object with latitude and longitude, I first need to add it to a data structure and then, within 400 ms, check whether any other objects in the data structure are within 400 meters of the newly added object.

From my research I have the following 2 options:

Option 1: Redis GEO could be used for the above requirement if the number of objects were small. However, for 1 million objects the GEOADD and GEORADIUS queries will take more than 400 ms, which is not acceptable. In the future there may be 2 million objects per second.

Option 2: Use an octree data structure, which should give better performance. However, I think its performance will also degrade (taking longer than 400 ms) for 1 million objects, both when updating the octree with a new object and when searching for the objects near the new object.
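To make the required operation concrete, here is a minimal sketch of the insert-and-query step using a uniform grid (a spatial hash) rather than an octree. This is only my illustration, not a full solution: the cell size of 0.005 degrees (~555 m of latitude) is chosen so that all neighbors within 400 m fall into the surrounding 3x3 block of cells, which holds at low to moderate latitudes; a production version would scale the longitude cell width by cos(latitude).

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000
RADIUS_M = 400

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# ~0.005 deg of latitude is ~555 m, i.e. wider than the 400 m search radius,
# so scanning the 3x3 block of cells around the query cell is sufficient
# (assumption: low/moderate latitudes, where a longitude cell is also >400 m).
CELL_DEG = 0.005

def cell_of(lat, lon):
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

grid = defaultdict(list)  # cell -> list of (id, lat, lon)

def insert_and_query(obj_id, lat, lon):
    """Insert a point and return the ids of existing points within RADIUS_M."""
    ci, cj = cell_of(lat, lon)
    hits = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for oid, olat, olon in grid[(ci + di, cj + dj)]:
                if haversine_m(lat, lon, olat, olon) <= RADIUS_M:
                    hits.append(oid)
    grid[(ci, cj)].append((obj_id, lat, lon))
    return hits
```

Each insert touches one cell and each query scans at most 9 cells, so the cost depends on local density rather than on the total number of objects.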

I have thought a lot about partitioning the data using geohashes. For example, use the geohash prefix to store some of the data in Redis instance 1 and data with another prefix in Redis instance 2. However, this fails in the corner case where two objects are within 400 m of each other but lie in neighboring quadrants.
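The corner case can be demonstrated with a simplified stand-in for geohash prefixes (a plain lat/lon grid, my own illustration): two points only ~111 m apart can land in different partitions because they straddle a cell border.

```python
def partition_key(lat, lon, cell_deg=0.01):
    """Stand-in for a geohash prefix: points in the same grid cell
    share a key and would be routed to the same Redis instance."""
    return (int(lat // cell_deg), int(lon // cell_deg))

# Two points 0.001 deg of latitude (~111 m) apart, straddling a cell
# border at lat = 10.0: they hash to different partitions, so a
# per-partition radius query would miss the pair.
a = (9.9995, 0.0)
b = (10.0005, 0.0)
print(partition_key(*a), partition_key(*b))  # different keys
```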

Question: Has anyone got any idea for partitioning data based on latitude and longitude while still detecting the neighboring objects? Or for reducing the problem to the map-reduce paradigm?

Can anyone suggest a different approach, considering that in the future there may be 2 million objects per second?

Nilesh

1 Answer


Two points:

1) For partitioning, you can let the quadrants overlap: all points within 400 m of a quadrant border are added to both quadrants. I think this should allow useful partitioning.
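A minimal sketch of that replication rule, using a plain lat/lon grid as a stand-in for the quadrants (my own illustration; it uses the latitude meter-per-degree scale for longitude too, so it is accurate near the equator):

```python
CELL_DEG = 0.01          # partition cell size (stand-in for a quadrant)
OVERLAP_M = 400          # points this close to a border are replicated
M_PER_DEG_LAT = 111_320  # meters per degree of latitude (approximate)

def partitions_for(lat, lon):
    """Return every partition cell that must hold this point: its own
    cell, plus any neighboring cell whose border is within OVERLAP_M.
    Probing the point shifted by +/- the overlap distance covers all
    bordering cells, because the overlap is smaller than a cell."""
    pad = OVERLAP_M / M_PER_DEG_LAT  # ~0.0036 deg
    cells = set()
    for dlat in (-pad, 0, pad):
        for dlon in (-pad, 0, pad):
            cells.add((int((lat + dlat) // CELL_DEG),
                       int((lon + dlon) // CELL_DEG)))
    return cells
```

With this rule, a radius query only ever needs to look inside a single partition, at the cost of storing border points two (or four, at a corner) times.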

2) There are dedicated indexes for moving objects which are probably better than quadtrees, for example the MX-CIF quadtree. You could also try my own PH-Tree (Java sources). It scales nicely with large datasets (best used with at least 10^6 points) and has good update performance; it actually works best with clustered data. It is basically a prefix-sharing quadtree with numerous optimisations (for example, it never requires rebalancing). On an i7 3770K at 3.5 GHz I can insert between 500K and 1M points per second with a tree size of up to 100M (I stopped testing at that point, but the tree should scale easily to larger datasets).

TilmannZ