3

Consider I have the data about meteo station coordinates, historical temperature values and city center coordinates worldwide. Meteo stations are be placed on different distances to the city centers. The task is to determine average historical values of city temperature by the data of meteo stations.

To solve it for each city I need to find the set of closest meteo stations in some radius and average their data. The brutforce way is to calculate distances from each city to each meteo station, but it's too slow for my data. So I thought some tree data structure could help here. I tried to use R-trees to partition meteo stations by coordinates but there is a problem - such approach allows me to find meteo stations in some tree node, but it doesn't give me information about neighbouring nodes, to calculate radius condition fast (for example in case city is very close to the R-tree's node's border).

Is there a standard tree data structure which allows to search needed node fast, but also provides the set of spatial neighbours on the same tree level?

Kivan
  • 1,712
  • 1
  • 14
  • 30

2 Answers2

1

How about using a database? And then query it to find points close to some particular point? A lot of database already support geospatial data, which you can index and query:

Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
  • As I see in docs, indexing is based on R-trees as well, so it seems that DBs will have the same problems as I have. – Kivan Dec 23 '15 at 07:27
  • 1
    @user3231055 your statement looks rather funny. Most probably you have done something wrong/have not implemented r-tree correctly, have not used the data in a correct way. Because these databases has been used for years and not a lot of people claim that they have some problems. – Salvador Dali Dec 23 '15 at 07:29
1

You should probably not be concerned about 'neighbours on same level' or such, this information does not necessarily mean much. I think you should probably

  1. Decide whether you want all weather stations within a given distance (range query) or the closest k weather stations.
  2. Then I would just use the API of the index that you are using to find the stations.
  3. Then calculate the distance.

R-Trees are okay for that, but they are usually quite slow for loading. If loading times are a problem, you may want to try R+tree, R*tree, or maybe Quadtrees (for small datasets) or PH-Tree (for large datasets, my implementation in Java).

How the data is organsized inside the tree should not be a concern. Who ever implemented the tree probably implemented the most efficient way of finding desired neighbours.

TilmannZ
  • 1,784
  • 11
  • 18