Here is my specific problem. I need to write an algorithm that:
1) Takes these 2 arrays:
a) An array of about 3000 postcodes (or zip codes if you're in the US), with the longitude and latitude of the center point of the areas they cover (that is, 3 numbers per array element)
b) An array of about 120,000 locations, consisting of longitude and latitude
2) Converts each location to the postcode whose centerpoint is closests to the given longitude and latitude
Note that the longitudes and latitudes of the locations are very unlikely to precisely match those in the postcodes array. That's why I'm looking for the shortest distance to the center point of the area covered by the postcode.
I know how to calculate the distance between two longitude/latitude pairs. I also appreciate that being closests to the center point of an area covered by a postcode doesn't necessarily mean you are in the area covered by that postcode - if you're in a very big postcode area but close to the border, you may be closer to the center point of a neighbouring postcode area. However, in this case I don't have to take this into account - shortest distance to center point is enough.
A very simple way to solve this problem would be to visit each of the 120,000 locations, and find the postcode with the closest centerpoint by calculating the distance to each of the 3000 postcode centerpoints. That would mean 3000 x 120,000 = 360,000,000 distance calculations though.
If postcodes and locations were in a one-dimensional space (that is, identified by 1 number instead of 2), I could simply sort the postcode array by its one-dimensional centerpoint and then do a binary search in the postcode array for each location.
So I guess what I'm looking for is a way to sort the two dimensional space of longitudes and latitudes of the postcode center points, so I can perform a two dimensional binary search for each location. I've seen solutions to this problem, but those only work for direct matches, while I'm looking for the center point closests to a given location.
I am considering caching solutions, but if there is a fast two-dimensional binary search that I could use, that would make the solution much simpler.
This will be part of a batch program, so I'm not counting milli seconds but it can't take days either. It will run once a month without manual intervention.