I have a list of latitude/longitude coordinates that I have converted to GeoHash. My goal is to identify the points that are reported to be over water (oceans, seas, etc., outside any country's borders). I also have a data set of the border shapes of all the world's countries, in latitude/longitude, which I have converted to GeoHash too.

So, for a given GeoHash point, I want to classify it as being over (international) water or not. I thought about picking points manually in the middle of the ocean and using a short GeoHash prefix to create a large box in the ocean but that is fairly limited.

Perhaps, more generally, there is a way to characterize what it means for a GeoHash point to be outside any country's borders?

Mr PizzaGuy
Zack

1 Answer

This is not a good use of geohash. Geohash is good at identifying specific points, but not great at describing complex shapes like country borders or oceans.

I thought about picking points manually in the middle of the ocean and using a short GeoHash prefix to create a large box in the ocean but that is fairly limited.

Yes, that will give a very imprecise result. What you need is to test, for each point, whether it belongs to any country's polygon. How you do this depends on the platform you use; e.g., in SQL with spatial support you would run an ST_Intersects(point, country) query.

I would just convert each geohash back to a lat/lon pair and check those.
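Converting a geohash back to coordinates is simple enough to do by hand. Here is a minimal sketch in plain Python, assuming the standard geohash base-32 alphabet and bit interleaving (longitude bit first); the function names `geohash_bounds` and `geohash_decode` are mine, not from any particular library:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(gh):
    """Return (lat_min, lat_max, lon_min, lon_max) of the cell named by gh."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate between axes, starting with longitude
    for ch in gh.lower():
        idx = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (idx >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return lat[0], lat[1], lon[0], lon[1]

def geohash_decode(gh):
    """Return the (lat, lon) center of the geohash cell."""
    lat_min, lat_max, lon_min, lon_max = geohash_bounds(gh)
    return (lat_min + lat_max) / 2, (lon_min + lon_max) / 2

lat, lon = geohash_decode("ezs42")
print(round(lat, 3), round(lon, 3))
```

Note that the decoded point is only the center of the cell, so the answer is as precise as the geohash length you stored.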

If you do want to use geohash, or if you have too many points (billions), you can use the short GeoHash prefix trick, but you would need many prefixes to represent each ocean. Something like the following, using a prefix tree:

  • start with a GeoHash length of a couple of letters,
  • for every possible GeoHash string, compute whether its box is fully contained by the ocean or land (using ST_Intersects or similar precise method).
  • if the whole box belongs to one class, add it to the prefix tree.
  • if not, add more letters (again, all possible combinations) and continue recursively, up to some limit where you need to stop.

Once you've built such a tree, you can use a GeoHash to look up the answer quickly in it.
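The steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: `classify_box` is a hypothetical stand-in for the precise test (here "land" is a single made-up rectangle, `LAND`, instead of real country polygons), and a flat dict keyed by prefix plays the role of the prefix tree. In practice you would replace `classify_box` with a real geometry check in a spatial database or library.

```python
from itertools import product

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(gh):
    """Return (lat_min, lat_max, lon_min, lon_max) of a geohash cell."""
    lat, lon, is_lon = [-90.0, 90.0], [-180.0, 180.0], True
    for ch in gh:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            is_lon = not is_lon
    return lat[0], lat[1], lon[0], lon[1]

# ASSUMPTION: toy "land" rectangle (lat_min, lat_max, lon_min, lon_max).
LAND = (-10.0, 10.0, -20.0, 20.0)

def classify_box(lat_min, lat_max, lon_min, lon_max):
    """Hypothetical precise test: 'land', 'ocean', or 'mixed' for a box."""
    la0, la1, lo0, lo1 = LAND
    if lat_max <= la0 or lat_min >= la1 or lon_max <= lo0 or lon_min >= lo1:
        return "ocean"  # box does not overlap the land rectangle at all
    if lat_min >= la0 and lat_max <= la1 and lon_min >= lo0 and lon_max <= lo1:
        return "land"   # box lies entirely inside the land rectangle
    return "mixed"

def build_index(start_len=1, max_len=3):
    """Map geohash prefixes to a class, subdividing only mixed cells."""
    index = {}

    def visit(prefix):
        label = classify_box(*geohash_bounds(prefix))
        if label != "mixed" or len(prefix) >= max_len:
            index[prefix] = label  # uniform cell, or depth limit reached
        else:
            for ch in BASE32:      # refine: try all 32 longer prefixes
                visit(prefix + ch)

    for chars in product(BASE32, repeat=start_len):
        visit("".join(chars))
    return index

def lookup(index, gh):
    """Return the class of the shortest stored prefix of gh, or None."""
    for n in range(1, len(gh) + 1):
        label = index.get(gh[:n])
        if label is not None:
            return label
    return None

index = build_index(start_len=1, max_len=3)
print(lookup(index, "s0000"), lookup(index, "0123"))
```

Cells that are still "mixed" at the depth limit stay ambiguous; those are the ones you would either resolve with an exact point-in-polygon check or accept as the error margin.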

Michael Entin
  • Thanks for your input, but to elaborate on/respond to a few items: right now I have the data on Redshift (so using SQL). To the best of my knowledge there are no native geospatial tools in Redshift to do point-within-polygon types of calculations. And it's in Redshift because there are about 32 billion points, which is why I was using GeoHash. I don't understand how to apply "for every possible GeoHash string, compute whether its box is fully contained by the ocean or land (using ST_Intersects or similar precise method)." since that seems like the crux of the problem to begin with(?). Thanks. – Zack Nov 04 '19 at 19:34
  • To start, generate all possible 3-letter geohash values; that's about 33k of them (32^3 = 32,768). In some DB with spatial support, like Postgres with PostGIS or Google BigQuery, test whether the points corresponding to these values belong to the ocean. Export the result to Redshift and join on the 3-letter prefix. That will give you an error margin of about 78 km. You can improve it by using longer prefixes where the geohash box resolves to a mix of land and ocean. – Michael Entin Nov 04 '19 at 21:55
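For reference, the numbers in the last comment can be reproduced with a few lines of Python. This is a rough sketch: the helper name `cell_size_degrees` is made up for illustration, and the 111.32 km per degree figure is an approximation that holds for latitude and for longitude at the equator (east-west cell width shrinks toward the poles).

```python
from itertools import product

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def cell_size_degrees(length):
    """Return (lat_height, lon_width) in degrees for a geohash length."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2  # longitude gets the first (extra) bit
    lat_bits = bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits

# All possible 3-letter geohash values: 32 ** 3 of them.
prefixes = ["".join(p) for p in product(BASE32, repeat=3)]
print(len(prefixes))  # 32768

KM_PER_DEGREE = 111.32  # ASSUMPTION: approximate value at the equator
lat_deg, lon_deg = cell_size_degrees(3)
half_cell_km = lat_deg * KM_PER_DEGREE / 2
print(lat_deg, lon_deg, round(half_cell_km))
```

A 3-letter cell is about 1.4 degrees on a side, so a point can sit up to roughly half a cell away from the cell center that was tested, which is where the error margin of about 78 km comes from.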