1

I am looking at a pythonic implementation of this top rated accepted answer on GIS SE - Using geohash for proximity searches? and I am unable to retrieve any matches for my geohash query. Here is the approach I have tried so far.

To run this Minimum Verifiable Complete Example(MVCE) you need to download the following files - geohash int and sortedlist python and install the python sortedlist via pip. You also need to have the latest version of Cython installed on your machine so as to wrap the C functionality of geohash-int(Note I am only wrapping what is necessary for this MVCE).

geohash_test.py

# GeoHash is my Cython wrapper of geohash-int C package
from geo import GeoHash
from sortedcontainers import SortedList

import numpy as np

def main():
    # Bounding coordinates of my grid.
    minLat = 27.401436
    maxLat = 62.54858
    minLo = -180.0
    maxLo = 179.95000000000002
    latGrid = np.arange(minLat,maxLat,0.05)
    lonGrid = np.arange(minLo,maxLo,0.05)
    geoHash = GeoHash()

    # Create my own data set of points with a resolution of
    # 0.05 in the latitude and longitude direction. 
    gridLon,gridLat = np.meshgrid(lonGrid,latGrid)
    grid_points = np.c_[gridLon.ravel(),gridLat.ravel()]

    sl = SortedList()

   #Store my grid points in the best resolution possible i.e. 52(First step in accepted answer)
   for grid_point in grid_points:
      lon = grid_point[0]
      lat = grid_point[1]
      geohash = geoHash.encode(lat,lon,52)
      bitsOriginal = geohash["bits"]
      sl.add(bitsOriginal)
      #Derive the minimum and maximum value for the range query from method below
   minValue,maxValue = getMinMaxForQueryGeoHash(geoHash)
   # Do the actual range query with a sorted list
   it = sl.irange(minValue,maxValue,inclusive=(False,False))
   print(len(list(it)))

def getMinMaxForQueryGeoHash(geoHash):
    lonTest = 172.76843
    latTest = 61.560745

    #Query geohash encoded at resolution 26 because my search area
    # is around 10 kms.(Step 2 and 3 in accepted answer)

    queryGeoHash = geoHash.encode(latTest,lonTest,26)

    # Step 4 is getting the neighbors for query geohash

    neighbors = geoHash.get_neighbors(queryGeoHash)
    bitsList = []
    for key,value in neighbors.items():
       bitsList.append(value["bits"])

    #Step 5 from accepted answer
    bitsList.append(queryGeoHash["bits"])

   # Step 6 We need 1 to all the neighbors 
   newList = [x+1 for x in bitsList]
   joinedList = bitsList + newList

    #Step 7 Left bit shift this to 52 
    newList2 = [x <<26 for x in joinedList]
    #Return min and max value to main method
    minValue = min(newList2)
    maxValue = max(newList2)
    return minValue,maxValue

 main()

If one were to write this out as a pseudocode here is what I am doing

  1. Given my bounding box which is a grid I store it in the highest resolution possible by computing the geohash for each latitude and longitude(this happens to be bit depth 52)

  2. I add the geohash to a sorted list

  3. Then I would like to do a range query by specifying a search radius of 10 kms for a specific query coordinate

  4. From the accepted answer to do this you need the min and max value for a query geohash

  5. I calculate the min and max value in the method getMinMaxForQueryGeoHash

  6. Calculate the query geohash at bit depth 26(this is the radius of 10 kms)

  7. Calculate the neighbors of the query geohash and create the 18 member array

  8. The 18 members are the 8 neighbors returned from the C method plus the original query geohash and the remaining 9 are obtained by adding 1 to this array

  9. Then left bit shift this array by 26 and return the min and max value to the irange query of sorted list.

    Bit shift = 52(maximum resolution) - query geohash precision(26) = 26

But that query returns me a NULL. Could somebody explain where I am going wrong ?

gansub
  • 1,164
  • 3
  • 20
  • 47

1 Answers1

3

using your jargon: for a MVCE, you not need a complex two-languages implementations. There are a lot of simple good implementations of Geohash, some in 100% Python (example). All them use the Morton Curve (example).

Conclusion: try to plug-and-play a pure-Python implementation, first test encode/decode, them test the use of neighbors(geohash) function.

Peter Krauss
  • 13,174
  • 24
  • 167
  • 304
  • Thanks for your answer. I am trying to use geohash to do what I call an approximate nearest neighbor search without calculating distances(which in my current code is very expensive). I would be satisfied with approximate matches. Hence the need for a range query. – gansub Sep 17 '19 at 13:04
  • The `neighbors(geohash)` that I cited and [is here in the cited example](https://github.com/kylebebak/py-geohash-any/blob/master/py_geohash_any/geohash.py#L136)... But it is topological, to transform into "calculate distances" you must estimate from cell width or cell centroids... LatLong is not a metric system, see [faster distance estimators as Turf](https://turfjs.org/docs/#distance) or any Python geo-lib. If need more than "1 cell distance" need to use iterations, so it is not efficient for more than 1 or 2 cells-distant "nearest neighbor search" (you is not precise about the focus). – Peter Krauss Sep 17 '19 at 13:44
  • Now I am starting to get it( I think). You are saying my lat long grid cannot be used to do range queries using geohash. But if the data is topological such as a sphere or an ellipsoid those points can be indexed using a geohash and then we can do range queries. Am I correct ? – gansub Sep 17 '19 at 13:52
  • Topological in this case is about "grid cell topology" and choices of [4-connected or 8-connected](https://en.wikipedia.org/wiki/Pixel_connectivity#4-connected)... Well, you can also "rewind all" and check the big picture of your real problem. Today (2019) there are other faster indexes to do this, the best one, on my opinion, is [H3 Uber](https://uber.github.io/h3/), based on hexagonal grid. – Peter Krauss Sep 17 '19 at 14:03