
I am testing the Databricks Mosaic spatial grid indexing functions to obtain the H3 hex ID for a given latitude and longitude.

# Get the latitude and longitude
latitude = 37.7716736
longitude = -122.4485852
 
# Get the resolution
resolution = 7
 
# Get the H3 hex ID
h3_hex_id = grid_longlatascellid(lit(latitude), lit(longitude), lit(resolution)).hex
 
# Print the H3 hex ID
print(h3_hex_id)
 
Column<'grid_longlatascellid(CAST(37.7716736 AS DOUBLE), CAST(-122.4485852 AS DOUBLE), 7)[hex]'>
 

How do I see the actual hex ID in the code above?

When using h3.geo_to_h3, I get:

h3.geo_to_h3(float(latitude), float(longitude), 7)
'872830829ffffff'

According to the docs, the H3 cell ID returned by grid_longlatascellid looks different from what h3.geo_to_h3 returns:

df = spark.createDataFrame([{'lon': 30., 'lat': 10.}])
df.select(grid_longlatascellid('lon', 'lat', lit(10))).show(1, False)
+----------------------------------+
|grid_longlatascellid(lon, lat, 10)|
+----------------------------------+
|                623385352048508927|
+----------------------------------+

How do I obtain the h3 hex id using Databricks Mosaic library? I have the following imports and configurations:

import h3
from mosaic import enable_mosaic
enable_mosaic(spark, dbutils)
from mosaic import *
spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
 
kms

1 Answer


In Mosaic we use the long (64-bit integer) encoding of H3 cell IDs, whereas h3.geo_to_h3 returns the hex string encoding of the cell ID. It is the same value in a different representation: if you convert the result of h3.geo_to_h3 to a long/bigint (that is, parse the string as a base-16 number), you get the same value that Mosaic returns. Longs are more efficient for Spark joins, which is why we chose them. Hope this helps.
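To make the "same value, different representation" point concrete, here is a minimal plain-Python sketch of the conversion in both directions. The helper names h3_hex_to_long and h3_long_to_hex are made up for illustration and are not part of Mosaic or the h3 package; the sketch only relies on the fact that the hex ID is the base-16 spelling of the long.

```python
# Hypothetical helpers (not part of Mosaic or h3): convert between the
# hex-string and long encodings of the same H3 cell ID.

def h3_hex_to_long(h3_hex: str) -> int:
    """Parse an H3 hex string (e.g. from h3.geo_to_h3) as a base-16 integer."""
    return int(h3_hex, 16)


def h3_long_to_hex(h3_long: int) -> str:
    """Format a long cell ID (e.g. from grid_longlatascellid) as lowercase hex."""
    return format(h3_long, "x")


# Value returned by h3.geo_to_h3 in the question.
hex_id = "872830829ffffff"
long_id = h3_hex_to_long(hex_id)
print(long_id)                  # the long form of the same cell ID
print(h3_long_to_hex(long_id))  # round-trips back to the original hex string

# Long value shown in the Mosaic docs example from the question.
print(h3_long_to_hex(623385352048508927))
```

Two notes on the code in the question. First, grid_longlatascellid returns a lazy Column expression, so printing it only shows the expression; the value appears once it is evaluated inside a DataFrame, as in the df.select(...).show() example. Second, that docs example passes the arguments as lon, lat, so longitude should come first. If you would rather do the conversion inside Spark, wrapping the column in the built-in lower(hex(...)) functions should produce the same hex string, though I have not run that against Mosaic here.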