For a few days now I've been working on a sparklyr extension that would use an existing Spark SQL extension for indexing geospatial data into H3 hexagons (https://index.scala-lang.org/nuzigor/h3-spark), but I'm confused about the steps needed to "import" the SQL extension's code into the sparklyr session.
From my research, it seems I need to declare the custom jar containing the new functions and register them in the sparklyr session.
What I'm doing now is this:
```r
library(sparklyr)

config <- spark_config()
# register the SQL extension and ship the jar with the session
config$spark.sql.extensions <- "com.nuzigor.spark.sql.h3.H3SqlExtensions"
config$sparklyr.jars.default <- c("h3-spark_2.12-0.8.0.jar")

sc <- spark_connect(master = "local", version = "3.2.1", config = config)

# add the jar with the UDFs to the session
DBI::dbGetQuery(sc, "ADD JAR h3-spark_2.12-0.8.0.jar")
```
Up to that point there are no errors; everything looks fine. But when I try to use one of the functions from the SQL extension:
```r
library(dplyr)

test <- data.frame(a = 1:10, b = "89283082e73ffff")
test <- copy_to(sc, test)
test %>% mutate(valid = sql("h3_is_valid(b)"))
```
![screenshot of the query output showing NA values](https://i.stack.imgur.com/BZqKJ.png)
All of the functions return NA values, either in a lgl column or in an int column (the int type at least makes sense in some cases, because the expected output is an integer).
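To narrow down whether the problem is in the extension itself or in the dplyr translation layer, I imagine one could call the function through plain SQL with a literal value (I'm not sure whether h3_is_valid accepts a string index or requires a LONG, so treat this as a sketch):

```r
# bypass dplyr's SQL translation entirely; if this also returns NA,
# the extension (or the argument type) is the problem, not dplyr
DBI::dbGetQuery(sc, "SELECT h3_is_valid('89283082e73ffff') AS valid")
```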
What am I doing wrong here? I've looked at the sparklyr extensions guide, but I didn't find it very useful.
Thank you very much in advance.
Juan Ignacio