
For a few days now I've been working on a sparklyr extension that would let us use an existing Spark SQL extension for indexing geospatial data into H3 hexagons (https://index.scala-lang.org/nuzigor/h3-spark), but I'm quite confused about the steps needed to "import" the SQL extension's code into the sparklyr session.

From my research, I found that I need to declare the custom jar containing the new functions and register it in the sparklyr session.

What I'm doing now is this:

```
library(sparklyr)

config <- sparklyr::spark_config()
config$spark.sql.extensions <- "com.nuzigor.spark.sql.h3.H3SqlExtensions"
config$`sparklyr.jars.default` <- "h3-spark_2.12-0.8.0.jar"

sc <- sparklyr::spark_connect(master = "local", version = "3.2.1", config = config)

# add the jar containing the extension functions to the session
DBI::dbGetQuery(sc, "ADD JAR h3-spark_2.12-0.8.0.jar")
```
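As an aside, one variant I would also try is giving sparklyr an absolute path to the jar, in case the relative path above is not resolved against the driver's working directory (normalizePath() is base R; that the path is the problem is only a guess on my part):

```
# Same setting as above, but with an absolute path to the jar, in case
# the relative path is not resolved against the working directory.
# Whether this is actually the cause is just a guess.
config$`sparklyr.jars.default` <- normalizePath("h3-spark_2.12-0.8.0.jar")
```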

Running the setup above produces no errors; everything looks fine. But when I try to use some of the functions from the SQL extension:


```
library(dplyr)

test <- data.frame(a = 1:10, b = '89283082e73ffff')
test <- copy_to(sc, test)

test %>% mutate(sql("h3_is_valid(b)"))
```


[![NA values](https://i.stack.imgur.com/BZqKJ.png)](https://i.stack.imgur.com/BZqKJ.png)

All of them return NA values, either in a lgl column or in an int column (the int type itself makes sense in some cases, because the expected output is an integer).
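To rule out the dplyr/dbplyr translation layer, I think the same call can also be expressed as plain Spark SQL; a minimal sketch using sparklyr's own sdf_sql() (the table name `test` is the one copy_to() registers above):

```
# Run the same expression through Spark SQL directly, bypassing the
# dbplyr translation inside mutate(); "test" is the table registered
# by copy_to() above.
sparklyr::sdf_sql(sc, "SELECT a, b, h3_is_valid(b) AS valid FROM test")
```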

What am I doing wrong here? I've looked through the sparklyr extensions guide, but I haven't found it very helpful.
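In case it helps anyone answer, one check I can think of is reading the setting back from the live session to confirm the extension class actually reached Spark (spark_session() and invoke() are standard sparklyr API; note that RuntimeConfig.get() throws if the key was never set):

```
# Read spark.sql.extensions back from the running SparkSession to
# confirm the extension class reached Spark; this call errors if the
# key was never set.
sc %>%
  sparklyr::spark_session() %>%
  sparklyr::invoke("conf") %>%
  sparklyr::invoke("get", "spark.sql.extensions")
```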

Thank you very much in advance.

Juan Ignacio
