I'm really stuck on this issue and I've searched the net extensively, but I couldn't find a solution, and I'm new to spark-shell (Scala). The ngrams
function works perfectly fine in Hive with the command below:
select ngrams(split(name, '\\W+'), 2, 3) from mytable
which returns the top 3 bigrams of the column "name". When I call it in spark-shell with this command
val df = hiveContext.sql("select ngrams(split(name, '\\W+'), 2, 3) from mytable")
I got these errors:
Spark 2:
org.apache.spark.sql.AnalysisException: Undefined function: 'ngrams'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
Spark 1.6:
org.apache.spark.sql.AnalysisException: No handler for udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams
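As a sanity check (just a sketch of what I understand, not output I actually have), I believe the functions the session can see can be listed like this:

// Spark 2.x spark-shell: list the catalog functions and look for anything ngram-like
spark.catalog.listFunctions().filter(_.name.toLowerCase.contains("ngram")).show(false)

// Spark 1.6 with the same hiveContext as above: the SQL equivalent
hiveContext.sql("SHOW FUNCTIONS").collect().filter(_.getString(0).contains("ngram")).foreach(println)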
I also tried the following, with no success:
- I separated split from ngrams, i.e. I ran split first and then ran ngrams on its result (roughly as in the sketch after this list). Surprisingly, split works fine but ngrams does not.
- I tried sqlContext.register.udf("ngrams", ngrams) and received: error: not found: value ngrams
- I added two different versions of the hive-exec jar (hive-exec-1.2.0.jar and hive-exec-3.0.0.jar) using these commands:
spark-shell --jars /hive-exec-1.2.0.jar
spark-shell --jars /hive-exec-3.0.0.jar
and got the same errors.
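For reference, here is roughly the two-step attempt from the first bullet above (same table and column as in my example; registerTempTable is what I use on 1.6, createOrReplaceTempView on Spark 2):

// Step 1: run split on its own -- this part works
val tokens = hiveContext.sql("select split(name, '\\W+') as words from mytable")
tokens.registerTempTable("tokens")   // createOrReplaceTempView("tokens") on Spark 2

// Step 2: run ngrams over the split result -- this fails with the same errors as above
val bigrams = hiveContext.sql("select ngrams(words, 2, 3) as top_bigrams from tokens")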
I found the open-source implementation of the ngrams function on GitHub, but it is written in Java and I don't know whether I can call it from spark-shell (Scala).
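From what I have read, the usual way to point spark-shell at a Hive UDF/UDAF class is something like the line below (just a sketch of what I would try; I'm not sure it applies to a GenericUDAF like ngrams, and the class has to be on the classpath, e.g. via --jars):

// Try to expose the Hive class (the one named in the Spark 1.6 error) as a temporary function
hiveContext.sql("CREATE TEMPORARY FUNCTION ngrams AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams'")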
Maybe this is a trivial issue, but I would really appreciate it if someone could help me.
I'm using Scala 2.11.8, Java 1.8, Spark 2.3.0, and Spark 1.6.