
If I run the following query in Spark (2.3.2.0-mapr-1901), it works fine on the first run.

SELECT count(`cpu-usage`) AS `cpu-usage-count`,
       sum(`cpu-usage`) AS `cpu-usage-sum`,
       percentile_approx(`cpu-usage`, 0.95) AS `cpu-usage-approxPercentile`
FROM filtered_set

Here `filtered_set` is a DataFrame that has been registered as a temp view using `createOrReplaceTempView`.
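For reference, each run is structured roughly like this (a simplified sketch; `runAggregation` and the surrounding plumbing are paraphrased, not our exact code):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Simplified sketch of the job body. filteredSet stands in for the
// DataFrame we build upstream before each run.
def runAggregation(spark: SparkSession, filteredSet: DataFrame): DataFrame = {
  // (Re-)register the view against the current session's catalog
  filteredSet.createOrReplaceTempView("filtered_set")
  spark.sql(
    """SELECT count(`cpu-usage`) AS `cpu-usage-count`,
      |       sum(`cpu-usage`) AS `cpu-usage-sum`,
      |       percentile_approx(`cpu-usage`, 0.95) AS `cpu-usage-approxPercentile`
      |FROM filtered_set""".stripMargin)
}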

I get a result and all is good on the first call. But...

If I then run this job again (note that this is a shared Spark context, managed via Apache Livy), Spark throws:

Wrapped by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.spark.sql.AnalysisException: Undefined function: 'count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 10
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$50.apply(Analyzer.scala:1216)
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$50.apply(Analyzer.scala:1216)
org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1215)
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1213)
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)

...

org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)

This problem occurs on the second run of the Livy job (which reuses the previous Spark session). It is not isolated to the `count` function (the same happens with `sum`, etc.): any function appears to fail on the second run, regardless of what was called in the first run.

It seems as though Spark's function registry is being cleared out (including the default built-in functions). We aren't doing anything unusual to the Spark context ourselves.
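To check this, a diagnostic I could run inside the same Livy session between the two jobs (`spark` being the shared SparkSession handle):

// If the registry were intact, SHOW FUNCTIONS should return several
// hundred rows, and count/sum should appear in the catalog listing.
println(spark.sql("SHOW FUNCTIONS").count())

spark.catalog.listFunctions()
  .filter(f => f.name == "count" || f.name == "sum")
  .show(truncate = false)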

Questions:

- Is this expected or normal behaviour with Spark?
- How would I reset or reinitialise the Spark session so it doesn't lose all these functions?

I have seen `Undefined function` errors described elsewhere in connection with user-defined functions, but never with the built-ins.
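If the registry really is being wiped, one workaround I'm considering (untested) is forking a fresh session from the shared context for each run. The input path below is just a placeholder for however `filtered_set` is actually built:

// newSession() shares the SparkContext and cached data with the original
// session, but gets its own catalog state (temp views, UDFs, conf).
val freshSpark = spark.newSession()

// Temp views are per-session, so the DataFrame has to be rebuilt (or
// re-read) through the new session and re-registered before querying.
val filteredSet = freshSpark.read.parquet("/path/to/input")  // placeholder source
filteredSet.createOrReplaceTempView("filtered_set")
val result = freshSpark.sql("SELECT count(`cpu-usage`) AS `cpu-usage-count` FROM filtered_set")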

ZenMasterZed
  • Does `sum` still work? You don't have any variable called `count`, right? – Shaido Mar 27 '19 at 09:23
  • No, none of the functions work the second time around. The data inputs are identical on both calls. I certainly don't intentionally have such a variable. This issue looks identical to this one: https://forums.databricks.com/answers/17583/view.html, as I am using temp views and getting the same issue. – ZenMasterZed Mar 27 '19 at 09:37
  • @ZenMasterZed That link seems to be broken. – Ivo Merchiers Mar 27 '23 at 09:02

0 Answers