
I'm using SparkLauncher to connect to Spark in cluster mode on top of Yarn. I'm running some SQL code using Scala like this:

def execute(code: String): Unit = {
    try {
      val resultDataframe = spark.sql(code)
      resultDataframe.write.json("s3://some/prefix")
    } catch {
      case NonFatal(f) =>
        log.warn(s"Fail to execute query $code", f)
        log.info(f.getMessage, getNestedStackTrace(f, Seq[String]()))
    }
}

def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
   if (e.getCause == null) return msg
   getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}

Now when I run a query that should fail with the execute() method, for example querying a partitioned table without a partition predicate (select * from partitioned_table_on_dt limit 1;), I get an incorrect stack trace back.

Correct stack trace when I run spark.sql(code).write.json() manually from spark-shell:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(1) LocalLimit 1
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
...

Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate found for partitioned table
 partitioned_table_on_dt.
 If the table is cached in memory then turn off this check by setting
 hive.mapred.mode to nonstrict
    at org.apache.spark.sql.hive.execution.HiveTableScanExec.prunePartitions(HiveTableScanExec.scala:155)
...

org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
...

Incorrect stack trace from the execute() method above:

Job Aborted: 
"org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)",
"org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)",
...

"org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)",
"org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)",
...

The spark-shell stack trace has three nested exceptions, SparkException(SemanticException(TreeNodeException)), but the traceback I'm seeing from my code only contains the SparkException and TreeNodeException frames. The most valuable SemanticException traceback is missing, even after fetching the nested stack traces with the getNestedStackTrace() method.

Can any Spark/Scala experts tell me what I'm doing wrong, or how to fetch the complete stack trace here with all the exceptions?

sbrk
  • check executor logs on yarn? you will get the cause of failure there. Also, I think, SparkLauncher only shows driver stacktrace – Som May 25 '20 at 12:21
  • what is code ?? & can you post value for code ?? – Srinivas May 25 '20 at 14:09
  • can you try without NonFatal in catch block ?? to get more details on exception ?? – Srinivas May 25 '20 at 14:22
  • @SomeshwarKale the stack traces I shared in the post are from the driver. SparkLauncher does pull only the driver logs but it is missing stack trace information via my code snippet. – sbrk May 26 '20 at 07:45
  • @Srinivas code is the actual query string. It is `select * from partitioned_table_on_dt limit 1` for the stack traces above. – sbrk May 26 '20 at 07:46

1 Answer


The recursive method getNestedStackTrace() had a bug: the base case checked e.getCause == null, so the recursion returned before appending the frames of the innermost exception, which is exactly why the SemanticException trace was missing. Checking e == null instead (while still recursing on e.getCause) walks every level of the chain:

def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
   if (e == null) return msg // this should be e not e.getCause  
   getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}
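
As an aside, if you don't specifically need the frames as a Seq[String], you can avoid walking getCause manually and let the JVM render the whole chain, including every "Caused by:" section, the same way spark-shell prints it. A minimal sketch using only the standard library (the helper name fullStackTrace is just illustrative):

import java.io.{PrintWriter, StringWriter}

// Render the complete stack trace of a Throwable, including all
// nested "Caused by:" sections, as a single String.
def fullStackTrace(e: Throwable): String = {
  val sw = new StringWriter()
  e.printStackTrace(new PrintWriter(sw))
  sw.toString
}

Logging log.info(fullStackTrace(f)) in the catch block above would then include the SemanticException cause as well.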
sbrk