
I've been using the Spark Dataset API to perform operations on JSON data and extract certain fields. However, when the specification I provide to tell Spark which field to extract is wrong, Spark throws an

org.apache.spark.sql.AnalysisException

How can such runtime exceptions be handled in a distributed processing scenario like this? I understand that wrapping the call in a try-catch would get things sorted, but what is the recommended way to handle such a scenario?

dataset = dataset.withColumn(current, functions.explode(dataset.col(parent + Constants.PUNCTUATION_PERIOD + child.substring(0, child.length() - 2))));
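To make the failure mode concrete, here is a minimal sketch of what I mean (the JSON layout, input path and field names are made up for illustration):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

SparkSession spark = SparkSession.builder().master("local[*]").getOrCreate();

// Hypothetical input, e.g. {"parent": {"children": ["a", "b"]}}
Dataset<Row> dataset = spark.read().json("/path/to/input.json");

// Works: "parent.children" matches the schema
dataset.withColumn("child", functions.explode(dataset.col("parent.children"))).show();

// Throws org.apache.spark.sql.AnalysisException because "parent.childs" cannot be resolved
dataset.withColumn("child", functions.explode(dataset.col("parent.childs"))).show();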
Infamous

1 Answer


In Scala, you should simply wrap the call in a Try and handle the Failure. Something like:

import scala.util.{Failure, Success, Try}

val result = Try(executeSparkCode()) match {
    case s @ Success(_)                    => s
    case Failure(error: AnalysisException) => Failure(new MyException(error))
    case f @ Failure(_)                    => f // leave other failures untouched
}

Note 1: If your question is about how to manage exceptions in Scala in general, there is plenty of documentation and many posts on the subject (e.g. don't throw). For example, you can check this answer of mine.

Note 2: I don't have a Scala dev environment at hand, so I haven't tested this code.


In Java, however, there is a tricky situation: the compiler doesn't expect an AnalysisException to be thrown there (the Scala API doesn't declare it), so you cannot catch this exception specifically. This is probably a Scala/Java mismatch, because Scala doesn't track checked exceptions. What I did was:

try {
    return executeSparkCode();
} catch (Exception ex) {
    if (ex instanceof AnalysisException) {
        throw new MyException(ex);
    } else {
        throw ex; // unmanaged exceptions
    }
}

Note: In my case, I also tested the content of the error message for a specific failure that I had to manage (e.g. "path does not exist"), in which case I return an empty dataset instead of throwing another exception. I was looking for a better solution and happened to get here...
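For example, a sketch of that message check (assuming a SparkSession named spark and an expected StructType named expectedSchema are in scope; those names, like MyException, are placeholders and not from my actual code):

import java.util.Collections;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Row;

try {
    return executeSparkCode();
} catch (Exception ex) {
    // Exact message wording may differ between Spark versions
    if (ex instanceof AnalysisException && ex.getMessage().contains("Path does not exist")) {
        // Input is simply missing: fall back to an empty dataset with the expected schema
        return spark.createDataFrame(Collections.<Row>emptyList(), expectedSchema);
    } else if (ex instanceof AnalysisException) {
        throw new MyException(ex);
    } else {
        throw ex; // unmanaged exceptions
    }
}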

Juh_
  • Hi. This was a little while back, when I was starting off as a fresh undergrad in my first job. I used a multi-catch with the last catch being the Exception class. It's been running in prod for quite some time now and gives out pretty decent information/stacktraces when things go wrong. – Infamous Nov 14 '20 at 15:15
  • I am try/catching with a `try {/*spark dataframe filtering query*/} catch { case e: Throwable => println(e.getMessage)}` and seeing it still throw `AnalysisException` and crash :( Is it not a `Throwable`? – Rimer Apr 14 '21 at 18:52
  • From what you said, it should work. Maybe the issue is elsewhere, but I would need to see more code to know. Did you check the stacktrace to be sure the exception is thrown inside the try/catch? E.g. Spark being lazy, errors happen on the call to action methods (write, collect), not on the call to filter, even if the issue is in the filter call; see the sketch below. – Juh_ Apr 15 '21 at 08:18
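To illustrate that last point, a minimal sketch (assuming an existing Dataset<Row> df; the column name and output path are made up): put the try/catch around the action as well, not only around the lazy transformation:

import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

Dataset<Row> filtered;
try {
    // The filter itself is lazy; depending on the problem, Spark may only raise the error later
    filtered = df.filter(functions.col("someColumn").equalTo("someValue"));
    // ...so also cover the action that actually triggers execution
    filtered.write().parquet("/some/output/path");
} catch (Exception ex) {
    if (ex instanceof AnalysisException) {
        // handle the analysis error here (log, fall back, rethrow as a domain exception, ...)
        System.err.println("Analysis failed: " + ex.getMessage());
    } else {
        throw ex;
    }
}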