
I wrote a Spark Streaming application built with sbt. It works perfectly fine locally, but after deploying it on the cluster, it complains about a class I wrote that is clearly in the fat JAR (checked using jar tvf). The following is my project structure; the XXX object is the one that Spark complains about.

src
`-- main
    `-- scala
        |-- packageName
        |   `-- XXX object
        `-- mainMethodEntryObject

My submit command:

$SPARK_HOME/bin/spark-submit \
  --class mainMethodEntryObject \
  --master REST_URL \
  --deploy-mode cluster \
  hdfs:///FAT_JAR_PRODUCED_BY_SBT_ASSEMBLY

Specific error message:

java.lang.NoClassDefFoundError: Could not initialize class XXX
Dr.Pro

3 Answers


I ran into this issue for a reason similar to this user: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-td18972.html

I was calling a method on an object that had a few variables defined on the object itself, including spark and a logger, like this

val spark = SparkSession
  .builder()
  .getOrCreate()

val logger = LoggerFactory.getLogger(this.getClass.getName)

The function I was calling called another function on the object, which called another function, which in turn called yet another function on the object, inside a flatMap call on an RDD.

I was getting the NoClassDefFoundError in a stack trace where the previous two function calls were functions on the class that Spark was telling me did not exist.

Based on the conversation linked above, my hypothesis was that the global spark reference wasn't being initialized by the time the function that used it was called (the one that resulted in the NoClassDefFoundError).
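
To make that concrete, here's a rough sketch of the problematic shape of the code; the names MyObject, process and transform (and the sample data) are placeholders, not the real ones:

import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

object MyObject {

  // Globals defined directly on the object that executor-side code ends up using
  val spark = SparkSession.builder().getOrCreate()
  val logger = LoggerFactory.getLogger(this.getClass.getName)

  def process(): Unit = {
    val rdd = spark.sparkContext.parallelize(Seq("a,b", "c,d"))
    // The closure passed to flatMap calls a function on this object,
    // so the whole object has to be initialized on the executor
    val tokens = rdd.flatMap(transform)
    logger.info(s"token count = ${tokens.count()}")
  }

  private def transform(line: String): Seq[String] = line.split(",").toSeq
}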

After quite a few experiments, I found that this pattern worked to resolve the problem.

import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

// Move global definitions here
object MyClassGlobalDef {

  val spark = SparkSession
    .builder()
    .getOrCreate()

  val logger = LoggerFactory.getLogger(this.getClass.getName)

}

// Force the globals object to be initialized
import MyClassGlobalDef._

object MyClass {
  // Functions here
}

It's kind of ugly, but Spark seems to like it.
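
For completeness, the body of MyClass can then use the imported globals directly; a rough sketch (run and the sample data are just illustrative):

object MyClass {

  def run(): Unit = {
    // spark and logger come from the MyClassGlobalDef import above
    val rdd = spark.sparkContext.parallelize(Seq("a,b", "c,d"))
    val tokens = rdd.flatMap(_.split(",").toSeq)
    logger.info(s"token count = ${tokens.count()}")
  }
}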

turtlemonvh
  • I had a similar issue and this solved it, but it's still puzzling to me. My code used to run fine until a couple of days ago. Do you know what's the root cause for this? – qkhhly May 31 '20 at 01:19
  • I suspect the problem has something to do with how Spark does serialization, but aside from that I'm not sure. I haven't dug into this deep enough to debug. – turtlemonvh Jun 15 '20 at 14:03

It's difficult to say without the code, but it looks like a problem with the serialization of your XXX object. I can't say I understand perfectly why, but the point is that the object is not shipped to the executors.

The solution that worked for me is to convert your object to a class that extends Serializable and just instantiate it where you need it. So basically, if I'm not wrong you have

object test {
   def foo = ...
}

which would be used as test.foo in your main, but you need at minimum

class Test extends Serializable {
   def foo = ...
}

and then in your main have val test = new Test at the beginning and that's it.
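
Putting it together, a sketch of what that might look like (Main, foo and the sample data are just illustrative):

class Test extends Serializable {
  def foo(line: String): Seq[String] = line.split(",").toSeq
}

object Main {
  def main(args: Array[String]): Unit = {
    val spark = org.apache.spark.sql.SparkSession.builder().getOrCreate()
    val test = new Test  // instantiated once in the driver

    // The instance is serialized with the closure and shipped to the executors
    val tokens = spark.sparkContext
      .parallelize(Seq("a,b", "c,d"))
      .flatMap(test.foo)

    println(tokens.count())
  }
}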

Wilmerton

It is related to serialization. I fixed this by adding "implements Serializable" and a serialVersionUID field to the given class.
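
In Scala the equivalent is roughly the following sketch (the class name is just a placeholder):

// Scala counterpart of "implements Serializable" plus a serialVersionUID
@SerialVersionUID(1L)
class MyHelper extends Serializable {
  def foo(line: String): Seq[String] = line.split(",").toSeq
}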