
I have a single Ubuntu server on which I run a Master and a Slave (one executor), and both show up in the UI on port 8080.
I can run spark-shell --master spark://foo.bar:7077 successfully, but I can't submit my program (a fat jar) in standalone mode; spark-submit fails with the errors below.

My Main object extends App instead of defining a main method, and everything lives in the package myProject. I am running the program like this:

 spark-submit --master spark://foo.bar:7077 \
--class myProject.Main \
--deploy-mode client \
--num-executors 1 \
--executor-memory 58g \
--executor-cores 39 \
--driver-memory 4g \
--driver-cores 2 \
--conf spark.driver.memoryOverhead=819m \
--conf spark.executor.memoryOverhead=819m \
target/scala-2.12/myProject-assembly-0.1.jar
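
For completeness, the entry point looks roughly like this (a simplified sketch; the real job body is omitted):

package myProject

import org.apache.spark.sql.SparkSession

// Entry point as a top-level App object (no explicit main method).
object Main extends App {
  val spark = SparkSession.builder()
    .appName("myProject")
    .getOrCreate()

  // ... actual job logic elided ...

  spark.stop()
}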

client mode's output:

Exception in thread "main" java.lang.NoSuchMethodError: scala.App.$init$(Lscala/App;)V
        at mstproject.Main$.<init>(Main.scala:8)
        at mstproject.Main$.<clinit>(Main.scala)
        at mstproject.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I already checked a similar error, but all of my packages seem compatible with Scala 2.12, as my build.sbt shows (I am not sure about my assemblyMergeStrategy):

scalaVersion := "2.12.12"
libraryDependencies ++= Seq(
  "com.github.pathikrit" %% "better-files" % "3.9.1",
  "org.scalatest" %% "scalatest" % "3.2.3" % Test,
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql" % "2.4.8",
  "org.apache.spark" %% "spark-graphx" % "2.4.8",
  "redis.clients" % "jedis" % "3.5.1",
  "com.redislabs" %% "spark-redis" % "2.4.2"
)

assemblyMergeStrategy in assembly := {
  case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case PathList("META-INF", xs @ _*) =>
    xs map {_.toLowerCase} match {
      case "manifest.mf" :: Nil | "index.list" :: Nil | "dependencies" :: Nil =>
        MergeStrategy.discard
      case ps @ x :: xs if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case "spring.schemas" :: Nil | "spring.handlers" :: Nil =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first
    }
  case "application.conf" => MergeStrategy.concat
  case "reference.conf" => MergeStrategy.concat
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case _ => MergeStrategy.first
//  case x =>
//    val oldStrategy = (assemblyMergeStrategy in assembly).value
//    oldStrategy(x)
}
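
For comparison, a leaner setup that is commonly suggested (I have not tried it here) marks the Spark artifacts as "provided", since spark-submit already puts them on the driver and executor classpaths, and keeps the merge strategy minimal:

scalaVersion := "2.12.12"
libraryDependencies ++= Seq(
  "com.github.pathikrit" %% "better-files" % "3.9.1",
  "org.scalatest" %% "scalatest" % "3.2.3" % Test,
  // Spark is supplied by the cluster at runtime, so keep it out of the fat jar.
  "org.apache.spark" %% "spark-core" % "2.4.8" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided",
  "org.apache.spark" %% "spark-graphx" % "2.4.8" % "provided",
  "redis.clients" % "jedis" % "3.5.1",
  "com.redislabs" %% "spark-redis" % "2.4.2"
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case "reference.conf"                          => MergeStrategy.concat
  case _                                         => MergeStrategy.first
}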

cluster mode's output:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/07/17 17:35:40 INFO SecurityManager: Changing view acls to: root
21/07/17 17:35:40 INFO SecurityManager: Changing modify acls to: root
21/07/17 17:35:40 INFO SecurityManager: Changing view acls groups to:
21/07/17 17:35:40 INFO SecurityManager: Changing modify acls groups to:
21/07/17 17:35:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/07/17 17:35:40 INFO Utils: Successfully started service 'driverClient' on port 34218.
21/07/17 17:35:40 INFO TransportClientFactory: Successfully created connection to foo.bar/10.0.8.137:7077 after 39 ms (0 ms spent in bootstraps)
21/07/17 17:35:41 INFO ClientEndpoint: Driver successfully submitted as driver-20210717173541-0003
21/07/17 17:35:41 INFO ClientEndpoint: ... waiting before polling master for driver state
21/07/17 17:35:46 INFO ClientEndpoint: ... polling master for driver state
21/07/17 17:35:46 INFO ClientEndpoint: State of driver-20210717173541-0003 is FAILED
21/07/17 17:35:46 INFO ShutdownHookManager: Shutdown hook called
21/07/17 17:35:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-c15b4457-664f-43b7-9699-b62839ec83c0
  • Cluster mode gives the same output even if I pass a random --class name, but client mode just emits the error above.
  • Submitting in cluster mode adds a "Completed Driver" entry with a "FAILED" state in the Master's UI (port 8080).
  • I can run my program successfully in local mode.
  • There is no master or worker log output in client mode.
  • In cluster mode, the worker log reports copying the jar and then the driver's failure.
  • In cluster mode, the master log shows:
21/07/17 17:45:08 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
21/07/17 17:45:08 INFO Master: Launching driver driver-20210717174508-0007 on worker worker-20210716205255-10.0.8.137-45558
21/07/17 17:45:12 INFO Master: Removing driver: driver-20210717174508-0007
21/07/17 17:45:14 INFO Master: 10.0.8.137:34806 got disassociated, removing it.
21/07/17 17:45:14 INFO Master: 10.0.8.137:38394 got disassociated, removing it.

I suspect the same thing is happening in both deploy modes. In client mode the error is visible through spark-submit, but in cluster mode it is hidden and has to be checked from the Master UI (I don't know why it does not appear in the Master and Worker log files).

UPDATE: As Luis said, it was a Scala incompatibility: my Spark cluster was using an embedded Scala 2.11, not 2.12. I fixed it by downgrading my fat jar's Scala version to 2.11.
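
The concrete change was aligning the project's Scala version with the one embedded in the prebuilt Spark binaries (a minimal sketch; the exact 2.11 patch version should match whatever the spark-shell banner on the cluster reports):

// build.sbt: match the Scala line that the cluster's Spark binaries were built with.
scalaVersion := "2.11.12"

// Quick check inside spark-shell on the cluster; this prints the Scala version
// Spark itself embeds, regardless of the system-wide scala installation:
// scala> scala.util.Properties.versionString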

  • You didn't show that your libraries are using `2.11`; also, are you sure your cluster is using `2.11`? Maybe the cluster is using `2.12`. BTW, are you sure you are also using the same **Spark** versions? Finally, did you exclude the **Spark** libraries and the **Scala** stdlib jars from the uber jar? – Luis Miguel Mejía Suárez Jul 17 '21 at 18:16
  • I am using Scala 2.12, not 2.11. The server and the build both use Spark 2.4.8; you can check my `build.sbt` above. And I don't think I excluded anything; I just handled the "deduplication" issue by copy/pasting, and I don't know if this mergeStrategy is okay. – MalekMFS Jul 17 '21 at 19:59
  • You still don't show your **Scala** version in your `build.sbt`; also, you haven't confirmed what the **Scala** and **Spark** versions of your cluster are. Also, that merge strategy feels very complex, and you still haven't done the basic configuration shown here: https://stackoverflow.com/questions/52371961/spark-build-sbt-file-versioning/52375099#52375099 – Luis Miguel Mejía Suárez Jul 17 '21 at 20:16
  • Thanks. I added the Scala version from my `build.sbt` above. Also, I checked `scala -version` on my cluster, and it was `2.11`, so I downloaded and installed `2.12.12` and restarted the Master and the Slave, yet got the same logs... – MalekMFS Jul 17 '21 at 20:54
  • No matter what `scala -version` says, **Spark** doesn't use the system **Scala**; it uses its own embedded **Scala**. You need to check that using `spark-shell`. – Luis Miguel Mejía Suárez Jul 17 '21 at 21:28
  • spark-shell is showing `2.11`, unfortunately, while 2.4.8 is the latest Spark 2 release. I have "spark-2.4.8-bin-hadoop2.7.tgz" from https://archive.apache.org/dist/spark/spark-2.4.8/, but I wonder if I could replace it with "spark-2.4.8-bin-without-hadoop-scala-2.12.tgz"? – MalekMFS Jul 17 '21 at 22:07
  • If you want to use **Scala** `2.12`, then you need to download and install **Spark** binaries that were built against that version. Otherwise, just change the **Scala** version in your project and produce a new fat jar. – Luis Miguel Mejía Suárez Jul 17 '21 at 22:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/235014/discussion-between-malekmfs-and-luis-miguel-mejia-suarez). – MalekMFS Jul 18 '21 at 09:40
