
I'm facing a weird issue when trying to run my Scala Spark app with spark-submit (it works fine with `sbt run`). All of this is run locally.

I have a standard SparkSession declaration:

  val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("EPGSubtitleTimeSeries")
    .getOrCreate()

but when trying to run it through spark-submit as follows:

./bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 --master local[2] --class com.package.EPGSubtitleTimeSeries --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem /home/jay/project/tv-data-pipeline/target/scala-2.12/epg-subtitles_2.12-0.1.jar

I got this error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at com.project.Environment$.<init>(EPGSubtitleTimeSeries.scala:55)
    at com.project.Environment$.<clinit>(EPGSubtitleTimeSeries.scala)
    at com.project.EPGSubtitleJoined$.$anonfun$start_incremental_load$1(EPGSubtitleTimeSeries.scala:409)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set3.foreach(Set.scala:163)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at com.package.EPGSubtitleJoined$.start_incremental_load(EPGSubtitleTimeSeries.scala:408)
    at com.package.EPGSubtitleTimeSeries$.main(EPGSubtitleTimeSeries.scala:506)
    at com.package.EPGSubtitleTimeSeries.main(EPGSubtitleTimeSeries.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I have narrowed this down with a few prints to make sure it's actually this line producing it:

val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")

From:

val EPG_SCHEDULE_OUTPUT_COLUMNS = Array(
    "program_title",
    "epg_titles",
    "series_title",
    "season_title",
    "date_time",
    "duration",
    "short",
    "medium",
    "long",
    "start_timestamp",
    "end_timestamp",
    "epg_year_month",
    "epg_day_of_month",
    "epg_hour_of_day",
    "epg_genre",
    "channelId"
  )

  val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")
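As far as I understand, `++` is not a method defined on Array itself: the compiler first applies the implicit conversion `Predef.refArrayOps`, which is exactly the method the error says is missing. Roughly, I think the failing line desugars to something like this (my own reconstruction, not actual compiler output):

    // `++` comes from ArrayOps; the compiler wraps the Array via the implicit
    // Predef.refArrayOps before calling it, which would explain why the
    // NoSuchMethodError points at refArrayOps rather than at my own code.
    val EPG_OUTPUT_COLUMNS: Array[String] =
      scala.Predef.refArrayOps(EPG_SCHEDULE_OUTPUT_COLUMNS) ++
        Array("subtitle_channel_title", "epg_channel_title", "channelTitle")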

I'm using Spark 2.4.4 and Scala 2.12.8, as well as joda-time 2.10.1 (no other dependencies in my build.sbt).
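For reference, a minimal build.sbt matching those versions would look roughly like this (a sketch; I haven't included the real file here):

    // build.sbt (sketch reconstructed from the versions above)
    name := "epg-subtitles"
    version := "0.1"
    scalaVersion := "2.12.8"

    libraryDependencies ++= Seq(
      // %% appends the Scala binary version, so these resolve to the *_2.12 artifacts
      "org.apache.spark" %% "spark-core" % "2.4.4",
      "org.apache.spark" %% "spark-sql"  % "2.4.4",
      "joda-time"        %  "joda-time"  % "2.10.1"
    )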

Does anyone have an idea of what the error is?

Jay Cee
  • Are you sure that you are using the same versions of **Spark** & **Scala** at compile time and at runtime? – Luis Miguel Mejía Suárez Sep 25 '19 at 14:56
  • I'm doing all of this through command line, how can I make sure of this? – Jay Cee Sep 25 '19 at 14:57
  • How was the cluster created? If on premises, ask your system administrator which versions they use. If on **AWS EMR**, check the service version and look in the documentation for which versions of the packages they provide, etc. Also, if you have access to the cluster where the app is running, open a `spark-shell`; it will print the **Spark** & **Scala** versions. You should always use the exact same versions. – Luis Miguel Mejía Suárez Sep 25 '19 at 15:00
  • @LuisMiguelMejíaSuárez ah, I should have clarified: before running it on AWS I'm trying to do it locally at the moment, still with spark-submit – Jay Cee Sep 25 '19 at 15:03
  • and are you sure that the version that you installed locally is the same one you used for compiling? – Luis Miguel Mejía Suárez Sep 25 '19 at 15:10
  • ooh you're right, when I typed `scala` it printed 2.11 *face_palm* – Jay Cee Sep 25 '19 at 15:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199968/discussion-between-luis-miguel-mejia-suarez-and-jay-cee). – Luis Miguel Mejía Suárez Sep 25 '19 at 15:11
  • https://stackoverflow.com/questions/75947449/run-a-scala-code-jar-appear-nosuchmethoderrorscala-predef-refarrayops – Dmytro Mitin Apr 07 '23 at 04:50

1 Answer


Following my conversation with Luis, it appears that I compiled with Scala 2.12 while Spark was running on Scala 2.11.
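A quick way to spot this kind of mismatch (a sketch of the checks, not a transcript of the chat): the Spark distribution prints the Scala version it was built with, and sbt encodes the Scala binary version in the artifact path and name.

    # Scala version the local Spark installation was built with
    # (the banner contains a "Using Scala version ..." line)
    ./bin/spark-submit --version

    # Scala version the jar was compiled for: sbt puts the binary version
    # in the output path and the artifact name
    ls target/
    # scala-2.12/epg-subtitles_2.12-0.1.jar  ->  built for Scala 2.12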

I first wanted to upgrade to Spark 2.4.4 (which would allow me to use 2.12, I think?), but the main problem is that AWS EMR (which is my final target) doesn't support Scala 2.12: https://forums.aws.amazon.com/thread.jspa?messageID=902385&tstart=0

So the final solution was to downgrade my Scala version to 2.11 at compile time.
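In sbt terms that boils down to one change, assuming a standard build.sbt like the one sketched in the question (Spark 2.4.4 is published for both Scala 2.11 and 2.12, so the Spark version itself can stay):

    // build.sbt — the downgrade (sketch)
    scalaVersion := "2.11.12"   // was 2.12.8

    libraryDependencies ++= Seq(
      // %% now resolves the *_2.11 artifacts, matching the Scala version
      // of the Spark runtime that spark-submit uses
      "org.apache.spark" %% "spark-core" % "2.4.4",
      "org.apache.spark" %% "spark-sql"  % "2.4.4",
      "joda-time"        %  "joda-time"  % "2.10.1"
    )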

Thanks a lot Luis for your guidance and knowledge!

Jay Cee