I am using Spark with Scala, together with the AWS Glue libraries, for a Glue script. When I use Scala 2.12, I get this error:

error with version 2.12

import com.amazonaws.services.glue.{DataSource, DynamicFrame, GlueContext}
import com.amazonaws.services.glue.util.{GlueArgParser, Job, JsonOptions}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.JavaConverters._

object Test {
  def main(systemArgs: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GlueExample").setMaster("local")
    val sc = new SparkContext(conf)
    sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    val gc: GlueContext = new GlueContext(sc)
    val connectionOptions = JsonOptions(Map(
      "paths" ->  Seq("s3://bucket_path"),
      "groupFiles" -> "inPartition"
    ))
    val source: DataSource = gc.getSourceWithFormat(
      connectionType = "s3",
      options = connectionOptions,
      transformationContext = "",
      format = "parquet",
      formatOptions = JsonOptions.empty
    )
  }
}

After going through many similar issues, I changed the Scala version to 2.11, and now I get a different error:

error with 2.11 version

It fails before it even gets as far as creating the SparkConf().

My build.gradle file.

plugins {
    id 'scala'
    id 'java'
    id 'application'
}

repositories {
    maven { url 'https://repo1.maven.org/maven2/' }
    mavenCentral()
    maven { url 'https://aws-glue-etl-artifacts.s3.amazonaws.com/release/' }
}
dependencies {
    implementation project(':diff-lib')

    implementation 'org.scala-lang:scala-library'
    implementation 'com.google.guava:guava'

    implementation 'software.amazon.awssdk:glue'
    implementation 'com.amazonaws:AWSGlueETL:1.0.0'
    implementation "org.apache.spark:spark-core_$scalaVersion"
    implementation 'org.slf4j:slf4j-log4j12'
}

gradle.properties file

gradleVersion=6.7
lombokVersion=1.18.10
awaitilityVersion=3.1.6
javaVersion=8
projectVersion=1.0.0
awsSdkVersion=2.16.44
junitVersion=5.7.1
log4jVersion=2.14.1
scalaVersion=2.12
scalaLibVersion=2.12.12
sparkVersion=2.4.3
glueEtlVersion=1.0.0
guavaLibVersion=29.0-jre
scalaTestVersion=3.2.0
scalaTestPlusVersion=3.2.0.0
scalaXmlVersion=1.2.0
slf4jLog4j12Version=1.7.10

build.gradle for diff-lib library

plugins {
    id 'scala'
    id 'java-library'
}

repositories {
    maven { url 'https://repo1.maven.org/maven2/' }
    mavenCentral()
    maven { url 'https://aws-glue-etl-artifacts.s3.amazonaws.com/release/' }
}

dependencies {
    implementation 'org.scala-lang:scala-library'
    implementation 'com.amazonaws:AWSGlueETL'
    implementation 'com.google.guava:guava'
}
  • I'm not familiar with Glue, but you should probably check that all your dependencies use the same Scala version (2.11 or 2.12). I can see two different values in your Gradle definition, `scalaVersion=2.11 scalaLibVersion=2.12.12`. This is smelly, although I don't know much Gradle either. – Gaël J May 12 '21 at 12:34
  • @GaëlJ I have used scalaVersion=2.12 and scalaLibVersion=2.12.0, still getting the same error. – Abhishek Kumar May 12 '21 at 13:01

1 Answer

The Glue release notes indicate that Scala 2.11 is required (because Spark 2.4.3 uses Scala 2.11 by default). Once one library on the classpath is built against a given Scala version, all the other Scala libraries generally need to be built against the same version.

Your build.gradle file seems to be lacking version references (or references to the variables which define the versions, in the properties file). Please see this example, which has explicit version numbers (but you can also use dollar-sign variables which you have defined in your properties file).

As one commenter has noted, scalaLibVersion and scalaVersion in your properties file do not match. Ensure that they match, and that none of the dependencies are using another Scala version. Also, try using explicit versions in your main gradle dependency file.
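As a rough illustration of that advice, the dependencies block of the main build.gradle could reference the version properties explicitly. This is only a sketch: it assumes gradle.properties is changed to scalaVersion=2.11 and scalaLibVersion=2.11.12 (values assumed here, not taken from the question), and it reuses the other property names already listed in the question's gradle.properties.

```groovy
// build.gradle (sketch): every dependency carries an explicit version.
// Assumes gradle.properties defines scalaVersion=2.11 and
// scalaLibVersion=2.11.12 (assumed values); the remaining properties
// are the ones already present in the question's gradle.properties.
dependencies {
    implementation project(':diff-lib')

    implementation "org.scala-lang:scala-library:${scalaLibVersion}"
    implementation "com.google.guava:guava:${guavaLibVersion}"

    implementation "software.amazon.awssdk:glue:${awsSdkVersion}"
    implementation "com.amazonaws:AWSGlueETL:${glueEtlVersion}"
    // The artifact suffix (_2.11) must agree with the scala-library version.
    implementation "org.apache.spark:spark-core_${scalaVersion}:${sparkVersion}"
    implementation "org.slf4j:slf4j-log4j12:${slf4jLog4j12Version}"
}
```

Note the double quotes: Gradle's Groovy DSL only interpolates `${...}` inside double-quoted strings. The diff-lib build.gradle would need the same treatment so that it does not pull in a different scala-library version.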

ELinda
  • I have used Scala 2.12 everywhere; I just showed the two different errors I get with each version. I think the reason could be that Glue requires Scala 2.11, but the same jar works perfectly on AWS Glue. That may be because I am using Glue 2.0 there. Trying to figure it out. – Abhishek Kumar May 13 '21 at 16:38