
I've added the Delta dependencies to my build.sbt:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  // logging
  "org.apache.logging.log4j" % "log4j-api" % "2.4.1",
  "org.apache.logging.log4j" % "log4j-core" % "2.4.1",
  // postgres for DB connectivity
  "org.postgresql" % "postgresql" % postgresVersion,
  "io.delta" %% "delta-core" % "0.7.0"

However, I cannot figure out what configuration the Spark session must contain. The code below fails.

val spark = SparkSession.builder()
    .appName("Spark SQL Practice")
    .config("spark.master", "local")
    .config("spark.network.timeout"  , "10000000s")//to avoid Heartbeat exception
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()

Exception -

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable
Animesh

3 Answers


Here's an example project I made that'll help you.

The build.sbt file should include these dependencies:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
libraryDependencies += "io.delta" %% "delta-core" % "0.7.0" % "provided"

I think you need to be using Spark 3 for Delta Lake 0.7.0.

You shouldn't need any special SparkSession config options; something like this should be fine:

lazy val spark: SparkSession = {
  SparkSession
    .builder()
    .master("local")
    .appName("spark session")
    .config("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    .getOrCreate()
}
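
To sanity-check the setup end to end, a minimal round trip like the sketch below should work (the local path /tmp/delta-table and the toy DataFrame are only illustrative):

// Assumes the `spark` value defined above
import spark.implicits._

val df = Seq(1, 2, 3).toDF("id")
df.write.format("delta").mode("overwrite").save("/tmp/delta-table")

spark.read.format("delta").load("/tmp/delta-table").show()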
Powers
  I had been using the 3.0-preview version until now, which caused the issue in the first place! Also, I noticed that including 'provided' caused a JNI error. It did work without it, though. – Animesh Jul 11 '20 at 15:02
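
A note on the 'provided' scope mentioned in the comment above: sbt leaves provided dependencies off the runtime classpath, so launching the app with sbt run (or from an IDE) can fail at startup with a JNI / NoClassDefFoundError. A common sbt workaround, sketched here under the assumption of sbt 1.x slash syntax rather than anything taken from this question, is to put the compile classpath back on the run task:

// build.sbt: include "provided" dependencies when running via `sbt run`
Compile / run := Defaults.runTask(
  Compile / fullClasspath,   // the compile classpath still contains provided jars
  Compile / run / mainClass,
  Compile / run / runner
).evaluated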

This error occurs when a class your code depends on is present at compile time but cannot be found at runtime. Look for differences between your build-time and runtime classpaths.

More specifically, for your scenario:

If you get java.lang.NoClassDefFoundError on
org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable, the Spark JAR
on your classpath does not contain the MergeIntoTable class. The solution
is to move to the latest Apache Spark version, which ships with
org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable.scala.

More info in the Spark 3.x.x upgrade & release notes - https://github.com/apache/spark/pull/26167.
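
If you want to confirm which Spark version is actually on the runtime classpath before changing the build, a quick check along these lines can help (the class name is taken from the exception above):

// Prints the Spark version that is loaded at runtime
println(org.apache.spark.SPARK_VERSION)

// Throws ClassNotFoundException when the runtime Spark is older than 3.0.0
Class.forName("org.apache.spark.sql.catalyst.plans.logical.MergeIntoTable")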

sathya

You need to upgrade Apache Spark. The MergeIntoTable feature was introduced in version 3.0.0. Links to sources: AstBuilder.scala, Analyzer.scala, GitHub Pull Request, Release Notes (look at the Feature Enhancements section).
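
For example, bumping the versions in the question's build.sbt along these lines pulls in a Spark that contains MergeIntoTable (the exact numbers below are only illustrative of the Spark 3 / Delta 0.7.0 pairing):

// Illustrative versions only; Delta Lake 0.7.0 requires Spark 3.0.x
val sparkVersion = "3.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "io.delta" %% "delta-core" % "0.7.0"
)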

Mrinal Roy