
I'm trying to use Delta Lake on top of EMR, and I see an error whenever I try to run "restoreToVersion".

Using:
delta-storage-1.2.1.jar
delta-core_2.12-1.2.1.jar
emr-6.6.0
Hadoop distribution: Amazon 3.2.1

import io.delta.tables._
import io.delta.implicits._
import spark.implicits._ // pre-imported in spark-shell; needed for toDF elsewhere

val columns = Seq("language", "users_count")
val data = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000"))
val rdd = spark.sparkContext.parallelize(data)
val dfFromRDD1 = rdd.toDF(columns: _*)

dfFromRDD1.write.mode("append").delta("/tmp/dummy8/")

val deltaTable = DeltaTable.forPath(spark, "/tmp/dummy8/")
val fullHistoryDF = deltaTable.history()
fullHistoryDF.show()


dfFromRDD1.write.mode("append").delta("/tmp/dummy8/")
val fullHistoryDF = deltaTable.history()

fullHistoryDF.show()

[screenshot of the history output]

deltaTable.restoreToVersion(0)

[screenshot of the error]

  • So it turns out I forgot to add the package in the spark-shell command (see the launch-command sketch after these comments): https://stackoverflow.com/questions/59512366/how-to-use-delta-lake-with-spark-shell – SomeDataFellow Jul 13 '22 at 10:16
  • Did you find any solution for this? I have a similar kind of issue: https://stackoverflow.com/questions/72963925/org-apache-spark-sql-analysisexception-unresolved-operator-project – Shasu Jul 13 '22 at 11:03
  • @Shasu you may follow the link below for the resolution: https://stackoverflow.com/questions/59512366/how-to-use-delta-lake-with-spark-shell – SomeDataFellow Jul 18 '22 at 11:15
  • Any clue on this issue: https://stackoverflow.com/questions/74035832/exception-occured-while-writing-delta-format-in-aws-s3 ? – Shasu Oct 12 '22 at 02:28
  • I had the same error in a unit test. I solved it by creating a clean SparkSession for that test. The problem seems to be that some context in the SparkSession created by previous tests prevented the DeltaTable from functioning normally, and the `.getOrCreate()` method returned the SparkSession from the previous tests. So a good way to use a SparkSession in tests is via a try-with-resources block. – Ivan Veselovsky Mar 29 '23 at 15:45
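
As the first comment notes, the usual cause here is starting spark-shell without the Delta package and session configuration, so Delta operations like `restoreToVersion` fail to resolve. A minimal sketch of the launch command, assuming the delta-core_2.12-1.2.1 artifact listed in the question:

spark-shell \
  --packages io.delta:delta-core_2.12:1.2.1 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"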

1 Answer


I had the same issue; I later discovered it came from how I was creating my Spark session. Here is the code I used to resolve it:

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName(app_name) \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .master(master)

# configure_spark_with_delta_pip adds the matching delta-core package to the builder
spark = configure_spark_with_delta_pip(builder).getOrCreate()

Initially I was not setting .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") in my code. Adding it to my Spark session solved the issue.
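
Since the question uses Scala, here is a minimal equivalent sketch of the same session setup, assuming the Delta JARs are already on the classpath (e.g. via --packages); the app name is hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-restore-example") // hypothetical app name
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

With both configs in place, DeltaTable.forPath(spark, "/tmp/dummy8/").restoreToVersion(0) should run without the error.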