
Spark version: 3.2.1, Delta version: 1.2.1 (tried the 2.0 version as well)

I am trying to run the getting-started code to try out Delta Lake:

from pyspark.sql import SparkSession
from delta import *

# Enable the Delta SQL extension and register the Delta catalog
builder = SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

# configure_spark_with_delta_pip adds the Delta jar before creating the session
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a small DataFrame as a Delta table
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")

I am getting the error below:

Py4JJavaError: An error occurred while calling o201.showString.
: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'spark_catalog'

Can anyone please help me understand the issue and how to resolve it? Thanks in advance.

Mohan

1 Answer


Not sure which environment and mode you are using, but in general you need to add the Delta Lake jar via the spark.jars.packages config, because it is not included in Spark's default jars. For example: .config("spark.jars.packages", "io.delta:delta-core_2.12:1.2.0")
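A minimal sketch of a full session setup with that config (assuming a Scala 2.12 build and Delta 1.2.0; match the Maven coordinates to your Spark and Delta versions):

from pyspark.sql import SparkSession

# Pull the Delta Lake jar from Maven at session startup
# (assumes Scala 2.12 and Delta 1.2.0; adjust to your versions)
spark = (
    SparkSession.builder.appName("MyApp")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.2.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)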

Jonathan Lam
  • Any clue on this issue: https://stackoverflow.com/questions/74035832/exception-occured-while-writing-delta-format-in-aws-s3 ? – Shasu Oct 12 '22 at 02:25
  • The line `spark = configure_spark_with_delta_pip(builder).getOrCreate()` takes care of adding that particular config with (presumably) the correct Maven coordinates before creating the Spark session. It is standard according to https://docs.delta.io/latest/quick-start.html#python – Bjarne Thorsted Jan 16 '23 at 12:03
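For context, a rough sketch of what configure_spark_with_delta_pip adds to the builder (a hypothetical simplification; the real helper derives the Delta version and Scala suffix from the installed delta-spark pip package):

# Hypothetical simplification of configure_spark_with_delta_pip:
# it points spark.jars.packages at the matching delta-core artifact
# so Spark downloads the jar at startup
def configure_with_delta(builder, delta_version="1.2.1"):
    return builder.config(
        "spark.jars.packages",
        f"io.delta:delta-core_2.12:{delta_version}",
    )

spark = configure_with_delta(builder).getOrCreate()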