I am attempting to use the update, delete, and upsert operations in PySpark with AWS Glue.
I have instantiated Spark with the configs below:
spark = SparkSession.builder.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension").config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog").getOrCreate()
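One thing that may be worth checking is whether these two settings are actually in effect on the running session. `spark.sql.extensions` is a static setting, so if a session already exists when `getOrCreate()` is called (which may be the case inside a Glue job), the builder values might not be applied. The following is a minimal sketch of such a check, not a confirmed diagnosis; the helper names `build_session` and `missing_delta_confs` are hypothetical and only for illustration.

```python
# The two settings from the builder call above, kept in one place.
DELTA_CONFS = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_session(confs=DELTA_CONFS):
    """Create (or fetch) a SparkSession with the Delta settings applied to the builder."""
    # Imported lazily so the rest of this module works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def missing_delta_confs(get_conf):
    """Return the Delta setting keys whose effective value differs from the expected one.

    get_conf is any callable mapping a config key to its current value (or None).
    """
    return [key for key, expected in DELTA_CONFS.items() if get_conf(key) != expected]
```

Assuming `spark` is the session returned above, something like `missing_delta_confs(lambda k: spark.conf.get(k, None))` should return an empty list when both settings took effect.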
If I skip the update/delete/upsert operations, merge and insert (which I assume also require the DeltaSparkSessionExtension) work just fine. This makes no sense: why does the update operation throw this error when the merge operation does not?
I have tried to perform the update and delete both with direct transformations and via Spark SQL.
With direct transformations, I am facing this issue:
this delta operation required sparksession to be configured with glue
Note: I have configured spark session with all the required Delta dependencies.
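For reference, the direct-transformation path is roughly of the following shape. This is a simplified sketch using the `DeltaTable` Python API rather than my exact job code: the helper names are hypothetical, `row_id` is assumed to be an integer column, and `ship_mode` in the usage note is only a placeholder column name.

```python
def open_table(spark, path):
    """Load an existing Delta table by path.

    Requires the delta Python package matching the cluster's Spark version.
    """
    from delta.tables import DeltaTable  # imported lazily; not needed by the helpers below

    return DeltaTable.forPath(spark, path)


def delete_row(table, row_id):
    """Delete the row with the given row_id (assumed to be an integer key)."""
    table.delete(f"row_id = {int(row_id)}")


def update_row(table, row_id, assignments):
    """Update one row; assignments maps column name -> SQL expression string."""
    table.update(condition=f"row_id = {int(row_id)}", set=assignments)
```

Usage would look like `table = open_table(spark, "s3a://delta-lake-aws-glue-demo/current/")` followed by `delete_row(table, 7)` or `update_row(table, 8, {"ship_mode": "'Standard'"})`, where the row ids and values are made up for illustration.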
With Spark SQL, I am using the following query:
MERGE INTO delta.`s3a://delta-lake-aws-glue-demo/current/` as superstore
USING delta.`s3a://delta-lake-aws-glue-demo/updates_delta/` as updates
ON superstore.row_id = updates.row_id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
The above query fails with the following error:
AnalysisException: Table does not support reads: delta.`s3a://delta-lake-aws-glue-demo/current/`
I tested with the following jars; the results are:
delta-core_2.11-0.6.1.jar -- deprecated jar
delta-core_2.12-0.8.0.jar -- jar which supports inserts and append.
delta-core_2.12-2.1.0.jar -- An error occurred while calling o103.save. java.lang.NoClassDefFoundError: org/apache/spark/SparkThrowable
delta-core_2.12-1.0.0.jar -- jar which supports inserts and append.
this delta operation required sparksession to be configured with glue
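The jar results above might come down to a Spark/Delta version mismatch. As far as I understand the Delta Lake release notes, each delta-core release targets one Spark minor version, and `org.apache.spark.SparkThrowable` only exists from Spark 3.2 onward, which would be consistent with the `NoClassDefFoundError` from the 2.1.0 jar on an older Spark. Below is a small sketch of that mapping for the four jars I tried; the helper name `compatible_jars` is hypothetical, and the table should be checked against the official compatibility matrix.

```python
# Spark minor version each tested jar is built against
# (per the Delta Lake release notes, to the best of my knowledge).
REQUIRED_SPARK = {
    "delta-core_2.11-0.6.1.jar": "2.4",
    "delta-core_2.12-0.8.0.jar": "3.0",
    "delta-core_2.12-1.0.0.jar": "3.1",
    "delta-core_2.12-2.1.0.jar": "3.3",
}


def compatible_jars(spark_version):
    """Return the tested jars whose target Spark minor version matches spark_version."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return [jar for jar, needed in REQUIRED_SPARK.items() if needed == major_minor]
```

Calling `compatible_jars(spark.version)` inside the Glue job would show which of these jars, if any, lines up with the Spark version the job actually runs on.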
Any help is appreciated. Thanks :)