
I have an existing Dataproc cluster running Spark 3.3. According to https://docs.delta.io/latest/releases.html, Delta Lake 2.3 is compatible with Spark 3.3, so I followed the steps below to install Delta Lake:

  • Configuration on Jupyter
Kernel: /opt/conda/miniconda3/bin/python
Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]
PySpark version: 3.4.1
Spark version: 3.3.0
  • On the master node, ran pip install delta-spark==2.3.0
  • Downloaded the Delta Lake jar to /usr/lib/spark/jars/ using the command below
sudo wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.3.0/delta-core_2.12-2.3.0.jar
  • Added the entries below to /etc/spark/conf/spark-defaults.conf
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
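
The download step above only places delta-core in the jar directory. As a minimal sketch for checking which Delta Lake jars are actually present, the helper below scans a directory for the expected file names. The function name missing_delta_jars is hypothetical, and the assumption that delta-storage-2.3.0.jar is needed alongside delta-core_2.12-2.3.0.jar is taken from the "delta-storage dependency" hint in the error message, not verified on this cluster.

```python
from pathlib import Path

# Jar names expected for Delta Lake 2.3.0 on Scala 2.12. The delta-storage
# entry is an assumption based on the "delta-storage dependency" hint in the
# error message; adjust the tuple for other versions.
EXPECTED_JARS = (
    "delta-core_2.12-2.3.0.jar",
    "delta-storage-2.3.0.jar",
)


def missing_delta_jars(jar_dir, expected=EXPECTED_JARS):
    """Return the expected jar file names that are not present in jar_dir."""
    directory = Path(jar_dir)
    # A missing directory is treated as containing no jars at all.
    present = {p.name for p in directory.glob("*.jar")} if directory.is_dir() else set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # /usr/lib/spark/jars is the directory used in the steps above.
    print(missing_delta_jars("/usr/lib/spark/jars"))
```

An empty list would mean both jars are in place; any name printed is a jar the directory does not contain.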

Question: when I try to write a DataFrame in Delta format from a Jupyter notebook with emp_details.write.format("delta").mode("overwrite").save(delta_path), I run into the error below.

Error:

Py4JJavaError: An error occurred while calling o90.save.
: com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.delta.storage.DelegatingLogStore$

I also tried setting the parameter below, but the write still fails.

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages io.delta:delta-core_2.12:2.3.0 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog pyspark-shell'

Error with the above PYSPARK_SUBMIT_ARGS in Jupyter:

Py4JJavaError: An error occurred while calling o84.save.
: com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: io/delta/storage/LogStore
Please ensure that the delta-storage dependency is included.

If using Python, please ensure you call `configure_spark_with_delta_pip` or use
`--packages io.delta:delta-core_<scala-version>:<delta-lake-version>`.
See https://docs.delta.io/latest/quick-start.html#python.

More information about this dependency and how to include it can be found here:
https://docs.delta.io/latest/porting.html#delta-lake-1-1-or-below-to-delta-lake-1-2-or-above.
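
Since the message points at the delta-storage dependency, one variation worth trying is to list both artifacts explicitly in --packages. The sketch below only builds the argument string and sets the environment variable; build_submit_args is a hypothetical helper, and the io.delta:delta-storage:2.3.0 coordinate assumes the storage artifact is published under the same version as delta-core. The variable has to be set before the first SparkSession is created in the kernel, because PySpark reads it when it launches the JVM.

```python
import os

DELTA_VERSION = "2.3.0"  # Delta Lake version used in the question
SCALA_VERSION = "2.12"   # Scala version of the delta-core artifact


def build_submit_args(delta_version=DELTA_VERSION, scala_version=SCALA_VERSION):
    """Build a PYSPARK_SUBMIT_ARGS string listing delta-core and delta-storage."""
    packages = ",".join([
        f"io.delta:delta-core_{scala_version}:{delta_version}",
        # Assumed coordinate: same version number as delta-core.
        f"io.delta:delta-storage:{delta_version}",
    ])
    confs = {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
    conf_args = " ".join(f"--conf {key}={value}" for key, value in confs.items())
    return f"--packages {packages} {conf_args} pyspark-shell"


# Set this at the top of the notebook, before any SparkSession exists.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

If a session is already running in the kernel, the kernel would need a restart for the new arguments to take effect.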

Update-1: I followed the setup instructions from https://delta.io/learn/getting-started/, but ran into the same error as above.

Update-2: I also added the delta-storage jar, and now run into a different error, shown below.
