
I am running this command to read data from Azure Databricks on a plain cluster (Hadoop not installed).

spark-submit --packages io.delta:delta-core_2.12:0.7.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.delta.logStore.class=org.apache.spark.sql.delta.storage.HDFSLogStore" \
  Test_write_to_DL.py

I am getting this error:

: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2595)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3269)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:163)
    at org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:259)

Can you please suggest which JAR I need to install to get this working?


1 Answer


See the Delta Lake documentation for details:

  1. Instead of org.apache.spark.sql.delta.storage.HDFSLogStore, use org.apache.spark.sql.delta.storage.AzureLogStore.
  2. Include the hadoop-azure package (by its Maven coordinates) in --packages; it provides the missing SecureAzureBlobFileSystem class.
  3. Provide credentials for the storage account, e.g. via a spark.hadoop.fs.azure.account.key.* configuration property.
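The three steps above can be sketched as a corrected spark-submit invocation. This is a sketch only: the hadoop-azure version (which should match the Hadoop version on your cluster), the storage account name, and the account key are placeholders you must adapt to your environment.

```shell
# Sketch of a corrected spark-submit; <storage-account> and <account-key> are
# placeholders, and the hadoop-azure version should match your Hadoop version.
spark-submit \
  --packages io.delta:delta-core_2.12:0.7.0,org.apache.hadoop:hadoop-azure:3.2.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.delta.logStore.class=org.apache.spark.sql.delta.storage.AzureLogStore" \
  --conf "spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net=<account-key>" \
  Test_write_to_DL.py
```

Passing the credential as a `spark.hadoop.*` property makes it visible to the Hadoop FileSystem layer that backs abfss:// paths; alternatively you can set fs.azure.account.key.* in core-site.xml or in the session's Hadoop configuration.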
Alex Ott
  • 80,552
  • 8
  • 87
  • 132