I am trying to read from (and eventually write to) Azurite (version 3.18.0) using Spark (3.1.1).
I can't work out which Spark configurations and file URI I need to set to make this work.
For example, these are the containers and files I have inside Azurite:

/devstoreaccount1/container1/file1.avro
/devstoreaccount1/container2/file2.avro

This is the code I'm running; uri is set to one of the values listed further down:

import org.apache.spark.sql.SparkSession

val uri = ...
val spark = SparkSession.builder()
  .appName(appName)
  .master("local")
  .config("spark.driver.host", "127.0.0.1")
  .getOrCreate()

// WASB settings, keyed by the Azure-style host name of the Azurite account
spark.conf.set("spark.hadoop.fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("spark.hadoop.fs.azure.account.auth.type.devstoreaccount1.blob.core.windows.net", "SharedKey")
spark.conf.set("spark.hadoop.fs.azure.account.key.devstoreaccount1.blob.core.windows.net", <azurite account key>)

spark.read.format("avro").load(uri)

URI values I have tried (which one is correct?):

  • http://127.0.0.1:10000/container1/file1.avro
    I get an UnsupportedOperationException when I call spark.read.format("avro").load(uri), because Spark uses the HttpFileSystem implementation, which doesn't support listStatus.
  • wasb://container1@devstoreaccount1.blob.core.windows.net/file1.avro
    Spark tries to authenticate against the real Azure servers and fails, since devstoreaccount1 only exists in my local Azurite instance.
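
To make the second attempt concrete, here is a minimal sketch of how the wasb URI and the account-key property name relate to each other. It assumes the WASB driver looks up the key under fs.azure.account.key.<host>, where <host> is whatever follows the '@' in the URI. WasbTarget and its fields are hypothetical names used only for illustration; they are not part of Spark or hadoop-azure, and the sketch does not address how to point the driver at the Azurite endpoint.

```scala
// Hypothetical helper (not a Spark or hadoop-azure API): it only assembles strings.
final case class WasbTarget(
    account: String,
    container: String,
    path: String,
    hostSuffix: String = "blob.core.windows.net" // the suffix used in this question
) {
  // Host part of the URI authority, i.e. the text after '@'.
  val host: String = s"$account.$hostSuffix"

  // wasb://<container>@<host>/<path>
  def uri: String = s"wasb://$container@$host/${path.stripPrefix("/")}"

  // Assumption: the account key is read from this property name.
  def accountKeyProperty: String = s"fs.azure.account.key.$host"
}

object WasbTargetDemo {
  def main(args: Array[String]): Unit = {
    val target = WasbTarget("devstoreaccount1", "container1", "file1.avro")
    println(target.uri)
    println(target.accountKeyProperty)
  }
}
```

With the values from this question, uri gives the second URI above and accountKeyProperty gives the key name used in my configuration (without the spark.hadoop. prefix), so the host in the URI and the host in the key name at least agree.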

I have tried to follow this Stack Overflow post without success. I have also tried removing the blob.core.windows.net suffix from the configuration keys, but then I don't know how to give Spark the endpoint for the Azurite container.

So my question is: which configurations does Spark need in order to read from Azurite, and what is the correct file path format to pass as the URI?

Mr T.