
I am reading an Avro file from Azure Data Lake using Databricks. For the daily run I build the path to the current date's file. The code that derives the file date looks like this, and it returns the current date correctly.

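    import java.time.{ZoneOffset, ZonedDateTime}
    import java.time.format.DateTimeFormatter
    val pfdtm = ZonedDateTime.now(ZoneOffset.UTC)  // current time in UTC
    val fileDate = DateTimeFormatter.ofPattern("yyyy_MM_dd").format(pfdtm)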

fileDate = 2020_02_02

But when I use the fileDate variable in the path, it does not work: it raises a "path does not exist" error. The path is shown below.

val df=spark.read.format("com.databricks.spark.avro").load("adl://power.azuredatalakestore.net/SD_Case/eventhubspace/venthub/0_${fileDate}_*_*_*.avro")

But when I use the actual date instead of the variable, it works fine:

val df=spark.read.format("com.databricks.spark.avro").load("adl://powerbiconnect.azuredatalakestore.net/SD_Case/sdeventhubspace/sdeventhub/0_2020_02_02_*_*_*.avro")

The actual folder path looks like this, with a sample daily file from day 2:

adl://power.azuredatalakestore.net/SD_Case/eventhubspace/venthub/0_2020_02_02_10_11_15.avro

I would appreciate any help correcting my code. Thanks in advance.

HaiY
  • Is `0_2020_02_02_*_*_*.avro` the file name? – venus Feb 06 '20 at 06:51
  • Yes, it is the file name, but * stands for the variable minutes and seconds. – HaiY Feb 07 '20 at 01:05
  • I don't see any error in the code; the problem is the file name. When you provide the hard-coded file name, it can identify the file in the directory. But when you build the file name from a timestamp, it looks for a file with that particular timestamp, and my assumption is that the files are written with a different timestamp, so the two values do not match and the file is not picked up. So try to use a consistent file name. – venus Feb 07 '20 at 19:05
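
One thing worth checking, assuming the failing snippet in the question is copied verbatim: the path string has no `s` prefix, so Scala would treat `${fileDate}` as literal text instead of substituting the variable, which could explain the "path does not exist" error. The following is only a minimal sketch of that idea. `PathSketch`, `dailyGlob`, and `basePath` are hypothetical names introduced for illustration, and the base path is the one quoted in the question.

```scala
import java.time.{ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

object PathSketch {
  // Hypothetical helper: builds the glob that matches one day's Avro files.
  def dailyGlob(basePath: String, day: ZonedDateTime): String = {
    val fileDate = DateTimeFormatter.ofPattern("yyyy_MM_dd").format(day)
    // The leading `s` turns on string interpolation. The braces around
    // fileDate are needed because the next character is an underscore,
    // which would otherwise be read as part of the identifier.
    s"$basePath/0_${fileDate}_*_*_*.avro"
  }

  def main(args: Array[String]): Unit = {
    // Base path as quoted in the question (assumed to be the real folder).
    val base = "adl://power.azuredatalakestore.net/SD_Case/eventhubspace/venthub"
    // Fixed date so the output is reproducible; a daily run would pass
    // ZonedDateTime.now(ZoneOffset.UTC) instead.
    val day = ZonedDateTime.of(2020, 2, 2, 0, 0, 0, 0, ZoneOffset.UTC)
    println(dailyGlob(base, day))
  }
}
```

In the notebook, the resulting string would then be passed to `spark.read.format("com.databricks.spark.avro").load(...)` in place of the hand-written path. If the error persists with the interpolated path, printing the string before loading it is a quick way to compare it against the hard-coded path that works.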

0 Answers