I am trying to read a Delta / Parquet table in Databricks using the following code:

df3 = spark.read.format("delta").load('/mnt/lake/CUR/CURATED/origination/company/opportunities_final/curorigination.presentation.parquet')

However, I'm getting the following error:

A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: curorigination.presentation.parquet

This seemed very straightforward, but I'm not sure why I'm getting the error.

Any thoughts?

The file structure looks like the following (screenshot not included).

Patterson

2 Answers

1

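The error means Delta Lake is treating the last part of your path, curorigination.presentation.parquet, as a partition directory, and that name does not follow the column=value form that partition directories use.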
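To see why a file name triggers this message, it helps to picture how a path fragment below the table root gets interpreted. The following is a minimal sketch in plain Python, not Delta Lake's actual implementation; `parse_partition_fragment` is a hypothetical helper that only mimics the `key=value` rule quoted in the error message.

```python
def parse_partition_fragment(fragment: str) -> dict:
    """Split 'part1=foo/part2=bar' into {'part1': 'foo', 'part2': 'bar'}.

    Hypothetical helper for illustration only; it mimics the rule stated in
    the error message and is not Delta Lake's own code.
    """
    partitions = {}
    for piece in fragment.strip("/").split("/"):
        key, sep, value = piece.partition("=")
        # Every path component must look like key=value.
        if not sep or not key or not value:
            raise ValueError(
                "A partition path fragment should be the form like "
                f"`part1=foo/part2=bar`. The partition path: {fragment}"
            )
        partitions[key] = value
    return partitions


# A well-formed fragment parses into column/value pairs.
print(parse_partition_fragment("year=2022/month=08"))

# A plain file name has no key=value components, so it is rejected.
try:
    parse_partition_fragment("curorigination.presentation.parquet")
except ValueError as err:
    print(err)
```

Under this reading, anything that follows the table directory in the path has to be a chain of `column=value` folders, which a data file name is not.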

If your Delta table has partition columns, for example year, month and day, the path to a data file looks like

/mnt/lake/CUR/CURATED/origination/company/opportunities_final/year=yyyy/month=mm/day=dd/curorigination.presentation.parquet

and you only need to load the table directory:

df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")

If you just want to read it as Parquet, you can do

df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")

because Spark reads every Parquet file under the directory, so you don't need the full path to an individual Parquet file.
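If it is unclear which directory is the table root, one option is to walk up from the file until a folder containing `_delta_log` appears. The sketch below is a minimal illustration using only the standard library. `find_delta_root` is a hypothetical helper, and it assumes the path is reachable through the local filesystem (on Databricks, mounts are typically exposed under `/dbfs`, which is an assumption about your workspace).

```python
from pathlib import Path
from typing import Optional


def find_delta_root(path: str) -> Optional[Path]:
    """Return the path itself or its nearest ancestor that contains a
    `_delta_log` directory, or None if no such directory is found.

    Hypothetical helper for illustration; not part of Spark or Delta Lake.
    """
    start = Path(path)
    # Check the given path first, then each parent up to the filesystem root.
    for candidate in [start, *start.parents]:
        if (candidate / "_delta_log").is_dir():
            return candidate
    return None
```

The directory it returns is what `spark.read.format("delta").load(...)` expects, rather than the path of a data file inside it.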

pugmarx
Jonathan Lam
  • Any other thoughts? – Patterson Aug 06 '22 at 18:04
  • Hi Jonathan, any other suggestions? As far as I can tell, I don't have any partition columns in the Delta table – Patterson Aug 07 '22 at 17:50
  • Hi Jonathan, I just realised I was being silly... I now understand that ```df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")``` is all I need. – Patterson Aug 07 '22 at 19:38
  • Hi @Patterson, glad to hear that your problem is solved. May I ask why you save different formats (`delta` and `parquet`) in the same directory (although both are stored as Parquet files)? – Jonathan Lam Aug 08 '22 at 07:11
0

The above error mainly happens because the path ends in the file name curorigination.presentation.parquet, which is not a valid path format for a Delta load. Please check your Delta location, and also check whether the Delta table was actually created:

%fs ls /mnt/lake/CUR/CURATED/origination/company/opportunities_final/  
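The same check can be done from Python. As a rough sketch, assuming the directory is visible on the local filesystem (for example through the `/dbfs` mount, which may differ in your workspace), the hypothetical helper `summarize_table_dir` below reports whether a `_delta_log` folder with JSON commit files exists and which Parquet data files sit beside it.

```python
import os


def summarize_table_dir(directory: str) -> dict:
    """Report what a candidate table directory contains.

    Hypothetical helper for illustration; not a Databricks or Delta API.
    """
    log_dir = os.path.join(directory, "_delta_log")
    commits = []
    if os.path.isdir(log_dir):
        # Delta writes one JSON commit file per table version.
        commits = sorted(f for f in os.listdir(log_dir) if f.endswith(".json"))
    data_files = sorted(
        f for f in os.listdir(directory) if f.endswith(".parquet")
    )
    return {
        "has_delta_commits": bool(commits),
        "commit_files": commits,
        "parquet_files": data_files,
    }
```

If `has_delta_commits` is false, the directory holds plain Parquet files at most, and loading it with the `delta` format is not expected to work.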

I reproduced the same thing in my environment. First, I created a DataFrame from a Parquet file.

df1 = spark.read.format("parquet").load("/FileStore/tables/")
display(df1)

After that, I converted the data to Delta format and saved it to the location /mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1.

df1.coalesce(1).write.format('delta').mode("overwrite").save("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")

# Read the Delta table back from the same directory it was saved to
df3 = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")
display(df3)

B. B. Naga Sai Vamsi
  • Hi Bhanunagasai, thanks for reaching out. I've updated the question to show the exact file structure. Just so you know, the file is a Delta file. In the meantime, I will review your answer – Patterson Aug 07 '22 at 17:42
  • So, BhanunagasaiVamsi, I have reviewed your answer. However, because you may have thought that I was working with a Parquet file, your suggestion doesn't apply: the file curorigination.presentation.parquet is a Delta file. – Patterson Aug 07 '22 at 17:51