Unable to read Databricks Delta / Parquet File with Delta Format

Question

I am trying to read a Delta / Parquet file in Databricks using the following code:

df3 = spark.read.format("delta").load('/mnt/lake/CUR/CURATED/origination/company/opportunities_final/curorigination.presentation.parquet')

However, I'm getting the following error:

A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: curorigination.presentation.parquet

This seemed very straightforward, but I'm not sure why I'm getting the error.

Any thoughts?

The file structure looks like the following

score 1 · Accepted Answer · edited Jul 14 '23 at 08:45

The error shows that Delta Lake is treating the last part of your path as a partition path, and it does not follow the expected `column=value` naming.

If your Delta table has partition columns, for example year, month, and day, the path should look like this:

/mnt/lake/CUR/CURATED/origination/company/opportunities_final/year=yyyy/month=mm/day=dd/curorigination.presentation.parquet

and you only need to load the table directory:

df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")

If you only want to read it as Parquet, you can do:

df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")

because you don't need to give the absolute path of the Parquet file; pointing Spark at the directory is enough.
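As a rough illustration of why the original path fails: as far as I understand Delta's path handling, when the path you load sits inside a directory that contains `_delta_log`, the leftover path pieces are interpreted as `column=value` partition fragments. The sketch below mimics that idea with plain Python. The helper names `find_delta_root` and `looks_like_partition` are hypothetical (they are not part of Spark or Delta), and it assumes the mount is visible to local file APIs, which on Databricks usually means prefixing the path with `/dbfs`.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_delta_root(path: str) -> Optional[Tuple[Path, Tuple[str, ...]]]:
    """Walk upwards from `path` to the nearest directory holding `_delta_log`.

    Returns (table_root, leftover_fragments), or None if no such directory
    is found. Hypothetical helper, for illustration only.
    """
    target = Path(path)
    for candidate in (target, *target.parents):
        if (candidate / "_delta_log").is_dir():
            # Whatever sits below the table root is what would be treated
            # as partition fragments.
            return candidate, target.relative_to(candidate).parts
    return None


def looks_like_partition(fragment: str) -> bool:
    """True for `name=value` fragments such as `year=2022`."""
    name, sep, value = fragment.partition("=")
    return bool(name and sep and value)


# Usage sketch (assumes the /dbfs prefix applies to your workspace):
# found = find_delta_root(
#     "/dbfs/mnt/lake/CUR/CURATED/origination/company/opportunities_final/"
#     "curorigination.presentation.parquet"
# )
# if found is not None:
#     root, leftover = found
#     bad = [f for f in leftover if not looks_like_partition(f)]
#     # A non-empty `bad` list suggests loading `root` itself instead.
```

If the leftover fragments are not of the `name=value` form, loading the table root (the directory with `_delta_log`) is the likely fix.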

edited Jul 14 '23 at 08:45 by pugmarx

answered Aug 06 '22 at 12:40 by Jonathan Lam

Any other thoughts? – Patterson Aug 06 '22 at 18:04
Hi Jonathan, any other suggestions? As far as I can tell, I don't have any partition columns in the Delta table. – Patterson Aug 07 '22 at 17:50
Hi Jonathan, I just realised I was being silly... I now understand that `df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")` is all I need. – Patterson Aug 07 '22 at 19:38
Hi @Patterson, glad to hear that your problem is solved. May I ask why you save different formats (`delta` and `parquet`) in the same directory (although both are stored as Parquet files)? – Jonathan Lam Aug 08 '22 at 07:11

B. B. Naga Sai Vamsi · Answer 2 · 2022-08-06T19:30:32.620

The above error mainly happens because of the incorrect path format: curorigination.presentation.parquet. Please check your Delta location, and also check whether the Delta table was actually created:

%fs ls /mnt/lake/CUR/CURATED/origination/company/opportunities_final/
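The same check can be sketched in Python if you prefer to inspect the listing programmatically. This is only an illustration: `summarize_location` is a hypothetical helper, and it assumes the mount is reachable through local file APIs (on Databricks, typically under a `/dbfs` prefix).

```python
from pathlib import Path


def summarize_location(directory: str) -> dict:
    """Report what a directory contains that matters for Delta vs. Parquet reads.

    Hypothetical helper, for illustration only.
    """
    root = Path(directory)
    entries = sorted(p.name for p in root.iterdir())
    return {
        # A Delta table keeps its transaction log in `_delta_log`.
        "has_delta_log": (root / "_delta_log").is_dir(),
        # Files or folders whose names end in `.parquet`.
        "parquet_entries": [n for n in entries if n.endswith(".parquet")],
        # Sub-directories named like `column=value`.
        "partition_dirs": [
            n for n in entries if "=" in n and (root / n).is_dir()
        ],
    }


# Usage sketch (assumes the /dbfs prefix applies to your workspace):
# info = summarize_location(
#     "/dbfs/mnt/lake/CUR/CURATED/origination/company/opportunities_final"
# )
# If info["has_delta_log"] is True, load that directory with format("delta");
# otherwise spark.read.parquet on the directory is the more likely fit.
```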

I reproduced the same scenario in my environment. First, I created a DataFrame from a Parquet file.

df1 = spark.read.format("parquet").load("/FileStore/tables/")
display(df1)

After that, I converted the data to Delta format and saved it to the location /mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1.

df1.coalesce(1).write.format('delta').mode("overwrite").save("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")

# Read the Delta table back from the same directory it was written to
df3 = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")
display(df3)

Hi Bhanunagasai, thanks for reaching out. I've updated the question to show the exact file structure. Just so you know, the file is a Delta file. In the meantime, I will review your answer. – Patterson Aug 07 '22 at 17:42
So, BhanunagasaiVamsi, I have reviewed your answer. However, because you may have thought that I was working with a Parquet file, your suggestion doesn't apply. This is because the file curorigination.presentation.parquet is a Delta file. – Patterson Aug 07 '22 at 17:51
