I have a Delta table in Databricks, and its data is stored in ADLS. The data is partitioned by a date column. Parquet files are present in ADLS from 01-06-2022 onwards, but when I query the table in Databricks I only see data for the later dates; older data is not displayed. Every day, the data is written to the table path in overwrite mode, partitioned by the date column.

David Browne - Microsoft

Vamsi Krishna
-
Can you share the code snippet you are using to write the data to ADLS? – SomeDataFellow Aug 18 '22 at 15:57
-
df.write.format('delta').mode('overwrite').save('{}/{}'.format(DELTALAKE_PATH, table)) – Vamsi Krishna Aug 19 '22 at 11:52
-
I can see the data is in ADLS. If I try reading the files, I can only read the later dates. When I try reading data for an older date, I get a "path does not exist" error. – Vamsi Krishna Aug 19 '22 at 11:54
-
I don't see the partition-by column defined in the query you pasted above. – SomeDataFellow Aug 19 '22 at 16:11
-
The table is created with a partitioned date column. – Vamsi Krishna Aug 19 '22 at 17:54
1 Answer
df.write.format('delta').mode('overwrite').save('{}/{}'.format(DELTALAKE_PATH, table))
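Overwrite mode replaces the table's contents with the new data. The older Parquet files stay in ADLS until they are vacuumed, but the Delta transaction log no longer references them, so a query returns only the most recent write. This is the reason for your issue.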
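Because Delta records every write as a new table version, data from before an overwrite can usually still be read through time travel, as long as the older files have not been vacuumed. A minimal sketch, assuming an active `spark` session and the table path from the question; the helper name `read_table_version` and the version number are hypothetical:

```python
def read_table_version(spark, path, version):
    """Load a Delta table as it looked at a given version number."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # Delta time-travel read option
        .load(path)
    )

# Example call (the version number here is made up; look it up in the table history):
# old_df = read_table_version(spark, f"{DELTALAKE_PATH}/{table}", 0)
```

Running `DESCRIBE HISTORY` on the table lists the versions that are available to read.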
df.write.format('delta').mode('append').save('{}/{}'.format(DELTALAKE_PATH, table))
Append mode adds the new data to the existing data. Your existing data is kept, so a query returns past records as well.
You need to use append mode in place of overwrite mode.
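Append Mode (as defined for Structured Streaming) - Only the new rows appended to the Result Table since the last trigger are written to external storage. This applies only to queries where existing rows in the Result Table are not expected to change. For a batch write like the one above, mode('append') simply adds the DataFrame's rows to the existing table.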
Reference - https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts
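If the same day might be written more than once, plain append would duplicate that day's rows. One alternative is to keep overwrite mode but restrict it to a single partition with Delta's `replaceWhere` option. A minimal sketch, assuming the partition column is called `date` (the question does not give its name) and that `df` holds rows for one day only; `write_one_day` is a hypothetical helper:

```python
def write_one_day(df, path, date_value, date_col="date"):
    """Overwrite only the rows of one date in a Delta table.

    Rows for other dates already in the table are left untouched.
    `date_col` is an assumed column name, not taken from the question.
    """
    (
        df.write.format("delta")
        .mode("overwrite")
        # Limit the overwrite to rows matching this predicate.
        .option("replaceWhere", f"{date_col} = '{date_value}'")
        .save(path)
    )

# Example call (the date value is illustrative):
# write_one_day(df, f"{DELTALAKE_PATH}/{table}", "2022-08-19")
```

With this option, `df` should contain only rows that match the predicate; by default Delta is expected to reject a write that includes rows for other dates.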

Abhishek K
-
It's much simpler to use `f'{DELTALAKE_PATH}/{table}'` instead of `'{}/{}'.format(DELTALAKE_PATH, table)`. – Alex Ott Aug 24 '22 at 10:17