I am coming here as I'm encountering some strange issues with blob storage (and a Delta table, but I believe the issues come from blob storage).
We are hitting the classic Delta Lake error that occurs when somebody manually deletes files from the storage explorer instead of using a DELETE statement. The files were deleted (or removed, or moved?) between 1am and 2am on 3 different tables. We know this because the process using the tables ran fine at 1am and failed at 2am.
Caused by: com.[REDACTED].sql.io.FileReadException: Error while reading file dbfs:/mnt/source-be/myfolder/folder/ingestdatetime=20211114011927/part-00000-ea38c232-4dd2-4e49-9847-17e5f5c1222d.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table DELETE statement.
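For what it's worth, the transaction log itself can be queried to see which files the table still expects. Here is a minimal diagnostic sketch, assuming the table path from the error above and a Databricks notebook where spark is already defined:

# List the files the Delta transaction log still references for the failing
# partition, to cross-check against what actually exists in storage.
from pyspark.sql import functions as F

log_df = spark.read.json("dbfs:/mnt/source-be/myfolder/folder/_delta_log/*.json")

(log_df
    .where(F.col("add.path").contains("ingestdatetime=20211114011927"))
    .select("add.path")
    .distinct()
    .show(truncate=False))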
We don't know what caused the files to disappear. We have checked the storage logs and I cannot see any "delete file" tag for any of these files.
df_logs.where(df_logs._c12.contains('20211114011927'))
Only "GetPath", "GetFileProperties", and "Readfile". Is it possible that some files were moved or deleted without this appearing in the logs ?
I am very puzzled. The files did disappear from blob storage, so I don't think Databricks is the suspect.
Edit:
Actually the files did not get deleted; the partition got renamed from 20211114011927 to 20211113011927. I have no idea how this is possible. We are reviewing our code to try to understand how this could happen.
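In the meantime, the same logs can be searched for the new partition name and for any rename-like operations. This is a sketch under the same column-layout assumption as above (_c1 = request start time, _c2 = operation type, _c12 = object key), with a broad 'rename' match since the exact operation names vary by storage account type:

from pyspark.sql import functions as F

# Look for the renamed partition and for any rename operations, to pin down
# when the rename happened and which client issued it.
(df_logs
    .where(
        df_logs._c12.contains('20211113011927')
        | F.lower(df_logs._c2).contains('rename')
    )
    .select(df_logs._c1, df_logs._c2, df_logs._c12)
    .show(truncate=False))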