
I am posting here because I'm encountering some strange issues with blob storage (and a Delta table, though I believe the issues come from blob storage).

We are hitting the classic Delta Lake error that appears when somebody manually deletes files through the storage explorer instead of using a DELETE statement. The files were deleted (or removed, or moved?) between 1 am and 2 am on 3 different tables. We know this because the process using the tables ran fine at 1 am and failed at 2 am.

Caused by: com.[REDACTED].sql.io.FileReadException: Error while reading file

dbfs:/mnt/source-be/myfolder/folder/ingestdatetime=20211114011927/part-00000-ea38c232-4dd2-4e49-9847-17e5f5c1222d.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table DELETE statement

We don't know what caused these files to disappear. We have checked the storage logs and I cannot see any "delete file" entry for any of these files.

df_logs.where(df_logs._c12.contains('20211114011927'))

The only operations that show up are "GetPath", "GetFileProperties", and "Readfile". Is it possible that some files were moved or deleted without this appearing in the logs?
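A filter on the exact partition value only matches log rows whose key contains that string, so an operation logged against the parent folder or against a differently named path would not show up. A minimal sketch of a wider search in plain Python follows. The function name `operations_by_token` is hypothetical, the key column index 12 mirrors `_c12` above, and the operation-type column index 2 is an assumption about the log layout that should be checked against the actual files.

```python
from collections import Counter


def operations_by_token(rows, tokens, op_col=2, key_col=12):
    """Count operation types for log rows whose object key mentions a token.

    rows    : iterable of lists of strings, one list per log line
    tokens  : substrings to look for in the object key column
    op_col  : index of the operation-type column (assumed, verify it)
    key_col : index of the object key column (12, as in `_c12` above)
    """
    counts = {token: Counter() for token in tokens}
    for row in rows:
        if len(row) <= max(op_col, key_col):
            continue  # skip malformed or truncated rows
        for token in tokens:
            if token in row[key_col]:
                counts[token][row[op_col]] += 1
    return counts
```

Passing both the partition value and the parent folder, for example `operations_by_token(rows, ["20211114011927", "myfolder/folder"])`, lists every operation type seen for each, which makes an unexpected one easier to spot than a search for a "delete" tag alone.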

I am very puzzled. The files did disappear from blob storage, so I don't think Databricks is the culprit.


edit

Actually, the files were not deleted; the partition folder was renamed to 20211113011927 instead of 20211114011927. I have no idea how this is possible. We are reviewing our code to try to understand how this could happen.
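To tell a renamed partition apart from a real deletion, the paths recorded in the Delta transaction log can be compared with a listing of the files that actually exist. The sketch below is a minimal illustration in plain Python, not a definitive tool. It assumes the JSON commit files under `_delta_log` have already been read into a list of lines and that the file listing uses paths relative to the table root. It ignores checkpoint files and any URL encoding of paths. The function names `referenced_files` and `find_relocated` are hypothetical.

```python
import json
import posixpath


def referenced_files(log_lines):
    """Replay add/remove actions and return the paths still referenced."""
    live = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)  # each commit line holds one action
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
    return live


def find_relocated(referenced, existing):
    """Map each missing referenced path to existing paths with the same file name."""
    existing = set(existing)
    by_name = {}
    for path in existing:
        by_name.setdefault(posixpath.basename(path), []).append(path)
    return {
        path: sorted(by_name.get(posixpath.basename(path), []))
        for path in sorted(referenced)
        if path not in existing
    }
```

A missing path mapped to a non-empty list means a file with the same name exists under another folder, which points to a rename or move. An empty list means no file with that name was found in the listing.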

OrganicMustard
  • How is this storage account mounted, using `wasbs` or `abfss`? You can run `display(dbutils.fs.mounts())` to get this information. – Alex Ott Nov 15 '21 at 18:36
  • Thanks for your answer. I believe I mounted it with abfss. The file really did disappear from the storage account, so I'm not sure it is a problem with mounting. – OrganicMustard Nov 15 '21 at 18:46
  • I've seen file corruption in some cases when ADLS Gen2 was mounted with `wasbs` rather than `abfss`. – Alex Ott Nov 15 '21 at 18:54
  • @GuilLabs did you come to a resolution? I'm running into similar behavior now. – reallyJim Aug 23 '22 at 12:09

1 Answer


The issue you describe is very odd. Even if the blob had been deleted, the deletion should appear in the logs (assuming you have been getting logs for similar activities in the past). And, as @Alex mentioned in the comments, there is no file corruption issue when an ADLS account is mounted using abfss.

Since there could be many possible reasons for such odd behavior, my suggestion is to post your question on Microsoft Q&A, where a Microsoft engineer can reach out to you directly and help resolve the issue. If you report it promptly, there is a good chance that, with their help, you will get your files back.

Utkarsh Pal