We have been using Azure Databricks / Delta Lake for the last couple of months and have recently started to see some strange behaviour with loaded records: in particular, the latest records are not returned unless the cluster is restarted or a specific version number is specified.
For example, this returns no records:
df_nw = spark.read.format('delta').load('/mnt/xxxx')
display(df_nw.filter("testcolumn = ???"))
But this does return them:
%sql
SELECT * FROM delta.`/mnt/xxxx` VERSION AS OF 472 where testcolumn = ???
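For comparison, the same pinned read can be written with the DataFrame API so that both reads go through `spark.read`. This is a minimal sketch assuming an active SparkSession named `spark` (predefined in Databricks notebooks). `read_delta` is a hypothetical helper name; `versionAsOf` is Delta Lake's time-travel reader option.

```python
from typing import Optional


def read_delta(spark, path: str, version: Optional[int] = None):
    """Load a Delta table as a DataFrame, optionally pinned to a table version.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    With version=None the latest snapshot is read; otherwise Delta time
    travel is requested through the `versionAsOf` reader option.
    """
    reader = spark.read.format("delta")
    if version is not None:
        # DataFrame-API equivalent of `SELECT ... VERSION AS OF <version>`
        reader = reader.option("versionAsOf", version)
    return reader.load(path)
```

Usage in the notebook would be `read_delta(spark, '/mnt/xxxx')` for the latest snapshot and `read_delta(spark, '/mnt/xxxx', version=472)` for the pinned one, each followed by the same `.filter("testcolumn = ???")` (with `???` replaced by the real value).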
As mentioned above, this only seems to affect newly inserted records. Has anyone else come across this before?
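One way to narrow this down within a single session is to put the latest version recorded in the table history next to the number of matching rows in the current snapshot and in the pinned version. A rough sketch, assuming an active `spark` session; `compare_versions` is a hypothetical helper and `predicate` stands in for the filter used above.

```python
def compare_versions(spark, path: str, version: int, predicate: str) -> dict:
    """Hypothetical diagnostic: latest Delta version vs. row counts per snapshot."""
    # Most recent commit in the Delta transaction log (DESCRIBE HISTORY is newest-first)
    latest = spark.sql(f"DESCRIBE HISTORY delta.`{path}` LIMIT 1").collect()[0]["version"]

    # Matching rows in the snapshot an unpinned read resolves to
    current_count = spark.read.format("delta").load(path).filter(predicate).count()

    # Matching rows when the read is pinned to the given version
    pinned_count = (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(path)
        .filter(predicate)
        .count()
    )
    return {
        "latest_version": latest,
        "current_count": current_count,
        "pinned_count": pinned_count,
    }
```

For example, `compare_versions(spark, '/mnt/xxxx', 472, "testcolumn = ???")` (with `???` replaced by the real value). Suppose `latest_version` is at least 472 and `pinned_count` is non-zero while `current_count` is 0, and nothing has deleted those rows since. In that case the unpinned read appears to be resolving to an older snapshot than the one the transaction log reports.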
Any help would be appreciated.
Thanks, Col