We are using a Delta table for state tracking, and it has around 56 partitions. When we insert or update data, the new parquet file shows up in the partition directory, but the change does not seem to make it into the Delta log: when we immediately read the table back looking for the row we just inserted, it is reported as not found, even though the parquet file is visible in that partition.
Is there some reason for this?
Our code:
import io.delta.tables._
import spark.implicits._

// Create the partitioned state-tracking table if it does not already exist
DeltaTable.createIfNotExists(spark)
  .tableName("trackingTable2")
  .addColumn("tableName", "STRING")
  .addColumn("path", "STRING")
  .partitionedBy("tableName")
  .location("/tmp/fr")
  .execute()

val delta = DeltaTable.forPath(spark, "/tmp/fr")
val data = List(("xxx", "yyy"))
val new_df = data.toDF("tableName", "path")
val table_name = "xxx"

// Upsert one (tableName, path) pair; the second predicate qualifies the
// target column so the merge only touches the '$table_name' partition
delta.as("stateTracking")
  .merge(new_df.as("updates"),
    s"stateTracking.tableName = updates.tableName AND stateTracking.tableName = '$table_name'")
  .whenMatched
  .updateExpr(Map("path" -> "updates.path"))
  .whenNotMatched
  .insertExpr(Map("tableName" -> "updates.tableName", "path" -> "updates.path"))
  .execute()
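To make the failure mode concrete, this is roughly the kind of read that misses the row (a minimal sketch; we read through the delta format rather than the raw parquet files, and the filter value is just the key merged above):

import org.apache.spark.sql.functions.col

// Read the table through the Delta log, not by listing parquet files directly
val readBack = spark.read.format("delta").load("/tmp/fr")

// Immediately after the merge this returns zero rows for the new key,
// even though a parquet file is visible under the tableName=xxx partition
readBack.filter(col("tableName") === "xxx").show()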
As a workaround we keep checking the table and re-running the merge until the row finally shows up, but is there a proper explanation for why this happens?
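For reference, this is one way to check whether the merge actually committed to the transaction log (a sketch using the standard DeltaTable.history API on the same /tmp/fr path; if the merge committed, it should appear as a MERGE operation):

import io.delta.tables.DeltaTable

// List the most recent commits recorded in the Delta transaction log
val tracking = DeltaTable.forPath(spark, "/tmp/fr")
tracking.history(10)
  .select("version", "timestamp", "operation", "operationMetrics")
  .show(truncate = false)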