I have a question about delta tables \ delta change data feed \ upsert
I have a table that stores all history of states (main), I use ZORDER by uniq column on it, and I retrieve results for 2s with WHERE clause BY (uniq, date) from the table.
During the day I receive many messages from the stream and add them (via append) into the delta table (main), each night I recalculate ZORDER for the main table
β
I would like to create another table that keeps only the latest records by a unique column in the table. I need this table for decreasing the latency of retrieving data.
Possible solutions:
Solution with DENSE_RANK by date (or QUALIFY), we grab from (main) only fresh uniq ΠΈ date.
Each time we recreate the table, in that case, I lose a lot of CPU time and lose ZORDER on the table, in a result this new small table works worse than the bigger (main) with ZORDER.Upsert (merge)
I tried but looks like it doesn't work as I want it to. Hardly to say, it appends or deletes the data from the batch, not from the (main) table, and that records that haven't been matched are not deleted.Delta Change Data Feed
For this approach, I need to store the full state in a batch, but currently batch is just a new portion of the data.I start thinking about that for the latest data I need to store it somewhere else (in mem KV storage, Redis for example)
Partition by uniq
works worse than ZORDER by uniq column. So maybe I need to drop uniq partition and add a new uniq from the batch?
Structure of the (main):
date | uniq | key | value
2022-12-13 | 1 | "key1" | 1
2022-12-13 | 1 | "key55" | 5
2022-12-13 | 2 | "key105" | 3
Mini-batches like this one:
date | uniq | key | value
2022-12-14 | 1 | "key3" | 0
2022-12-14 | 1 | "key4" | 1
2022-12-14 | 1 | "key5" | 0
2022-12-14 | 1 | "key6" | 1
Right now after insertion (batch) to (main), it will be:
date | uniq | key | value
2022-12-13 | 1 | "key1" | 1
2022-12-13 | 1 | "key55" | 5
2022-12-13 | 2 | "key105" | 3
2022-12-14 | 1 | "key3" | 0
2022-12-14 | 1 | "key4" | 1
2022-12-14 | 1 | "key5" | 0
2022-12-14 | 1 | "key6" | 1
I want to have a such table (latest):
date | uniq | key | value
2022-12-13 | 2 | "key105" | 3
2022-12-14 | 1 | "key3" | 0
2022-12-14 | 1 | "key4" | 1
2022-12-14 | 1 | "key5" | 0
2022-12-14 | 1 | "key6" | 1
ETL:
raw -> AutoLoader -> silver (with append every batch, night OPTIMIZE ZORDER)
Want to create another table (gold) that will keep only the latest records by uniq column.
If we add a new batch (new state), I expect to see only these new values per uniq in the gold table. So with every new batch, it should be triggered and recalculated somehow.
Or I should forget about the idea, and instead of that, save all the data in in-mem KV? (Redis)