0

I have a writeStream that writes data from into a delta table. It filters out all data that are beyond five minute and only keep those under. But as time goes on, the ones that were input are now beyond five minute, so I need to have a way to delete them?

Other than using vaccum and setting a schedule that runs it periodically, is there any other way to achieve a table that "only keep data that are within five minute, as soon as the data is beyond five minutes, it drops it"

Trodenn
  • 37
  • 5

1 Answers1

0

I don't think Delta yet supports this functionality but we can mimic the functionality by using views to get only the latest data from the table.

Later periodically delete the records from the table and do vaccum on the table.

Example:

select * from (
SELECT data.a, b
    FROM VALUES ('one', current_Timestamp()),
                ('two', timestamp('2023-03-09 04:18:00')),
                ('three', timestamp('2023-03-09 04:13:00')) AS data(a, b)) where b >= now() - interval 5 minute

Result:

a       b
one 2023-03-09T04:23:49.549+0000

Same way as shown above you can create a view on top of your delta table with the filter now() - interval 5 minute this filter only selects the data for last 5 minutes.

create view <view_name> as select * from <delta_table> 
where <timestamp_col> >= now() - interval 5 minute 
notNull
  • 30,258
  • 4
  • 35
  • 50