I'm looking for a way to add a TTL (time-to-live) to my Delta Lake table so that any record in it is removed automatically after a fixed span. I haven't found anything concrete yet. Does anyone know of a workaround for this?

CHEEKATLAPRADEEP
Manish Karki

1 Answer

Unfortunately, there is no configuration called TTL (time-to-live) in Delta Lake tables.

You can remove data files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retention threshold for the files is 7 days.

Delta Lake provides snapshot isolation for reads, which means that it is safe to run OPTIMIZE even while other users or jobs are querying the table. Eventually, however, you should clean up old snapshots.

You can do this by running the VACUUM command:

VACUUM events

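You control the age of the latest retained snapshot by using the RETAIN <N> HOURS option. A retention period shorter than the default 7 days (168 hours) is rejected by Delta Lake's safety check unless spark.databricks.delta.retentionDurationCheck.enabled is set to false. For example: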

VACUUM events RETAIN 24 HOURS

For details on using VACUUM effectively, see Vacuum.
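Since there is no built-in TTL, one workaround is a scheduled job that deletes expired rows and then vacuums the table. The following is only a minimal sketch of that idea, not an official Delta Lake feature. It assumes the table has a timestamp column (called event_time here, a hypothetical name), that `spark` is an existing SparkSession with Delta Lake enabled, and that the helper names ttl_statements and run_ttl_pass are made up for illustration.

```python
def ttl_statements(table, ts_column, ttl_hours, vacuum_retain_hours=168):
    """Build the SQL for one TTL pass: DELETE expired rows, then VACUUM.

    `table` and `ts_column` are interpolated directly into the SQL, so they
    must come from trusted configuration, never from user input.
    """
    if ttl_hours <= 0:
        raise ValueError("ttl_hours must be positive")
    if vacuum_retain_hours < 0:
        raise ValueError("vacuum_retain_hours must not be negative")

    # Logically delete rows older than the TTL. The underlying data files
    # stay on storage until VACUUM removes them.
    delete_sql = (
        f"DELETE FROM {table} "
        f"WHERE {ts_column} < current_timestamp() - INTERVAL {int(ttl_hours)} HOURS"
    )
    # Physically remove unreferenced files older than the retention threshold.
    # 168 hours matches the default 7-day threshold.
    vacuum_sql = f"VACUUM {table} RETAIN {int(vacuum_retain_hours)} HOURS"
    return [delete_sql, vacuum_sql]


def run_ttl_pass(spark, table, ts_column, ttl_hours, vacuum_retain_hours=168):
    """Run one TTL pass using any object that exposes a .sql(str) method."""
    for statement in ttl_statements(table, ts_column, ttl_hours, vacuum_retain_hours):
        spark.sql(statement)


# Example call from a scheduled job (assumes `spark` already exists):
# run_ttl_pass(spark, "events", "event_time", ttl_hours=24)
```

Note that rows disappear from query results as soon as the DELETE commits, but the data is only physically removed once VACUUM runs and the files are older than the retention threshold, so the effective on-disk lifetime is roughly the TTL plus the retention period.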

CHEEKATLAPRADEEP
  • Thanks for the answer. We know about VACUUM, but what we'd really have liked was something like a TTL. I guess we'll have to create a job that runs constantly and does the work in the background. – Manish Karki Aug 18 '20 at 21:29