3

In TimescaleDB, how to set a standard size in terms of GB/MB for a particular table/hypertable, so that when it reaches a particular size, it begins to delete the old rows in order to accommodate new rows of data.

From the documentation, it was evident that only retention policy(time interval) can be added for a hypertable. Is it possible to add retention policy for size?

Ann
  • 137
  • 1
  • 11
  • 2
    The retention policy drops the entire chunks instead of row by row. Every chunk is like a sub-table of the main table. When the retention policy runs, it does not delete record by record but truncates/removes the entire chunk. – jonatasdp May 23 '22 at 14:22
  • @Jonatasdp Thankyou, I understand. Can we drop the chunks based on size instead of time? (I do understand the chunks are separated in terms of time periods) but is there a way to drop them on size basis? – Ann May 23 '22 at 14:28
  • 4
    @Ann - there is no automated way to do this with TimescaleDB policies. For now, `drop_chunks()` and the associated policies all work on time intervals with no parameter to name specific chunks. Even if this was allowed, depending on your expectations, it would be tricky because varied use of your database could result in some chunks being bigger than others & so you might drop chunks in an unpredictable order. This is usually where our suggested usage pattern usually helps. First, enable compression of older chunks, & then drop them further out than you would have to without compression. – Ryan May 23 '22 at 17:53
  • @Ryan Thank you for the insight. I understood the complexity involved – Ann May 24 '22 at 15:18
  • 1
    If this is a feature you'd like us to add you could file a feature request on our github page! https://github.com/timescale/timescaledb – davidk May 25 '22 at 15:13

1 Answers1

2

Retention policy drops entire chunk and chunks are measured by time intervals, thus there is no sense to define policy in size and not in time. The policy drops a chunk after entire chunk is older than given interval, thus if chunk size is 7 days and retention policy is 3 days, then the oldest dropped data will be 10 days old (the dropped chunk contains data from 10 to 3 days old). Chunks are represented by tables internally, thus dropping a chunk is dropping a table, which is the most efficient way to delete data in PostgreSQL. Deleting by row is much more expensive than dropping or truncating a table and doesn't free space until VACUUM is run.

TimescaleDB expects that you know your application load well and can correctly estimate desired size in time interval.

Time dimension column is not required to have time type, but can be a number. It is important that time dimension column increases over time and it is clear how to use in queries and define chunk size. So it is possible to use a counter for the time dimension column and increment it for each row by 1 or by row size. Notice that syncing counter can be a bottleneck.

It is also possible to write a user-defined action, where own action can be defined to be executed on regular basis as a custom policy.

Summary of thee possible solutions:

  1. Give good estimate of chunk size, which is expected way by TimescaleDB.
  2. Define numerical Time dimension column with counter-like implementation.
  3. Write custom policy using user-defined action.
k_rus
  • 2,959
  • 1
  • 19
  • 31