
What I'm trying to achieve is to keep only the latest point for any given ID, and delete everything else.

The ID is a tag.

So let's say we have:

time                 ID   ...
2022-06-28 18:29:00  id1  ...
2022-06-28 18:28:00  id1  ...
2022-06-28 18:27:00  id1  ...
2022-06-28 18:29:00  id2  ...
2022-06-28 18:28:00  id2  ...
2022-06-28 18:29:00  id3  ...

Would result in:

time                 ID   ...
2022-06-28 18:29:00  id1  ...
2022-06-28 18:29:00  id2  ...
2022-06-28 18:29:00  id3  ...
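Outside of InfluxDB, the desired semantics are simply "group by ID, keep the row with the max time". A minimal Python sketch of that logic, using the example rows above (field values omitted):

```python
# Keep only the newest row per ID. Rows mirror the example table.
rows = [
    ("2022-06-28 18:29:00", "id1"),
    ("2022-06-28 18:28:00", "id1"),
    ("2022-06-28 18:27:00", "id1"),
    ("2022-06-28 18:29:00", "id2"),
    ("2022-06-28 18:28:00", "id2"),
    ("2022-06-28 18:29:00", "id3"),
]

latest = {}
for ts, point_id in rows:
    # ISO-8601-style timestamps compare correctly as plain strings.
    if point_id not in latest or ts > latest[point_id]:
        latest[point_id] = ts

print(sorted(latest.items()))
# [('id1', '2022-06-28 18:29:00'), ('id2', '2022-06-28 18:29:00'), ('id3', '2022-06-28 18:29:00')]
```

The hard part is not expressing this logic, but executing the deletion efficiently inside InfluxDB.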

Is that possible without having to do something like:

DELETE FROM "measurement" WHERE "ID" = '...' AND time < ...

which takes way too much time to execute across all possible "duplicates". You also can't use OR in a DELETE WHERE clause, and running multiple statements, as suggested here, also takes too long.

1 Answer


Having ID as a tag has already saved you a lot of time.

Maybe we could think a little about your scenario. Are you trying to keep the LATEST data for every INTERVAL? For example, suppose your original sampling rate is 1 second (that is, your data is fed into InfluxDB every single second) and you would like to downsample it to every 5 seconds. That is, you keep only the last record in each 5-second interval and discard the rest.

If the above assumption is right, you could try:

  1. Run a continuous query first to get your downsampled data into another measurement, per this doc:
CREATE CONTINUOUS QUERY "cq_basic" ON "someDatabase"
BEGIN
  SELECT LAST("fieldA") AS "fieldA" INTO "latest_5sec_measurement" FROM "measurement" GROUP BY time(5s), "ID"
END

This CQ is like a cron job and will keep transforming the original data into the less granular one.
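What the CQ computes can be sketched in plain Python: bucket timestamps into 5-second windows and keep the last value seen per window (names and data here are illustrative, not from the question):

```python
# Sketch of the CQ's effect: last value per 5-second bucket, per ID.
# Timestamps are epoch seconds; input is assumed sorted by time.
points = [
    (0, "id1", 1.0),
    (2, "id1", 2.0),
    (4, "id1", 3.0),   # last point in the [0, 5) bucket
    (6, "id1", 4.0),   # last point in the [5, 10) bucket
]

downsampled = {}
for ts, point_id, value in points:
    bucket = ts - ts % 5                      # start of the 5-second window
    downsampled[(bucket, point_id)] = value   # later points overwrite earlier ones

print(downsampled)
# {(0, 'id1'): 3.0, (5, 'id1'): 4.0}
```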

  2. Delete the obsolete data with a time range (InfluxDB loves you when you do this, as time-range deletion is the most efficient kind, since InfluxDB stores data in time-based shards), in the original database, per this doc:
DELETE FROM "measurement" WHERE time >= '2022-06-27T18:29:00Z' AND time < '2022-06-28T18:29:00Z'

You can then set up another cron job to run this DELETE, as long as you double-check that the CQ above has executed successfully first.
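For illustration, a hypothetical crontab entry using the InfluxDB 1.x influx CLI (the database name, measurement, and one-hour retention window are assumptions, not from the question):

```
# Run hourly: drop original data older than one hour,
# after verifying the CQ has caught up.
0 * * * * influx -database 'someDatabase' -execute 'DELETE FROM "measurement" WHERE time < now() - 1h'
```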

Munin
  • It is not for every INTERVAL, it's for every unique ID. Data could be fed at any rate. The command would then be run by a cron job – William Chretien Jun 29 '22 at 12:45
  • unfortunately, AFAIK, such a deletion against a tag would still be a resource-intensive operation, as InfluxDB stores the data in time-based shards. – Munin Jul 01 '22 at 05:07