1

I am running an Influx database instance which collects data from various sources. Sometimes, this data is simply incorrect and needs to be corrected, otherwise later analyses will fail and/or produce incorrect results. However, InfluxDB's SQL-like dialect does not have an UPDATE capability, and "updating" single data points is only possible by replacing datapoints with the same timestamps using another INSERT.

And for this purpose I am looking for a way to mass-edit selected datapoints inside a measurement, without interrupting the data collection process which may insert new datapoints while I'm editing. Here's what I tried:

  • I can query a subset of a measurement for editing using e.g. SELECT * FROM measurement WHERE time > ... using the HTTP API and I get a JSON file, but there is no (obvious) way to re-insert this JSON (after editing it) and replace the corrected measurements.
  • I can export the whole database using influx_inspect and reimport it, but this is faar too slow. This way, 200MB of raw data get blown up to >1GB of line protocol data which is way overkill to correct 2-3 datapoints.

Any other ideas? I'm fine with scripting and editing stuff in a text editor, but I'm not fine with manually creating an INSERT for each datapoint that needs to be corrected.

Jens
  • 1,386
  • 14
  • 31
  • InfluxDB is generally not designed to be "update heavy" but really excels at insert operations. I'd generally assume, that you are better of "correcting" the data before actually inserting it into the database. How come, that this is not possible? – fxweidinger Apr 19 '20 at 13:32
  • Some of the data feeds are OCR engines fed by cameras. This engine makes occasional mistakes, OCR isn't perfect. These are rare, but if they occur, they mess up all statistics and extrapolations. See here for one example: https://forum.iobroker.net/topic/23577/wasserz%C3%A4hler-selfmade – Jens Apr 19 '20 at 17:04

2 Answers2

1

If this is something you need to do a lot of, you might look into QuestDB (https://questdb.io/ or https://github.com/questdb/questdb on Github). It's a super-fast Time Series Database but supports traditional SQL CRUD operations.

Davidgs
  • 411
  • 6
  • 18
  • Looks nice. Does it support Grafana visualizations, IOBroker, etc? – Jens Jun 25 '20 at 07:28
  • So, we now have a grafana plugin -- just waiting for them to merge it -- but with the new 5.0.3 version our postgres wire protocol support is so good you can just use the postgres plugin. – Davidgs Sep 03 '20 at 19:00
0

Since there seems to be no real solution to this, I've hacked up a script to do this. Beware, it's really quick'n'dirty, untested on anything but my own hardware, and contains a lot of assumptions (eg. database on localhost, available Perl, etc).

https://gist.github.com/jensb/9efa234b80024a0e335de760d9a4f3aa

Jens
  • 1,386
  • 14
  • 31