For fields where there exists at least one "invalid" value (for example, a negative size in bytes) you can create a query which allows you to find rows with missing data, without modifying the stored data.
I have a metric with 5 fields: mac
, win
, win64
, linux
, and linux64
, not every field is filled in in every row, and on occasion a row will not be added due to it having no data available at the time.
By first querying the data with a fill()
clause set to my invalid value: -1
in a subquery, I can then wrap that in an outer query to find either rows which are missing at least one column (using OR
between WHERE
expressions) or rows with no data at all (using AND
between WHERE
expressions).
The subquery looks like this:
SELECT count(*) FROM "firefox" GROUP BY time(1d) fill(-1)
That gives me all of my rows (there's one per day) with a 1
(the count of the occurrences of that field for the day) or a -1
(missing) as the value returned for each field.
I can then choose the rows that have no data from that with an outer query like this (note in this case the returned fields are all -1 and therefore uninteresting and can be hidden in your visualizer, like Grafana):
SELECT * from (_INNER_QUERY_HERE_) WHERE count_linux = -1 AND count_linux64 = -1 AND count_mac = -1 AND count_win = -1 AND count_win64 = -1;
Or I can choose rows with at least one missing field like this:
SELECT * from (_INNER_QUERY_HERE_) WHERE count_linux = -1 OR count_linux64 = -1 OR count_mac = -1 OR count_win = -1 OR count_win64 = -1;
There is still room for improvement though, you have to specify the field names in the outer query manually whereas something like WHERE * = -1
would be much nicer. Also depending on the size of your data this query will be SLOOOOOOW, and filtering by time is very confusing when you use nested queries. Obviously it'd be nicer if the influx folks just added is null
or not null
or some similar syntax to influxql, but as has been linked above they don't seem too interested in doing so.