I have the following records:
id run_hour performance_hour value
2 "2017-06-25 09:00:00" "2017-06-25 07:00:00" 6
2 "2017-06-25 09:00:00" "2017-06-25 08:00:00" 5
1 "2017-06-25 09:00:00" "2017-06-25 08:00:00" 5
2 "2017-06-25 08:00:00" "2017-06-25 07:00:00" 5
1 "2017-06-25 08:00:00" "2017-06-25 07:00:00" 5
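To make the example reproducible, the rows above can be loaded with something like the following. This is only a sketch: the table name metrics comes from the query further down, while the column types (integer, timestamp) and the NOT NULL constraints are assumptions about the schema.

```sql
-- Assumed schema: the column types are a guess; only the column names and
-- the table name "metrics" come from the question itself.
CREATE TABLE metrics (
    id               integer   NOT NULL,
    run_hour         timestamp NOT NULL,
    performance_hour timestamp NOT NULL,
    value            integer   NOT NULL
);

-- The five sample rows, in the same order as listed above.
INSERT INTO metrics (id, run_hour, performance_hour, value) VALUES
    (2, '2017-06-25 09:00:00', '2017-06-25 07:00:00', 6),
    (2, '2017-06-25 09:00:00', '2017-06-25 08:00:00', 5),
    (1, '2017-06-25 09:00:00', '2017-06-25 08:00:00', 5),
    (2, '2017-06-25 08:00:00', '2017-06-25 07:00:00', 5),
    (1, '2017-06-25 08:00:00', '2017-06-25 07:00:00', 5);
```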
We run a job every hour that looks at the result of each id for the current hour and the previous hours.
We insert a new record only if the value changed since the previous hourly run (we don't want to overwrite the value, because we want to measure the value as seen after 1 hour, 2 hours, etc.).
For each id, I would like to sum the values, using for each performance_hour the latest available value (the row with the greatest run_hour).
In the above example, id 1 has no record for the 9:00 run with performance hour 7:00, because the value was the same as in the 8:00 run for performance hour 7:00.
In the above example, if I ask for the sum of values as of run 2017-06-25 09:00:00, I would expect to get:
id, value
1 10
2 11
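For id 1, it's 10: 5 (performance hour 07:00, taken from run_hour 2017-06-25 08:00:00) + 5 (performance hour 08:00, taken from run_hour 2017-06-25 09:00:00).
For id 2, it's 11: 6 (performance hour 07:00, taken from run_hour 2017-06-25 09:00:00) + 5 (performance hour 08:00, taken from run_hour 2017-06-25 09:00:00).
I wrote the following query: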
select distinct on (id, run_hour) id, sum(value)
from metrics
where run_hour <= '2017-06-25 09:00'
  and performance_hour >= '2017-06-25 07:00'
  and performance_hour < '2017-06-25 09:00'
group by id
order by id, run_hour
However, I get an error saying that run_hour must also appear in the GROUP BY clause. If I add it, I get incorrect data: the result also includes rows from previous runs, which I don't need. I only need, for each performance hour, the latest run that had data.
How can I use "distinct on" with group by?
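To make the intended logic concrete, here is a minimal sketch of the two steps I am describing, assuming PostgreSQL and the metrics table from my query. The inner query uses DISTINCT ON to keep only the latest run per (id, performance_hour); the outer query then sums per id. The subquery alias latest is just an illustrative name, and I am not sure this is the idiomatic way to combine the two clauses.

```sql
-- Step 2: sum the surviving rows per id.
SELECT id, sum(value) AS value
FROM (
    -- Step 1: for each (id, performance_hour) keep only the row from the
    -- most recent run; DISTINCT ON keeps the first row per group in the
    -- ORDER BY order, hence run_hour DESC.
    SELECT DISTINCT ON (id, performance_hour)
           id, performance_hour, value
    FROM metrics
    WHERE run_hour <= '2017-06-25 09:00'
      AND performance_hour >= '2017-06-25 07:00'
      AND performance_hour <  '2017-06-25 09:00'
    ORDER BY id, performance_hour, run_hour DESC
) AS latest
GROUP BY id
ORDER BY id;
```

On the five sample rows this should return 10 for id 1 and 11 for id 2, matching the expected result above.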