1

I'm using a home automation software called OpenHAB, which writes data into a table like this:

http://sqlfiddle.com/#!9/5e35e4/1

+---------------------+-------+
| time                | value |
+---------------------+-------+
| 2016-10-31 22:00:00 |  11.1 |
| 2016-10-31 22:07:08 |  10.8 |
| 2016-10-31 22:20:02 |  10.8 |
| 2016-10-31 22:30:28 |  10.8 |
| 2016-10-31 22:39:29 |  10.8 |
| 2016-10-31 22:44:04 |  10.8 |
| 2016-10-31 22:49:02 |  10.5 |
| 2016-10-31 23:00:00 |  10.5 |
| 2016-10-31 23:42:02 |    10 |
| 2016-11-01 00:00:00 |    10 |
| 2016-11-01 00:30:02 |   9.5 |
| 2016-11-01 01:00:00 |   9.5 |
| 2016-11-01 01:11:02 |   9.3 |
| 2016-11-01 01:22:02 |   9.1 |

I'm now struggling with cleaning up these values, since there are many duplicates (100k+) from the time when I started to use OpenHAB and didn't set up the logging system properly.

If the value (can be of type double or varchar) didn't change in several consecutive rows, every row except for the first and the last should be deleted. Given the example above, the optimum output would look like this:

+---------------------+-------+
| time                | value |
+---------------------+-------+
| 2016-10-31 22:00:00 |  11.1 |
| 2016-10-31 22:07:08 |  10.8 | <-- only here
| 2016-10-31 22:44:04 |  10.8 |
| 2016-10-31 22:49:02 |  10.5 |
| 2016-10-31 23:00:00 |  10.5 |
| 2016-10-31 23:42:02 |    10 |
| 2016-11-01 00:00:00 |    10 |
| 2016-11-01 00:30:02 |   9.5 |
| 2016-11-01 01:00:00 |   9.5 |
| 2016-11-01 01:11:02 |   9.3 |
| 2016-11-01 01:22:02 |   9.1 |
dersimn
  • 805
  • 9
  • 15

2 Answers2

0

If I understand your question correctly, you're interested in only the first and last values in the event of duplicates.

I think this query should do the trick, by omitting everything that's not the MAX or MIN time for a given value after grouping by value:

DELETE FROM Item67 
WHERE time NOT IN (SELECT max(time) FROM item67 GROUP BY value)
AND time NOT IN (SELECT min(time) FROM item67 GROUP BY value);
Matt Morgan
  • 4,900
  • 4
  • 21
  • 30
  • Your query deletes the min/max values in the whole table. For e.g. it would delete these entries: 9 9 10 **10** **10** **10** 8 7 **10** 10 – dersimn Nov 19 '16 at 21:36
  • Hmm...here's what the table you posted looks like after the query I suggested. I take it this is not what you wanted? I will link to a fiddle... – Matt Morgan Nov 19 '16 at 21:41
0

I think I got one possible query. I'm still evaluating if it works properly:

http://sqlfiddle.com/#!9/5e35e4/43

SELECT t1.time, t1.value
FROM Item67 AS t1
WHERE t1.value = (SELECT t2.value 
   FROM Item67 AS t2 
   WHERE t1.time > t2.time 
   ORDER BY t2.time 
   DESC LIMIT 1)
and t1.value = (SELECT t3.value
   FROM Item67 AS t3 
   WHERE t1.time < t3.time
   ORDER BY t3.time 
   LIMIT 1
);
dersimn
  • 805
  • 9
  • 15