1

I have a MySQL data base with more than 92.000 rows with weather registers every half an hour. Day | Month | Year | Time | Temperature |... I'm trying to obtain (in PHP) Peak temperatures: Show the maximum amount of time(consecutive registers) with Temperature =< min(temperature)+3 in each month.

I would appreciate any help!

O. Jones
  • 103,626
  • 17
  • 118
  • 172
A4SES
  • 19
  • 1
  • 1
    It might make sense for you to put some sample data on sqlfiddle.com, and to show a few things about what you've tried. – O. Jones Jun 14 '13 at 13:21
  • What I don't know is how to count consecutive registers... I mean, how many times in a row the condition is met. – A4SES Jun 14 '13 at 14:11
  • Here's a possible starting place. http://stackoverflow.com/questions/11085596/method-of-finding-gaps-in-time-series-data-in-mysql/11086000#11086000 – O. Jones Jun 16 '13 at 17:40

1 Answers1

0

My approach to this: start with the time-series of observations, and give each one a serial number.

This serial numbering is a pain in the neck in MySQL, but no matter. Given a table with a ts column (a datetime item) and a temp column, here's the query to get them with serial numbers.

SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s 

Take a look at this sqlfiddle: http://sqlfiddle.com/#!2/d81e2/5/0

OK, that's pretty trivial. Now, let's say we're looking for periods of time where the temperature is 25 degrees or above. To do this we need to chop up the time series so it omits those observations. That goes like this:

SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s

Here's the sqlfiddle: http://sqlfiddle.com/#!2/d81e2/6/0

Now the next trick is to find the time gaps in this sequence. We can use the technique from this SO post to do that. Method of finding gaps in time series data in MySQL?

Next step, we join it to itself.

SELECT two.ser, two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
  FROM (
     /* virtual table */
  ) ONE
  JOIN (
     /* same virtual table */
  ) TWO ON (TWO.ser+ 1 = ONE.ser)

This query gets the time gap between each item in the series and the one after it. It's a straightforward thing to do conceptually, but tricky in the MySQL version of SQL. Here's the full query.

SELECT two.ser, two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
      FROM (
 SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s
      ) ONE
      JOIN (
SELECT @sample2:=@sample2+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample2:=0) s
      ) TWO ON (TWO.ser+ 1 = ONE.ser)

Here's the sqlfiddle: http://sqlfiddle.com/#!2/d81e2/13/0 Notice that some of the gaps are 30 minutes in duration. That's normal for consecutive readings. Some are 60 minutes. That's also normal, because the time series I'm using has some missing entries. The entries in this result set show the times and temperatures immediately before the gaps.

So, all that's left is to get rid of the junk gaps (30 and 60 minutes) and then order the remaining gaps in descending order.

SELECT two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
      FROM (
 SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s
      ) ONE
      JOIN (
SELECT @sample2:=@sample2+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample2:=0) s
      ) TWO ON (TWO.ser+ 1 = ONE.ser)
 WHERE TIMESTAMPDIFF(MINUTE, two.ts, one.ts)> 60
 ORDER BY TIMESTAMPDIFF(MINUTE, two.ts, one.ts) DESC

This gives one row for each sequence of time where the temperature is above 25 degrees; the longest time first. The item shown in the result set is the last time temperature below 25 before it went up. SQL Fiddle. http://sqlfiddle.com/#!2/d81e2/14/0

Fun, eh?

Community
  • 1
  • 1
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thank you very much for the answer, I'll try it as soon as possible. – A4SES Jun 17 '13 at 10:47
  • It's to complex for me to digest this code in a proper way but I could adapt it to my data and after 16 minutes of processing I could obtain results. Is it possible to group them by months? – A4SES Jun 17 '13 at 13:01
  • You can select a time subset in the two subqueries. Be aware, though, that your measurements may be distorted a little bit when you go month by month. A heat wave that begins in the last few days of June and runs into July won't show up. – O. Jones Jun 17 '13 at 14:40
  • The query took two minutes on my laptop with a dataset of about 73K observations. – O. Jones Jun 17 '13 at 14:53