1

I have got three tables whose data I need to join based on a common field.

sample pseudo table defs:

barometer_log(device ,pressure float,sampleTime timestamp)

temperature_log(device int,temperature float,sampleTime timestamp)

magnitude_log(device int, magnitude float,utcTime timestamp)

The tables each will contain billions of rows eventually but currently contain around 500,000 rows each.

I need to be able to combine data(FULL JOIN) from the tables to so that the sampleTime is merged as one column (COALESE) to give me rows as: device, sampleTime, pressure,temperature,magnitude

I need to be able to query data by specifying a device and start and end date, e.g. select .... where device=1000 and sampleTime between '2011-10-11' and '2011-10-17'

I tried different UNION ALL technique with RIGHT and LEFT joins as suggested in MySql full join (union) and ordering on multiple date columns and MySql full join (union) and ordering on multiple date columns , but the query takes too long and i have to stop it or throws error about temp file size after running for hours. What is the best way for me to query the three tables and merge their output in an acceptable time frame?

Here's the full table definition as suggested. note: device table hasn't been included .

magnitude_log

CREATE TABLE magnitude_log (
  device int(11) NOT NULL,
  magnitude float not NULL,
  sampleTime timestamp NOT NULL,  
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT magnitudeLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

barometer_log

CREATE TABLE barometer_log (
  device int(11) NOT NULL,
  pressure float not NULL,  
  sampleTime timestamp NOT NULL,  
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT barometerLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

temperature_log

CREATE TABLE temperature_log (
  device int(11) NOT NULL,
  sampleTime timestamp NOT NULL,  
  temperature float default NULL,
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT temperatureLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;
Community
  • 1
  • 1
anzaan
  • 245
  • 1
  • 5

3 Answers3

1

First, get all combinations of (device, sampleTime) from all the 3 tables, in the required period:

-------- Q --------
    SELECT device, sampleTime
    FROM magnitude_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'
UNION
    SELECT device, sampleTime
    FROM barometer_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'
UNION
    SELECT device, sampleTime
    FROM temperature_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'

Then use this to LEFT JOIN the 3 tables:

SELECT
    q.device
  , q.sampleTime
  , b.pressure
  , t.temperature
  , m.magnitude
FROM 
    ( Q ) AS q
  LEFT JOIN
    ( SELECT * 
      FROM magnitude_log 
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS m
      ON (m.device, m.sampleTime) = (q.device, q.sampleTime)
  LEFT JOIN
    ( SELECT * 
      FROM barometer_log 
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS b
      ON (b.device, b.sampleTime) = (q.device, q.sampleTime)
  LEFT JOIN
    ( SELECT * 
      FROM temperature_log_log 
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS t
      ON (t.device, t.sampleTime) = (q.device, q.sampleTime)

The longer the period you have, the longer the query will struggle with the UNION subquery. You may consider putting the Q as a separate table, possibly filling it via triggers with the unique (device, sampleTime) combinations from the three other tables.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thanks for the answer. I'll test it and let you know how it goes – anzaan Nov 29 '11 at 07:32
  • The query works well as far as I can from the tests I ran. But there was an odd problem though. I also tested @mikn answer and his result fetched 73 records whereas your query returned 72 records. When I ran a separate query on one of the tables that had full set of data it also returned 72 records, which seems to be the right number of records. Any idea what could be happening? – anzaan Nov 29 '11 at 09:55
0

Assuming the table device contains all devices you don't really need a proper full join, you'll just have to left join the other tables on device and group on sample time like this:

SELECT
    d.id AS device,
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
    m.magnitude,
    b.pressure,
    t.temperature
FROM device AS d
    LEFT JOIN magnitude_log AS m ON d.id = m.device
    LEFT JOIN barometer_log AS b ON d.id = b.device
    LEFT JOIN temperature_log AS t ON d.id = t.device
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime BETWEEN '2011-10-11' AND '2011-10-17'

This might however be slow, since it will be doing the grouping before it actually matches on the time span, but if a single device won't have millions of rows by itself, it shouldn't be a problem. However, if it is, I'd recommend putting the sampleTime on each join:

SELECT
    d.id AS device,
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
    m.magnitude,
    b.pressure,
    t.temperature
FROM device AS d
    LEFT JOIN magnitude_log AS m ON d.id = m.device AND m.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
    LEFT JOIN barometer_log AS b ON d.id = b.device AND b.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
    LEFT JOIN temperature_log AS t ON d.id = t.device AND t.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime IS NOT NULL

Hope that helps!

mikn
  • 484
  • 3
  • 7
  • Why `IFNULL(x, NULL)`? (How is that different than just x?) – Billy ONeal Nov 29 '11 at 07:27
  • There should be no difference at all, you are correct! I was a bit paranoid. – mikn Nov 29 '11 at 07:39
  • The only 'side effect' this approach might have is that if the device you are selecting on has no data associated with it, it will still give you one row. You can avoid this by adding `HAVING sampleTime NOT NULL` at the end. :) – mikn Nov 29 '11 at 08:12
  • @mikn thanks for the answer. it works great and is short and sweet. Only issue is when I appended **HAVING sampleTime NOT NULL** at the end the query failed to run with syntax error. Any idea why? Although at the moment I can live with an empty row when device has no data as I can take care of that on application side with no fuss. – anzaan Nov 29 '11 at 09:57
  • Ah, my apologies. It should be `HAVING sampleTime IS NOT NULL`. It was too broken English for me to spell out it would seem! Edited my answer with it too, as you can see in it, the `HAVING` addition is only needed for the second one. :) – mikn Nov 29 '11 at 10:46
  • Ah, I should have actually caught that omission. Thanks for the update and thanks a heaps for the answer. it works superb. – anzaan Nov 29 '11 at 12:00
  • @mikn is there any reason combining abother table in the mix doesn't work? I have a similar table with event data and I tried to combine it but the result had the right values from other tables but the new table's values are the same in every row instead of being null when there's no corresponding row in the table. Any thoughts on that? – anzaan Nov 29 '11 at 17:41
  • I'd need to know what 'similar' means to answer that. Does it mean that sampleTime only contains a date too? Did you try selecting just a column from the new table or did you try to perform any sort of arithmetic with it? Are you using the exact same ON-clause as for the other tables (you might have forgotten to change the `d.id = *table*.device`) or are you joining the other table on one of the `LEFT JOIN`s? The value shown, is it correct for any of the rows? Are you getting the same amount of rows as before? :) – mikn Nov 29 '11 at 19:04
  • @mikn. I cannot add the full query or table schema due to character limit in comments. The event table is identical in its structure to other tables here's the details of the 4th table **event_log**(device,sampleTime,event) **select clause modification:** `COALESCE(m.sampleTime, b.sampleTime, t.sampleTime,e.sampleTime) AS sampleTime, ,e.event` **join clause addition:** `LEFT JOIN event_log AS e ON d.id = e.device AND e.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'` – anzaan Nov 30 '11 at 04:52
  • @mikn please see this [link](http://sipx.barahisolutions.com/mysql-joining-multiple-tables.htm) for full table schema and query – anzaan Nov 30 '11 at 05:08
  • You're missing a , after temperature in the column list, but you'd get a syntax error from that, not what you are describing. Try that exact query with that comma and it *should* work. – mikn Nov 30 '11 at 07:09
  • @mikn the comma got deleted when I copied and formatted the query for syntax highligter so the problem isn't the comma.I tested the query in the answer by **ypercube** and added the event_log table and it works fine -except for some reason his query returns one less row than your query. – anzaan Nov 30 '11 at 13:04
0

If you are querying a small time range and lots of devices, you might want to consider reversing the PK index to make it (timeRange,device).

You'd probably want a secondary index on device or (device,timeRange) then.

MarkR
  • 62,604
  • 14
  • 116
  • 151