I'm running a fairly complicated SQL statement to create a summary table from a big table (38 million rows) of raw data. (I'm trying to get the current price, the season low, the season high, and the fraction of the time the price has been 1 cent this week/month/season into a cache table for easy querying later.)
INSERT INTO cache (`time`, name, price, low, high, week, month, season)
SELECT
    MAX(`time`) AS `time`,
    name,
    MIN(CASE WHEN `time` = 1498511444 THEN price ELSE 999999 END) AS price,
    MIN(price) AS low,
    MAX(price) AS high,
    SUM(CASE WHEN `time` > 1497906644 AND price = 1 THEN 1 ELSE 0 END)
        / SUM(CASE WHEN `time` > 1497906644 THEN 1 ELSE 0 END) AS week,
    SUM(CASE WHEN `time` > 1480367444 AND price = 1 THEN 1 ELSE 0 END)
        / SUM(CASE WHEN `time` > 1480367444 THEN 1 ELSE 0 END) AS month,
    SUM(CASE WHEN `time` > 1493362800 AND price = 1 THEN 1 ELSE 0 END)
        / SUM(CASE WHEN `time` > 1493362800 THEN 1 ELSE 0 END) AS season
FROM
    (SELECT `time`, name, MIN(price) AS price
     FROM price
     WHERE `time` > 1493362800
     GROUP BY `time`, name) AS tmp
GROUP BY name
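For reference, the aggregation logic (minimum price per timestamp in the inner query, then the season stats and the price-was-1 ratios in the outer query) can be sanity-checked on a toy dataset. This sketch uses Python's sqlite3 with made-up timestamps and cutoffs standing in for the real epoch values:

```python
import sqlite3

# Toy reproduction of the two-level aggregation; the timestamps (100..300)
# and cutoffs (50, 150, 300) are placeholders, not the real epoch values.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE price (time INTEGER, name TEXT, price INTEGER);
INSERT INTO price VALUES
  (100, 'apple', 5), (100, 'apple', 3),  -- same timestamp: collapses via MIN
  (200, 'apple', 1),
  (300, 'apple', 1);
""")
row = con.execute("""
SELECT
    MAX(time) AS time,
    name,
    MIN(CASE WHEN time = 300 THEN price ELSE 999999 END) AS price,
    MIN(price) AS low,
    MAX(price) AS high,
    SUM(CASE WHEN time > 150 AND price = 1 THEN 1 ELSE 0 END) * 1.0
        / SUM(CASE WHEN time > 150 THEN 1 ELSE 0 END) AS week
FROM (SELECT time, name, MIN(price) AS price
      FROM price
      WHERE time > 50
      GROUP BY time, name) AS tmp
GROUP BY name
""").fetchone()
print(row)  # (300, 'apple', 1, 1, 3, 1.0)
```

The inner query collapses the two rows at time 100 to price 3, so the outer query sees (100, 3), (200, 1), (300, 1): current price 1, low 1, high 3, and 2 of the 2 rows after the "week" cutoff were at 1 cent.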
After adding an index on the price.time column I got this down to 0.6 s on local (it took 30 s before). On prod (with the same index) it runs for a long time (30 s+) and then fails with Errcode: 28 - No space left on device. If I watch df while it is running, I see free space slowly dwindle from 9.9 G to 9.6 G at about 3 MB/s; then, after a few minutes, it suddenly starts dropping at about 500 MB/s until there is no space left and the query fails. On local there's seemingly no blip in available space, although I guess it could happen so fast that my df-in-a-while-loop isn't catching it.
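For what it's worth, the same sampling can be done a bit more precisely from Python via os.statvfs (a sketch; /var/lib/mysql as the datadir is an assumption, and the demo call watches / so it runs anywhere):

```python
import os
import time

# Poll free space on the filesystem holding the MySQL datadir.
# /var/lib/mysql is the usual datadir location but is an assumption here.
def watch_free(path="/var/lib/mysql", samples=5, interval=1.0):
    for _ in range(samples):
        st = os.statvfs(path)
        free_mb = st.f_bavail * st.f_frsize / 2**20  # blocks available to non-root
        print(f"{time.strftime('%T')} {free_mb:,.0f} MB free on {path}")
        time.sleep(interval)

watch_free("/", samples=3, interval=0.1)  # watch "/" so the sketch runs anywhere
```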
I also get the disk-eating behavior if I try to create a table containing the results of the subquery first:
INSERT INTO initial_cache (`time`, name, price)
SELECT `time`, name, MIN(price) AS price
FROM price
WHERE `time` > 1493337600
GROUP BY `time`, name
Do you know why my query needs so much scratch space to run, and why it behaves so differently on prod?
Thanks!