0

Sorry if this has been asked before. Let's imagine I have table of temperature measurements inside a set of mechanical components:

ComponentID Timestamp Value
A 1st Jan 2020 00:00 20 C
A 1st Jan 2020 00:10 25 C
B 1st Jan 2018 00:00 19C

...and so on. Size of the table is fairly big, i.e. I have thousands of components with 10-minute measurements over a couple of years. What I need is a tally of the temperatures for each component in each year into, say, 5-degree bins, so a table looking like this:

ComponentID Year [-20;-15) [-15,-10) [-10;-5) ...
A 2018 5 20 300 ...
A 2019 0 41 150 ...
B 2018 60 10 1 ...

..so for each component in each year, I count the number of measurements where the temperature has been in the [-20,-15) range, the number of measurements in the [-15,-10) range, and so on. I have a query doing this, but it's awfully slow. Is there an 'optimal' way of doing this kind of aggregation?

tschmidt
  • 1
  • 2

2 Answers2

0

I'd say you should first pre-process your data to make it more simple to aggregate, then aggregate it with another query like (MySQL syntax):

SELECT cats.ComponentID, cats.Year,
    SUM(tm5)  `[-5;0)`,
    SUM(t00)  `[0;5)`,
    SUM(tp5)  `[5;10)`,
    SUM(tp10) `[10;15)`,
    SUM(tp15) `[15;20)`,
    SUM(tp20) `[20;25)`,
    SUM(tp25) `[25;30)`
FROM (
    SELECT
        ComponentID,
        YEAR(`Timestamp`) `Year`,
        (`Value` BETWEEN -5  AND -0.0001 ) tm5,
        (`Value` BETWEEN  0  AND  4.9999 ) t00,
        (`Value` BETWEEN  5  AND  9.9999 ) tp5,
        (`Value` BETWEEN  10 AND  14.9999) tp10,
        (`Value` BETWEEN  15 AND  19.9999) tp15,
        (`Value` BETWEEN  20 AND  24.9999) tp20,
        (`Value` BETWEEN  25 AND  29.9999) tp25
    FROM
        measurements
) cats
GROUP BY cats.ComponentID, cats.Year
ORDER BY cats.ComponentID, cats.Year

Inner query could be done into a temporary table if it's too much of a strain on memory.

I've ignored the fact that your temperatures are expressed as Strings including unit, which of course you should convert to numbers at some point, as it was not the point of the question.

Input (table named measurements):

    id  ComponentID            Timestamp   Value  
------  -----------  -------------------  --------
     3  B            2018-01-01 00:00:00        19
     4  A            2019-03-05 05:10:00        16
     5  A            2019-12-01 00:00:00        18
     1  A            2020-01-01 00:00:00        20
     2  A            2020-01-01 00:10:00        25

Result:

ComponentID    Year  [-5;0)  [0;5)   [5;10)  [10;15)  [15;20)  [20;25)  [25;30)  
-----------  ------  ------  ------  ------  -------  -------  -------  ---------
A              2019  0       0       0       0        2        0        0        
A              2020  0       0       0       0        0        1        1        
B              2018  0       0       0       0        1        0        0        
julien.giband
  • 2,467
  • 1
  • 12
  • 19
0

I would suggest:

SELECT ComponentID, YEAR(`Timestamp`) as Year,
       SUM(Value >= -20 AND Value < -15) as [-20;-15),
       SUM(Value >= -15 AND Value < -10) as [-15;-10),
       SUM(Value >= -10 AND Value < -5)  as [-10;-05),
       SUM(Value >= -5  AND Value < 0)   as [-05;00),
       SUM(Value >= 0   AND Value < 5)   as [00;-05),
       . . .
FROM measurements m
GROUP BY m.ComponentID, Year;

Note the use of inequalities to capture the exact ranges that you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786