
I am building a MySQL table to aggregate a particular set of information for a customer. Much of it I've assembled and filtered successfully with simple INNER JOINs on records with shared keys/columns, but the pivoting/transposing of the data, even though it is of a fixed size, is still causing me some syntax confusion. The schema for the target table t_snapshots is as follows:

+-------------+---------------------+------+-----+---------------------+----------------+
| Field       | Type                | Null | Key | Default             | Extra          |
+-------------+---------------------+------+-----+---------------------+----------------+
| id          | int(11)             | NO   | PRI | NULL                | auto_increment | 
| value_type  | tinyint(1) unsigned | YES  |     | NULL                |                |
| ch1_id      | varchar(20)         | YES  |     | NULL                |                |
| ch1_val     | float               | NO   |     | 0                   |                |
| ch2_id      | varchar(20)         | YES  |     | NULL                |                |
| ch2_val     | float               | NO   |     | 0                   |                |
| ch3_id      | varchar(20)         | YES  |     | NULL                |                |
| ch3_val     | float               | NO   |     | 0                   |                |
| ch4_id      | varchar(20)         | YES  |     | NULL                |                |
| ch4_val     | float               | NO   |     | 0                   |                |
| timestamp   | datetime            | NO   | MUL | current_timestamp() |                |
+-------------+---------------------+------+-----+---------------------+----------------+
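
For reference, DDL roughly equivalent to this DESCRIBE output would look like the following (a sketch only; engine, charset and the actual index name are omitted or illustrative):

-- Approximate DDL matching the DESCRIBE output above (sketch)
CREATE TABLE t_snapshots (
  id          INT(11) NOT NULL AUTO_INCREMENT,
  value_type  TINYINT(1) UNSIGNED DEFAULT NULL,
  ch1_id      VARCHAR(20) DEFAULT NULL,
  ch1_val     FLOAT NOT NULL DEFAULT 0,
  ch2_id      VARCHAR(20) DEFAULT NULL,
  ch2_val     FLOAT NOT NULL DEFAULT 0,
  ch3_id      VARCHAR(20) DEFAULT NULL,
  ch3_val     FLOAT NOT NULL DEFAULT 0,
  ch4_id      VARCHAR(20) DEFAULT NULL,
  ch4_val     FLOAT NOT NULL DEFAULT 0,
  `timestamp` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_timestamp (`timestamp`)   -- index name is illustrative; DESCRIBE only shows MUL
);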

Simply put, I want to select the most recently updated value from t_other_data0.ch[n] to go into t_snapshots.ch[n]_val, and the most recently updated t_id_pool.unit_id for each unique t_id_pool.channel_num to go into t_snapshots.ch[n]_id. t_id_pool.channel_num is what corresponds to the n in the t_snapshots.ch[n]_val columns.

--EDIT--: Below is sample data illustrating the goal: take the latest rows from source table t_other_data0, look up the latest unit_id value from t_id_pool for each of channel_num=1,2,3,4, and output the combined result to table t_snapshots.

Gather the latest process data from t_other_data0. In this case, rows with id 5-8 are selected because they cover all distinct value_type values at the latest timestamp:

Table: t_other_data0
+----+------+------+------+------+------------+---------------------+
| id | ch1  | ch2  | ch3  | ch4  | value_type | timestamp           |
+----+------+------+------+------+------------+---------------------+
|  1 | 1.65 | 3.25 | 1.98 | 2.17 |          1 | 2021-07-22 16:26:40 |
|  2 | 3.12 | 2.33 | 6.42 | 3.22 |          2 | 2021-07-22 16:26:40 |
|  3 | 2.22 | 2.24 | 3.34 | 1.17 |          3 | 2021-07-22 16:26:40 |
|  4 | 1.52 | 1.34 |  1.9 | 2.01 |          4 | 2021-07-22 16:26:40 |
|  5 |  3.2 | 3.21 | 5.42 | 2.13 |          1 | 2021-07-22 16:26:50 |
|  6 | 1.55 | 1.92 | 4.32 | 4.12 |          2 | 2021-07-22 16:26:50 |
|  7 | 2.31 | 1.93 | 2.36 |  3.4 |          3 | 2021-07-22 16:26:50 |
|  8 | 1.78 | 2.17 | 5.62 | 2.34 |          4 | 2021-07-22 16:26:50 |
+----+------+------+------+------+------------+---------------------+
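
To spell out what "latest" means here, the selection I'm after is roughly this (a sketch, assuming window functions are available, i.e. MariaDB 10.2+/MySQL 8.0):

-- Sketch: newest row per value_type; with the data above this returns rows id 5-8
SELECT *
FROM  (SELECT d.*,
              ROW_NUMBER() OVER (PARTITION BY value_type
                                 ORDER BY `timestamp` DESC, id DESC) AS rn
       FROM   t_other_data0 d) AS latest
WHERE rn = 1;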

Because these permanent channels change which temporary equipment they're tied to, we determine the current unit_id for each channel_num using the latest (most recent timestamp) unit_id from t_id_pool:

Table: t_id_pool
+----+---------------------+-------------+---------+
| id | timestamp           | channel_num | unit_id |
+----+---------------------+-------------+---------+
|  1 | 2021-07-22 09:39:09 |           1 | S4251   |
|  2 | 2021-07-22 09:38:09 |           2 | S3552   |
|  3 | 2021-07-22 09:38:09 |           3 | S0001   |
|  4 | 2021-07-22 09:38:09 |           4 | S1001   |
|  5 | 2021-07-22 09:39:10 |           1 | P5251   |
|  6 | 2021-07-22 09:38:10 |           2 | P4552   |
|  7 | 2021-07-22 09:38:10 |           3 | P1001   |
|  8 | 2021-07-22 09:38:10 |           4 | P2001   |
+----+---------------------+-------------+---------+
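
Likewise, the "current unit_id per channel" lookup I'm describing amounts to something like this (again just a sketch of the intent):

-- Sketch: keep the row with the newest timestamp for each channel_num
SELECT channel_num, unit_id
FROM  (SELECT channel_num, unit_id,
              ROW_NUMBER() OVER (PARTITION BY channel_num
                                 ORDER BY `timestamp` DESC) AS rn
       FROM   t_id_pool) AS p
WHERE rn = 1;
-- With the data above: 1 -> P5251, 2 -> P4552, 3 -> P1001, 4 -> P2001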

Output to t_snapshots:

Table: t_snapshots
+-----+---------------------+------------+--------+---------+--------+---------+--------+---------+--------+---------+
| id  | timestamp           | value_type | ch1_id | ch1_val | ch2_id | ch2_val | ch3_id | ch3_val | ch4_id | ch4_val |
+-----+---------------------+------------+--------+---------+--------+---------+--------+---------+--------+---------+
| 211 | 2021-07-22 16:26:50 |          1 | P5251  |     3.2 | P4552  |    3.21 | P1001  |    5.42 | P2001  |    2.13 |
| 212 | 2021-07-22 16:26:50 |          2 | P5251  |    1.55 | P4552  |    1.92 | P1001  |    4.32 | P2001  |    4.12 |
| 213 | 2021-07-22 16:26:50 |          3 | P5251  |    2.31 | P4552  |    1.93 | P1001  |    2.36 | P2001  |     3.4 |
+-----+---------------------+------------+--------+---------+--------+---------+--------+---------+--------+---------+



 
Big Owls
  • Share some example (fake) data from the `t_id_pool` table. Also, with that example data, what would the expected result of the `SELECT` query look like? – FanoFN Jul 22 '21 at 07:18
  • @FaNo_FN Edits made to describe source tables and ideal output as requested – Big Owls Jul 22 '21 at 16:49
  • `id=4` is selected? It doesn't seem to be the latest `timestamp` for that row, and I don't see any of the `id=4` row data in the output. – FanoFN Jul 23 '21 at 00:27
  • I was wondering why the `t_other_data.id=8` row was missing. I started [a fiddle](https://www.db-fiddle.com/f/3V7J1sBtGhb9SumxKwyBLr/1) but ran out of time. A [mariadb-10.4 dbfiddle](https://dbfiddle.uk/?rdbms=mariadb_10.3) would have been better, but it's not running for me at the moment. – danblack Jul 23 '21 at 00:34
  • That's correct regarding id=4. In my example, I meant to say rows with id 5-8 are selected. I have corrected this, along with misspelt table names. – Big Owls Jul 23 '21 at 16:59
  • Additionally, I only provided transformed output for `value_type`=1,2,3 while the raw records include 4 different value types. Sorry for the confusion. – Big Owls Jul 23 '21 at 17:45

1 Answer


Table t_other_data0 appears to be a pivot table, so I think the first step is to un-pivot it, then join it with the t_id_pool table to get the latest unit_id, then re-pivot it. A query like this might work:

SELECT 0 id, tod.timestamp, value_type,
       MAX(CASE WHEN channel_num=1 THEN unit_id ELSE 0 END) AS ch1_id,
       SUM(CASE WHEN channel_num=1 THEN chan_val ELSE 0 END) AS ch1_val,
       MAX(CASE WHEN channel_num=2 THEN unit_id ELSE 0 END) AS ch2_id,
       SUM(CASE WHEN channel_num=2 THEN chan_val ELSE 0 END) AS ch2_val,
       MAX(CASE WHEN channel_num=3 THEN unit_id ELSE 0 END) AS ch3_id,
       SUM(CASE WHEN channel_num=3 THEN chan_val ELSE 0 END) AS ch3_val,
       MAX(CASE WHEN channel_num=4 THEN unit_id ELSE 0 END) AS ch4_id,
       SUM(CASE WHEN channel_num=4 THEN chan_val ELSE 0 END) AS ch4_val
FROM   (SELECT value_type, ch1 AS chan_val, 1 AS chan_num, timestamp
        FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
                FROM   t_other_data0) AS a
        WHERE  rn = 1
        UNION ALL
        SELECT value_type, ch2, 2, timestamp
        FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
                FROM   t_other_data0) AS a
        WHERE  rn = 1
        UNION ALL
        SELECT value_type, ch3, 3, timestamp
        FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
                FROM   t_other_data0) AS a
        WHERE  rn = 1
        UNION ALL
        SELECT value_type, ch4, 4, timestamp
        FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
                FROM   t_other_data0) AS a
        WHERE  rn = 1) AS tod
       JOIN (SELECT id, timestamp, channel_num, unit_id,
                    ROW_NUMBER() OVER (PARTITION BY channel_num ORDER BY timestamp DESC) rn
             FROM   t_id_pool) AS tip
         ON tod.chan_num = tip.channel_num AND tip.rn = 1
GROUP BY tod.timestamp, value_type;

One of the functions used here is ROW_NUMBER(), whose purpose is to assign row number 1 to the latest row within each value_type (and to the latest timestamp within each channel_num of t_id_pool). For table t_other_data0, I'm using UNION ALL with 4 queries in total, one for each of the columns ch1, ch2, ch3 & ch4. Each of them gets a hardcoded chan_num matching the column it takes its value from.
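
To illustrate just the un-pivot step in isolation, here is a trimmed-down version of the tod derived table covering only ch1 and ch2 (a sketch; the full query above does the same for all four channels):

-- Sketch: each chN column becomes one row tagged with its chan_num
SELECT value_type, ch1 AS chan_val, 1 AS chan_num, timestamp
FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
        FROM   t_other_data0) AS a
WHERE  rn = 1
UNION ALL
SELECT value_type, ch2, 2, timestamp
FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY value_type ORDER BY id DESC) rn
        FROM   t_other_data0) AS a
WHERE  rn = 1;
-- Result shape: (value_type, chan_val, chan_num, timestamp), one row per value_type per channel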

I'm not sure what the id column there should be populated with, but since the main purpose of the query is presumably to INSERT into another table, I assume the id column is auto-increment.
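
If id is indeed auto-increment, the usual pattern is to leave it out of the INSERT column list entirely and let MySQL fill it in; something like this (a sketch with literal values taken from your expected output, untested against your real schema):

-- Sketch: omit id so AUTO_INCREMENT assigns it; in practice the literals below
-- would be replaced by the pivot SELECT above, minus its leading "0 id" column
INSERT INTO t_snapshots
        (`timestamp`, value_type,
         ch1_id, ch1_val, ch2_id, ch2_val,
         ch3_id, ch3_val, ch4_id, ch4_val)
SELECT  '2021-07-22 16:26:50', 1,
        'P5251', 3.2, 'P4552', 3.21,
        'P1001', 5.42, 'P2001', 2.13;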

Unfortunately dbfiddle.uk hasn't been usable since yesterday, so the fiddle here is for MySQL v8.0 instead of MariaDB 10.3: https://www.db-fiddle.com/f/xf1VmfYMbnGcabJS7dS6A1/1 . The result in the fiddle will have an extra row for t_other_data0.id=8 (mentioned by @danblack in the comments) and will not include id=4, since your condition is "Gather latest process data from t_other_data0". Judging from your expected output, it seems you didn't include id=4 there either, so there may have been a typo in your description.

FanoFN
  • Great example. Thanks for taking the time to put all that in the fiddle and show the example. The syntax makes a little more sense to me now with the unions and joins. It seems to have transferred over well into MDB 10.3 as far as I can tell; I haven't tested execution times on my adaptation against the production data pool, but I look forward to trying it out! – Big Owls Jul 26 '21 at 04:54
  • The only issue I'm having so far is in a MySQL 5.7 environment (Amazon Aurora): I'm getting a 1064 error near '(partition', although I assumed this was legal syntax. – Big Owls Jul 27 '21 at 17:34
  • Unfortunately, MySQL 5.7 doesn't support `ROW_NUMBER()`. There's a way to generate row numbers for older versions. Let me get back to you on that. – FanoFN Jul 27 '21 at 23:24
  • See this fiddle @user8585939 https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=7044825b1221db236c3d5e6f05da2318 – FanoFN Jul 28 '21 at 01:08
  • Looking back on this, would you optimize this in any way for really large data sets within `t_other_data0` (>10M rows, let's say)? – Big Owls Aug 05 '21 at 00:29
  • I can try, but I suggest you post a new question. I can assure you there are plenty of other capable, more experienced people on SO who can help you figure out the performance side of it. I don't really have a lot of experience with performance tuning; most of my work isn't really "performance demanding" per se. Maybe someone can even suggest a more refined and efficient query. – FanoFN Aug 05 '21 at 00:49
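
For completeness, one common way to get the same "latest row per group" result without ROW_NUMBER() on MySQL 5.7 is a NOT EXISTS anti-join (a sketch; not necessarily what the linked fiddle does):

-- Sketch: newest row per value_type without window functions (MySQL 5.7 friendly)
SELECT d.*
FROM   t_other_data0 d
WHERE  NOT EXISTS (SELECT 1
                   FROM   t_other_data0 newer
                   WHERE  newer.value_type = d.value_type
                     AND  newer.id > d.id);
-- The same pattern works for t_id_pool, comparing on channel_num and timestamp.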