I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any way to calculate how long this query might take?
The following is the query in question.
ALTER TABLE pIndexData REORGANIZE PARTITION pMAX INTO (
PARTITION p2022 VALUES LESS THAN (UNIX_TIMESTAMP('2023-01-01 00:00:00 UTC')),
PARTITION pMAX VALUES LESS THAN (MAXVALUE)
)
For context, the pIndexData table has about 6 billion records and the pMAX partition has roughly 2 billion records. This is an Amazon Aurora instance and the server is running MySQL 5.7.12. The DB Engine is InnoDB. The following is the table syntax.
CREATE TABLE `pIndexData` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`DateTime-UNIX` bigint(20) NOT NULL DEFAULT '0',
`pkl_PPLT_00-PIndex` int(11) NOT NULL DEFAULT '0',
`DataValue` decimal(14,4) NOT NULL DEFAULT '0.0000',
PRIMARY KEY (`pkl_PPLT_00-PIndex`,`DateTime-UNIX`),
KEY `id` (`id`),
KEY `DateTime` (`DateTime-UNIX`) USING BTREE,
KEY `pIndex` (`pkl_PPLT_00-PIndex`) USING BTREE,
KEY `DataIndex` (`DataValue`),
KEY `pIndex-Data` (`pkl_PPLT_00-PIndex`,`DataValue`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`DateTime-UNIX`)
(PARTITION p2016 VALUES LESS THAN (1483246800) ENGINE = InnoDB,
PARTITION p2017 VALUES LESS THAN (1514782800) ENGINE = InnoDB,
PARTITION p2018 VALUES LESS THAN (1546318800) ENGINE = InnoDB,
PARTITION p2019 VALUES LESS THAN (1577854800) ENGINE = InnoDB,
PARTITION p2020 VALUES LESS THAN (1609477200) ENGINE = InnoDB,
PARTITION p2021 VALUES LESS THAN (1641013200) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
In researching this question, I found using Performance Schema could provide the answer to my question. However, Performance Schema in not enabled on this server and enabling it requires a reboot. Rebooting is not an option because doing so could corrupt the database while this query is processing.
As a means of gaining some sense for how long this will take I recreated the pIndexData table in a separate Aurora instance. I then imported a sample set of data (about 3 million records). The sample set had DateTime values spread out over 2021, 2022 and 2023, with the lions share of data in 2022. I then ran the same REORGANIZE PARTITION query and clocked the time it took to complete. The partition query took 2 minutes, 29 seconds. If the partition query to records was linear, I estimate the query on the original table should take roughly 18 hours. It seems there is no linear calculation. Even with a large margin of error, this is way off. Clearly, there are factors (perhaps many) I'm missing.
I'm not sure what else to try other than run the sample data test again but with an even larger data sample. Before I do, I'm hoping someone might have some insight how to best calculate how long this might take to finish.