I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any way to calculate how long this query might take?

The following is the query in question.

ALTER TABLE pIndexData REORGANIZE PARTITION pMAX INTO (
    PARTITION p2022 VALUES LESS THAN (UNIX_TIMESTAMP('2023-01-01 00:00:00 UTC')),
    PARTITION pMAX  VALUES LESS THAN (MAXVALUE) 
)

For context, the pIndexData table has about 6 billion records and the pMAX partition has roughly 2 billion records. This is an Amazon Aurora instance and the server is running MySQL 5.7.12. The DB Engine is InnoDB. The following is the table syntax.

CREATE TABLE `pIndexData` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `DateTime-UNIX` bigint(20) NOT NULL DEFAULT '0',
  `pkl_PPLT_00-PIndex` int(11) NOT NULL DEFAULT '0',
  `DataValue` decimal(14,4) NOT NULL DEFAULT '0.0000',
  PRIMARY KEY (`pkl_PPLT_00-PIndex`,`DateTime-UNIX`),
  KEY `id` (`id`),
  KEY `DateTime` (`DateTime-UNIX`) USING BTREE,
  KEY `pIndex` (`pkl_PPLT_00-PIndex`) USING BTREE,
  KEY `DataIndex` (`DataValue`),
  KEY `pIndex-Data` (`pkl_PPLT_00-PIndex`,`DataValue`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`DateTime-UNIX`)
(PARTITION p2016 VALUES LESS THAN (1483246800) ENGINE = InnoDB,
 PARTITION p2017 VALUES LESS THAN (1514782800) ENGINE = InnoDB,
 PARTITION p2018 VALUES LESS THAN (1546318800) ENGINE = InnoDB,
 PARTITION p2019 VALUES LESS THAN (1577854800) ENGINE = InnoDB,
 PARTITION p2020 VALUES LESS THAN (1609477200) ENGINE = InnoDB,
 PARTITION p2021 VALUES LESS THAN (1641013200) ENGINE = InnoDB,
 PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

In researching this question, I found that Performance Schema could provide the answer. However, Performance Schema is not enabled on this server, and enabling it requires a reboot. Rebooting is not an option because doing so could corrupt the database while this query is processing.
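
For reference, MySQL 5.7 can report ALTER TABLE progress through Performance Schema's stage instruments. Had it been enabled, monitoring would look roughly like this (instrument and consumer names are from the stock MySQL 5.7 documentation; availability on Aurora may differ):

-- Enable the InnoDB ALTER TABLE stage instruments (requires performance_schema = ON)
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/innodb/alter%';

UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_stages_%';

-- While the ALTER runs, WORK_COMPLETED / WORK_ESTIMATED gives a rough progress ratio
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED
  FROM performance_schema.events_stages_current;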

As a means of gaining some sense of how long this will take, I recreated the pIndexData table in a separate Aurora instance. I then imported a sample set of data (about 3 million records). The sample set had DateTime values spread out over 2021, 2022 and 2023, with the lion's share of the data in 2022. I then ran the same REORGANIZE PARTITION query and clocked the time it took to complete. The partition query took 2 minutes, 29 seconds. If the time scaled linearly with the number of records, I estimate the query on the original table should take roughly 18 hours. It seems the scaling is not linear: even with a large margin of error, that estimate is way off. Clearly, there are factors (perhaps many) I'm missing.
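
For anyone attempting the same extrapolation, approximate per-partition row counts are available without Performance Schema; note that TABLE_ROWS is only an InnoDB estimate, not an exact count:

SELECT PARTITION_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
  FROM INFORMATION_SCHEMA.PARTITIONS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME = 'pIndexData';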

I'm not sure what else to try other than run the sample data test again but with an even larger data sample. Before I do, I'm hoping someone might have some insight how to best calculate how long this might take to finish.

rootScott

1 Answer


Adding (or removing) partitioning necessarily copies all the data over and rebuilds the table and all its indexes. So, if your table is large enough to warrant partitioning (over 1M rows), it will take a noticeable amount of time.

In the case of REORGANIZE PARTITION, which rebuilds only the named partition(s) (eg, pMAX) INTO new ones, the metric is how many rows are in pMAX.

What you should have done is create the next LESS THAN partition late in 2021, while pMAX was still empty.

Recommend you reorganize pMAX into p2022, p2023, and pMAX now. Again, the time is proportional to the size of pMAX. Then be sure to create p2024 in December 2023, while pMAX is still empty.
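
A sketch of that recommendation, assuming the same UNIX_TIMESTAMP boundary style as the existing partitions (adjust the dates for your time zone):

ALTER TABLE pIndexData REORGANIZE PARTITION pMAX INTO (
    PARTITION p2022 VALUES LESS THAN (UNIX_TIMESTAMP('2023-01-01 00:00:00')),
    PARTITION p2023 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 00:00:00')),
    PARTITION pMAX  VALUES LESS THAN (MAXVALUE)
);

-- Later, in December 2023, while pMAX is still empty (so the copy is cheap):
ALTER TABLE pIndexData REORGANIZE PARTITION pMAX INTO (
    PARTITION p2024 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01 00:00:00')),
    PARTITION pMAX  VALUES LESS THAN (MAXVALUE)
);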

What is the advantage of partitioning by Year? Will you be purging old data eventually? (That may be the only advantage.)

As for your test -- was there nothing in the other partitions when you measured 2m29s? If so, that measurement is about right for that data size. There may be a small added burden from inserting the 2021 index rows.

A side note: the following index is unnecessary, since two other indexes already cover it:

KEY `pIndex` (`pkl_PPLT_00-PIndex`) USING BTREE,

However, I don't know if dropping it would be "instant".
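
If you do decide to drop it, a minimal sketch (DROP INDEX in MySQL 5.7 is an in-place operation that avoids copying the table, though it is not formally "instant"):

ALTER TABLE pIndexData DROP INDEX `pIndex`, ALGORITHM=INPLACE, LOCK=NONE;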

Rick James
  • Rick, thanks for looking into this. Yes, I'm fully aware this should have been done in Dec 2021. That will be the plan moving forward. I'm also aware I should have reorganized PMAX into 2022, 2023 and PMAX from the start. Hindsight is always 20/20. Of course, I'm stuck with the query I started. – rootScott Jan 18 '23 at 15:04
  • The advantage to partitioning by year is, like you said, being able to drop older years. I also noticed a performance increase while querying this table. Concerning the test, I spread the data out over 2021, 2022 and 2023, with the lion's share in 2022. I wasn't sure if putting data in 2021 and 2023 would matter much; I just wanted the test to be as close to the real thing as possible. – rootScott Jan 18 '23 at 16:22
  • @rootScott - Good. Could you share the query(s) that had a "performance increase". If my blog is incorrect in some situations, I would like to understand the issues and update the blog. – Rick James Jan 18 '23 at 16:36
  • The small test I ran does not seem to translate to the full-size table. In my test, I took the number of records in PMAX (2.8 million) / 149 secs = 18,792 records reorganized per second. I then divided the 2 billion records in PMAX by that rate to get 106,428 seconds, an estimated 30 hrs. NOTE: This is a correction from an earlier estimate of 18 hrs, when I mistyped a number. The point being, the estimated 30 hrs is way off from the 15 days this process has currently been running. This is why I'm wondering what other factors I'm missing. – rootScott Jan 18 '23 at 16:47
  • Sorry for multiple comments; 600 character limit. You mention the "burden in adding the 2021 index rows". Does 2021 factor into this? I thought the 2021 partition / index would not impact this reorganization. Is this the missing factor? I'm thinking of running the test again with a larger sample data set and comparing the difference. That will at least tell me if the results are linear. Thanks for the note on dropping the pIndex key; I'll look into it once I'm past this hurdle. – rootScott Jan 18 '23 at 16:58
  • The performance testing was done about three years ago. Unfortunately I don't have any concrete test results to share. I remember a performance increase on queries with larger date ranges within a given year. I don't know if the increase held when I queried more than one year, i.e. more than one partition. The main reason for partitioning was / is to better manage large data over time. The noticed performance increase was a side benefit and was not extensively tested in a controlled environment. It's possible there were other factors influencing my test. I'll check out your blog. – rootScott Jan 18 '23 at 17:33
  • @rootScott - When inserting a row, the 23 secondary indexes need updating. This action is actually delayed (cf "change buffering"), but it has to happen eventually. Since the 2021 index is already in place and moderately 'big', the updates will be _slightly_ slower than the 2022 updates, which are being added to an empty BTree. (Not a big deal, and unavoidable--other than decreasing the "23".) – Rick James Jan 18 '23 at 18:19
  • Some indexes will be updated in order (eg, dates) and be slightly more efficient. At the other extreme are UUIDs and other hashes -- the randomness may lead to almost 1 disk read and 1 write per row inserted. That's with 16KB per read/write. 2B updates = 32TB of I/O. You won't have that much RAM (innodb_buffer_pool_size), therefore the process will be miserably I/O-bound. – Rick James Jan 18 '23 at 18:24
  • So, with that last note, the question about "linear" is close to being answered. Your small test may have been able to cache everything; the 2B-row partition cannot. Hence, non-linear, but hard to predict how much slower. – Rick James Jan 18 '23 at 18:27
  • New thought... Is all the data "write-once"? That is, is nothing ever changed, merely appended to? The direction I am thinking: Spend weeks building a new table with corrected partitioning and indexes, then switch over to the new table (rough sketch after this thread). – Rick James Jan 18 '23 at 18:31
  • Oops, did I invent the "23 indexes"? (I may be mixing your case up with another one I am answering.) – Rick James Jan 18 '23 at 18:38
  • Yeah, there are only 5 indexes in my table. I had the same thought about my test being too small and completing entirely in cache. My innodb_buffer_pool_size is 8446279680 bytes (7.866 GB). If you're saying 16KB of cache would be used per record, it would only take 515,520 records to fill the cache. My test was larger at 1.8 million, but not significantly larger, so I can see how this would change my calculation. – rootScott Jan 18 '23 at 21:00
  • 50M _consecutive_ rows vs 0.5M _random_ rows. (By 'consecutive', I mean "in the order of the index".) The `PRIMARY KEY` determines the order of the data BTree. PMAX and two indexes are ordered by `pkl_PPLT_00-PIndex` plus something else. So, probably those are well cached. I can't tell about the other three, except to say that `id` and `datetime` are probably equally "consecutive" or "random". – Rick James Jan 18 '23 at 21:22
  • I would _guess_ that a test with PMAX containing 20M rows would run smoothly (by making good use of cache). But if it contained 100M rows, it would be well into some amount of thrashing -- that is, the ALTER would be running much slower "per row". Without understanding the distribution of the key values, I can't be more specific. This answers your implied question about 3M rows taking 2m29s -- It will _not_ scale up linearly. Another swag -- it may slow down by a factor of 10. – Rick James Jan 18 '23 at 21:30
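
A rough sketch of the build-and-swap idea from the comments, assuming the data really is append-only. All of this is illustrative: a real migration would backfill in smaller primary-key ranges and catch up any rows inserted during the copy before swapping.

-- Build the replacement table with the desired partition layout
CREATE TABLE pIndexData_new LIKE pIndexData;

ALTER TABLE pIndexData_new REORGANIZE PARTITION pMAX INTO (
    PARTITION p2022 VALUES LESS THAN (UNIX_TIMESTAMP('2023-01-01 00:00:00')),
    PARTITION p2023 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 00:00:00')),
    PARTITION pMAX  VALUES LESS THAN (MAXVALUE)
);

-- Backfill in ranges (one shown) rather than one giant transaction
INSERT INTO pIndexData_new
SELECT * FROM pIndexData
 WHERE `DateTime-UNIX` < UNIX_TIMESTAMP('2022-01-01 00:00:00');
-- ...repeat for the remaining ranges, then copy the tail...

-- Atomic swap once pIndexData_new has caught up
RENAME TABLE pIndexData TO pIndexData_old, pIndexData_new TO pIndexData;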