-2

I am implementing data partitioning on mysql server(8.0).

As a "must-have requirement of partitioning" I have to add my partition key as a primary key, which is created_at in my table. eg:

alter table notifications drop primary key, add primary key(`id`, `created_at`);

In above query, id is an auto-increment primary key. I have more than 10M entries in table. Mysql won't allow live updates on schemas; it will lock the table when I run alter command.

At the same time, I can take a downtime for this process. Please guide the best approach to take this live.

I tried generating same amount of data on my local using "Faker" library & while running alter query on it. I observed it taking more than 2 hours for this process.

Schema

CREATE TABLE `data` (
  `seq_id` int NOT NULL AUTO_INCREMENT,
  `id` varchar(100) NOT NULL,
  `doc_id` varchar(100) NOT NULL,
  `page_no` int DEFAULT '1',
  `file_store` varchar(100) DEFAULT NULL,
  `s_id` varchar(100) NOT NULL,
  `s_f_id` varchar(100) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `f_store` varchar(100) DEFAULT 'GCP',
  `purged` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`seq_id`),
  UNIQUE KEY `id` (`id`),
  KEY `doc_id` (`doc_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `data `
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at))
(
  PARTITION `partition_2020` VALUES LESS THAN (UNIX_TIMESTAMP('2020-01-01 00:00:00')),
  PARTITION `partition_2021` VALUES LESS THAN (UNIX_TIMESTAMP('2021-01-01 00:00:00')), 
  PARTITION `partition_2022` VALUES LESS THAN (UNIX_TIMESTAMP('2022-01-01 00:00:00')), 
  PARTITION `partition_2022` VALUES LESS THAN (UNIX_TIMESTAMP('2023-05-31 00:00:00')),
  PARTITION `partition_future` VALUES LESS THAN (MAXVALUE)
);

Pankaj Arora
  • 10,224
  • 2
  • 37
  • 59
  • 1
    Use an online schema change tool. There are two good free options: [pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html) or [gh-ost](https://github.com/github/gh-ost). Like any new tool, test them out on data that is not production until you learn how to use them. – Bill Karwin Jun 26 '23 at 15:25
  • I am voting to reopen it, but I think it belongs in Database Adminstrators – Rohit Gupta Jul 01 '23 at 04:51
  • "In above query, id is an auto-increment primary key" -- this contradicts the table declaration. Please make them consistent. – Rick James Jul 06 '23 at 18:19
  • Will you be using Partitioning to do efficient `DROP PARTITION` based on `created_at`? Or some other reason? – Rick James Jul 06 '23 at 18:23
  • @RickJames yes, i'll be using Partitioning to do efficient DROP_PARTITION based on created at. Partition query also update in question. – Pankaj Arora Jul 11 '23 at 07:52

2 Answers2

2

If you are very concerned about the lock time, then I suggest you use the master-slave replication. Basically, it is a master-slave switching time, which is very short.

dogs Cute
  • 564
  • 3
  • 9
0

Both changing the PK and adding PARTITIONing require a full copy and re-index of the data. This will be slow. But... Let's talk about whether the Partitioning will provide any benefit. Please provide SHOW CREATE TABLE -- both currently and after adding partitioning. See also Partition

If, on the other hand, you decide that Partitioning is really needed, and the table is growing, then the sooner you do it, the less time it will take.

If you will be purging some old data as you do the conversion, then I suggest doing everything at once:

CREATE TABLE new_t (
    ... ); -- with new PK and partitioning.
INSERT INTO new_t
    SELECT * FROM t
        WHERE date >= ...; -- to purge old data by not copying it over

But even before doing that, let's see the CREATEs, there may be more advice to fold into the copy.

Also note that Partitioning does not allow for UNIQUE keys, nor for FOREIGN KEYs. We can discuss that further after seeing the CREATE.

Also -- Do you have a "natural" PK? That is, can you get rid of id?

More

While changing the PK, can you simply remove the seq_id (auto_inc) column? Or do you need it for some other reason?

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • i can't take downtime. – Pankaj Arora Jul 05 '23 at 17:11
  • And you have not set up Replication, either? Then let's focus on _why_ Partitioning should be added. – Rick James Jul 05 '23 at 21:20
  • Schema updated in question. – Pankaj Arora Jul 06 '23 at 06:00
  • **And...** When adding partitioning, you will also need to add `created_at` to UNIQUE KEY `id` (`id`) Notice that, by adding a column to both unique keys, the uniqueness constraint is destroyed. Do you actually need the uniqueness? For example, do you use `INSERT IGNORE` or catch "dup key" errors? – Rick James Jul 11 '23 at 19:30
  • yes, i need uniqueness. – Pankaj Arora Jul 14 '23 at 05:02
  • @PankajArora - With partitioning, `UNIQUE` may not provide the uniqueness constraint (because of having to add the PK on). You _could_ do the check in a separate check. Or this could be another argument for _not_ use Partitioning. – Rick James Jul 14 '23 at 15:28