MySQL Partition By Both DATE and INT

Question

I have a table I want to partition using MySQL 5.7 Partitioning to mitigate issues I'm having with dropping old data quickly. (Also, it would be nice to have increased insert I/O performance by partitioning across something other than date, especially if I plan to shard across multiple volumes with subpartitions)

Here is a simplified version of the table:

CREATE TABLE `tbl` (
  `date` date NOT NULL,
  `sub_id` int(11) unsigned NOT NULL,
  `cmd_id` int(11) NOT NULL,
  `code`   TINYINT DEFAULT NULL,
  `rqst`   VARCHAR(32) NOT NULL DEFAULT '',
  UNIQUE KEY `uk1` (sub_id,cmd_id,date) 
) ENGINE=InnoDB

(note that use of column 'date' in uk1 is only to allow partitioning on date)
(The true unique key is (sub_id,cmd_id))

Here are the SQL statements I make on that table:

1. INSERT INTO tbl (NOW(), ...)
2. UPDATE tbl SET code=$code WHERE sub_id=$sub_id AND cmd_id=$cmd_id
3. SELECT code,rqst FROM tbl WHERE sub_id=$sub_id AND cmd_id=$cmd_id

Here is the partitioning scheme I've devised so far:

PARTITION BY RANGE (TO_DAYS(date))
SUBPARTITION BY HASH(sub_id)
SUBPARTITIONS 4
(PARTITION d001 VALUES LESS THAN (736250) ENGINE = InnoDB,
 PARTITION d002 VALUES LESS THAN (736260) ENGINE = InnoDB,
 PARTITION d003 VALUES LESS THAN (736270) ENGINE = InnoDB,
 PARTITION d004 VALUES LESS THAN (736280) ENGINE = InnoDB,
 PARTITION d005 VALUES LESS THAN (736290) ENGINE = InnoDB,
 PARTITION d006 VALUES LESS THAN (736300) ENGINE = InnoDB,
 PARTITION d007 VALUES LESS THAN (736310) ENGINE = InnoDB,
 PARTITION d008 VALUES LESS THAN (736320) ENGINE = InnoDB,
 PARTITION d009 VALUES LESS THAN (736330) ENGINE = InnoDB,
 PARTITION d010 VALUES LESS THAN (736340) ENGINE = InnoDB,
 PARTITION d011 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)

However I believe this will hurt performance by requiring a read per each partition every time i reference (sub_id,cmd_id):

EXPLAIN PARTITIONS SELECT * FROM tbl WHERE sub_id='107' AND cmd_id='2246806';
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | partitions                                                                                                                                     | type | possible_keys | key  | key_len | ref         | rows | Extra       |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
|  1 | SIMPLE      | optz  | d001_d001sp1,d002_d002sp1,d003_d003sp1,d004_d004sp1,d005_d005sp1,d006_d006sp1,d007_d007sp1,d008_d008sp1,d009_d009sp1,d010_d010sp1,d011_d011sp1 | ref  | uk1           | uk1  | 38      | const,const |   11 | Using where |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+

So the crux of my problem is:

If i partition by D date's then its D-1 Extra lookups
If i partition by S sub_id's then I cant easily DROP partitions on Date
I don't see how I could use COLUMNS Partitioning

Here are some notes/caveats:

INSERTing about 5-20million rows/day
Equal distribution of read,write,insert - but always single row
Only need to keep past ~month of data
A replication system is in place
The hardware involved is expensive
I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.

Thanks!

Your comment says "month of data", but the partitions seem to be about 10*10 days. — Rick James, Nov 27 '15 at 18:05

Rick James · Accepted Answer · 2015-11-27T18:01:47.773

BY HASH is essentially useless, as are SUBPARTITIONs.

mitigate issues I'm having with dropping old data quickly.

That is, you need to DROP PARTITION for old date? Use PARTITION BY RANGE (TO_DAYS(date)) and don't bother with the subpartitioning.

For clarity, change UNIQUE KEY uk1 (sub_id,cmd_id,date) to PRIMARY KEY (sub_id,cmd_id,date).

[belated edited] Your three queries will work reasonably well with such. The SELECT and UPDATE will have to hit all partitions since date is not in the WHERE clause. The INSERT will hit only the latest partition (because of NOW()).

More discussion, including tips on doing the periodic purging: http://mysql.rjweb.org/doc.php/partitionmaint

Only need to keep past ~month of data

Recommend about 32 partitions -- one pending DROP, one future; see the link.

A replication system is in place

Doing the ALTER TABLE to add partitioning will stall the system, but I guess you understand the issue there.

I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.

Yeah, a necessary evil.

5-20million rows/day

That's a max of a few hundred per second? If you have ingestion speed problems, see http://mysql.rjweb.org/doc.php/staging_table

A compromise would be to keep about 38 days -- in weekly 6 weekly partitions. That would decrease the number partitions for `SELECT` and `UPDATE`, but increase the disk space needed, while still providing the efficiency of `DROP PARTITION`. — Rick James, Nov 27 '15 at 18:08
Thanks, this is helpful, I actually spent a couple nights reading through your website before i did this original post ;) I'm gonna update this comment with a question about your answer after i get a chance to read through a bit more. thanks again. — gfunk, Dec 10 '15 at 20:08

MySQL Partition By Both DATE and INT

1 Answers1