- Don't you want
PRIMARY KEY(id, moment)
or PRIMARY KEY(moment, id)
instead of INDEX(id)
?
- Indexes are partitioned. Each partition is essentially a "table". It has a `BTree for the data and PK, and a BTree for each secondary index.
- So, to find
id=123
requires checking INDEX(id)
in each partition. Herein lies one of the reasons why a PARTITIONed
table is sometimes slower than the equivalent non-partitioned table.
- It is inefficient to pre-create future partitions (other than one).
Show us the main queries you have. I will probably explain why you should not partition the table. I see two possible benefits in your definition:
- Dropping 'old' data is much faster than
DELETEing
it.
- `WHERE something-else AND moment between ..
Some cases
For this discussion, I assuming partitioning by a datetime in some fashion (BY RANGE(TO_DAYS(moment))
or BY ... (YEAR(moment))
, etc).
WHERE id BETWEEN 111 and 222
Partitioning probably hurts slightly because, regardless of what indexes are available, the query must look in every partition.
WHERE id BETWEEN 111 and 222
AND moment > NOW() - INTERVAL 1 MONTH
with some index starting with `id`
This is a case where partition "pruning" is beneficial. It will look in one or two partitions (depending on whether or not the query is being run in January). Then it will somewhat efficiently use the index to lookup by id
.
Now let be discuss two flavors if an index starting with id
(and assuming either of the WHERE
clauses, above:
PRIMARY KEY(id, moment)
The PK is "clustered" with the data. That is, the data is sorted by first id
then moment
. Hence the id BETWEEN...
will find the rows consecutively in the BTree -- this is the most efficient. The AND moment...
works to filter out some of the rows.
INDEX(id)
is not "clustered". It is a secondary index. Secondary indexes take two steps. (1) search the secondary BTree for the ids, but without filtering by moment
; (2) reach into the data BTree using the artificial PK that was provided for you; (3) now the filtering by moment
can happen. More steps, more blocks to read, etc.
DROP PARTITION p2020
id much faster and less invasive than `DELETE .. WHERE moment < '2021-01-01'.
More
It is important to look at all the main queries. X=constant
versus X BETWEEN...
can make a big difference in optimization; please provide concrete examples that are realistic for your app.
Also, sometimes a "covering" index can make up for otherwise inefficient indexes. So those examples need to show all the columns in the important queries. And what datatypes they are.
In the absence of such details, I will make the following broad statements (which might be invalidated by the specifics):
- If the
WHERE
references only one column, the PARTITIONing
is probably never beneficial.
- If the
WHERE
has one =
test and one 'range' test, there is probably a composite index that will work much better than partitioning.
- Partitioning may shine when there are two range tests, but only if 'pruning' can be applied. (There are a lot of limitations on pruning.)
- With 2 ranges, the one that is not being pruned on should be at the beginning of the
PRIMARY KEY
.
- When pruning is used but the rest of the
WHERE
cannot use some index, that implies a scan of the partition. If there are only a few partitions, that could be a big scan.
- Don't pre-build more than one partition. When not pruning, it is somewhat costly to open all the partitions only to find some are empty.