7

I have a table which contains information about time, So the table has columns like year, month, day, hour and so on.
Table has data across years and quite big so i decided to make partition on this table and started learning about Mysql partitioning but caught up by few questions.
I will really appreciate, if someone can help me understand how partition and indexes will work together.

  1. If i create partition over year column and also have an index on the same column, how partition and index will work together? How it will impact the performance over, if i had index on year column only and table has no partition?

Ex. Sql: Select month, day, hour ... from time_table where year = '2017';

  1. If table has partition over year column and query is filtering records over month column and month column is indexed. How index over month and partition over year will impact the select performance.

Ex Sql: Select year, month, day .... from time_table where month = '05';

Manoj-kr
  • 776
  • 5
  • 18

1 Answers1

12

Partitioning splits a table up into, shall we say, "sub-tables". Each sub-table is essentially a table, with data and index(es).

When SELECTing from the table, the first thing done is to decide which partition(s) may contain the desired data. This is "partition pruning" and uses the "partition key" (which is apparently to be year). Then the select is applied to the subtables that are relevant, using whatever index is appropriate. In that case it is a waste to have INDEX(year, ...), since you are already pruned down to the year.

Your sample select cannot do partition pruning since you did not specify year in the WHERE clause. Hence, it will look in all partitions, and will be slower than if you did not partition the table.

  • Don't use partitioning unless you expect at least a million rows. (That would be a lot of years.)
  • Don't use partitioning unless you have a use case where it will help you. (Apparently not your case.)
  • Don't have columns for the parts of a datetime, when it is so easy to compute the parts: YEAR(date), MONTH(date), etc.
  • Don't index columns with low cardinality; the Optimizer will end up scanning the entire table anyway -- because it is faster. (eg: month='05')

If you would like to back up a step and explain what you are trying to accomplish, perhaps we can discuss another approach.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks for the detailed answer. Actually i am creating reports for the huge number of records. I have data stored in a `summary` table which has a relationship with the `time` table (i understand i can't have foreign key if use partition), so i can fetch the reports for the year or month of specific year or so. My `summary` table generates around 1 million records/year. As per your comment, in a partitioned table first it checks for the particular `partition` then use `index` in that part. Rather then checking for two things (`partition` and `index`), won't using `index` work faster ? – Manoj-kr Jul 11 '17 at 06:40
  • Correct. That is the usual argument that partitioning does not help with performance. The _summary_ table has a 1M/year? – Rick James Jul 11 '17 at 14:07
  • Yes, my summary table generates around 1 million records per year. – Manoj-kr Jul 11 '17 at 14:15
  • @Manoj-kr What did you end up doing exactly? I have the same use-case – systemdebt Dec 27 '20 at 17:51
  • @Simrankaur, I partitioned the `time` table by RANGE `year` and created subpartitions on `month` HASH as well. Also partitioned the `summary` tables by RANGE on referenced `time_id` values. – Manoj-kr Dec 28 '20 at 17:56
  • @Manoj-kr - But did it really speed things up? Was if faster than having no partitioning and suitable `PRIMARY KEY` and perhaps a secondary `INDEX`? Start a new Question showing the `CREATE TABLE` and the `SELECT`. Make it self-answered if you are happy with your solution. – Rick James Dec 28 '20 at 18:04