1

I have a MySQL table say data_table

mysql> desc data_table;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(11)          | NO   | PRI | NULL    | auto_increment |
| prod_id    | int(10) unsigned | NO   |     | NULL    |                |
| date       | date             | NO   |     | NULL    |                |
| cost       | double           | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

This table has around 700 million rows. I have created indexes on prod_id and date. I need to perform a query like this -

SELECT `id`, `prod_id`, WEEKOFYEAR(`date`) AS period, SUM(`cost`) AS cost_sum
FROM `data_table` GROUP BY `prod_id`, `period`;

My question is -

Will partitioning the table on months (~20 partitions) improve the performance of this query?

Optimus
  • 2,716
  • 4
  • 29
  • 49

3 Answers3

1

Based on the number of records and the SQL query you have written I would say yes, if done correctly Partitioning would help a lot. I would go further and suggest Range Partitioning on the Date field. This is a very common Partitioning method and works well and is easy to implement.

You don't mention the release of MySQL you're running so you'll have to do some additional reading HERE to understand what your MySQL release supports.

You can also run this SQL at the command prompt.

mysql> SHOW VARIABLES LIKE %partition%

This should report back with "have Partitioning = Yes" or "Partition_engine = yes" depending on your relase.

apesa
  • 12,163
  • 6
  • 38
  • 43
1

If you see that there are a lot of queries based on week number, it makes sense to permanently store the week number as a column. We can save on the calculation during select. The ideal strategy is to know what queries you will run and then design your tables accordingly.

ForeverLearner
  • 1,901
  • 2
  • 28
  • 51
  • I agree, In my case it makes a lot of sense to store `WEEKOFYEAR(date)` as a separate column and create an index on it... thanks.. – Optimus Oct 21 '15 at 23:21
1

PARTITIONing will not help at all. Not BY RANGE; not any other flavor.

The query must read every row in the table; partitioning does not change that fact, nor can it speed it up at all.

The query, as it stands, has an unrelated problem. Which id is it supposed to return for each GROUP? Answer: It will return a 'random' id.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks, I tried it out and It seems you are right. Query on partitioned table actually ran slower. Also, the query in the question is not a real query I am using, its just an example, actual query is much more meaningful. – Optimus Nov 03 '15 at 10:31
  • @Optimus -- and the actual query may run faster with partitioning (but I doubt it). Too many things factor into optimization; need to use the actual query. – Rick James Nov 03 '15 at 15:54