
I am a student, and I have a question that came up while researching MySQL partitioning.

For example, I have a table "Label" with 10 partitions by hash(TaskId):

resourceId (PK)
TaskId (PK)
...

Alternatively, I have 10 tables whose names are "label" + taskId:

tables: 
task1(resourceId,...)
task2(resourceId,...)
...

Could you please tell me the advantages and disadvantages of each approach? Thanks.

huyna3
  • *by hash(TaskId)* Why not by value? *10 table with name table is "label" + taskId* means that you have 10 definite values which allows to create partition per value. – Akina Sep 07 '20 at 10:56
  • They are basically completely different things with their own evaluation criteria. Partitioning is something that happens on a technical level (like choosing between MyISAM and InnoDB), while creating 10 tables is something that influences, and should be decided by, your data model, relational logic and queries. Or, to express it in a different way: even if we could list 1000000 wonderful things about partitioning (we cannot), if your data model needs 10 tables, you cannot swap that fact for partitioning. – Solarflare Sep 07 '20 at 10:58
  • If your interest in this question is theoretical, you should study the partitioning features in PostgreSQL and in some commercial RDBMSs, as well as in MySQL / MariaDB. – O. Jones Sep 07 '20 at 11:50

1 Answer


Welcome to Stack Overflow. I wish you had offered a third alternative in your question: "just one table with no partitions." That is by far, in almost all cases in the real world, the best way to handle your data. It only requires maintaining and querying one copy of each index, for example. If your data approaches billions of rows in size, it's time to consider stuff like partitions.

But never mind that. Your question was to compare ten tables against one table with ten partitions. Your ten-table approach is often called sharding your data.

First, here's what the two have in common: both are represented by ten different tables on your storage device (SSD or disk). A query for a row of data that might be anywhere in the ten involves searching all ten, using whatever indexes or other techniques are available. Each of these ten tables consumes resources on your server: open file descriptors, RAM caches, etc.

Here are some differences:

  • When INSERTing a row into a partitioned table, MySQL figures out which partition to use. When you are using shards, your application must figure out which table to use and write the INSERT query for that particular table.
  • When querying a partitioned table for a few rows, MySQL automatically figures out from your query's WHERE conditions which partitions it must search. When you search your sharded data, on the other hand, your application must figure out which table or tables to search.
  • In the case you presented -- partitioning by hash on TaskId -- the only way to get MySQL to search just one partition is to search for a particular value of the partitioning column. In your case this would be WHERE TaskId = bar (optionally combined with resourceId = foo). If you search based on some other criterion -- WHERE customerId = something -- MySQL must search all the partitions. That takes time. In the sharding case, your application can use its own logic to figure out which tables to search.
  • If your system grows very large, you'll be able to move each shard to its own MySQL server running on its own hardware. Then, of course, your application will need to choose the correct server as well as the correct shard table for each access. This won't work with partitions.
  • With a partitioned table with an autoincrementing id value on each row inserted, each of your rows will have its own unique id no matter which partition it is in. In the sharding case, each table has its own sequence of autoincrementing ids. Rows from different tables will have duplicate ids.
  • The Data Definition Language (DDL: CREATE TABLE and the like) for partitioning is slightly simpler than for sharding. It's easier and less repetitive to write the DDL to add a column or an index to a partitioned table than it is to do the same for a bunch of shard tables. With the volume of data that justifies sharding or partitioning, you will need to add and modify indexes to match the needs of your application in the future.
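
To make the sharding side of these differences concrete, here is a minimal Python sketch of what the application has to do for itself. It is an illustration under assumptions, not a recommended design: it uses an in-memory SQLite database from the standard library as a stand-in for MySQL so it can run anywhere, and the helper names (`shard_table`, `create_shards`, `insert_label`, `find_by_customer`) and the `id` and `customerId` columns are hypothetical, chosen to match the examples above.

```python
import sqlite3

NUM_TASKS = 10  # the question's ten tables: task1 ... task10


def shard_table(task_id: int) -> str:
    """Return the shard table name for a task id (hypothetical naming scheme)."""
    if not 1 <= task_id <= NUM_TASKS:
        raise ValueError(f"no shard table for task id {task_id}")
    return f"task{task_id}"


def create_shards(conn: sqlite3.Connection) -> None:
    # The same DDL has to be repeated once per shard table.
    for task_id in range(1, NUM_TASKS + 1):
        conn.execute(
            f"CREATE TABLE {shard_table(task_id)} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "resourceId INTEGER NOT NULL, "
            "customerId INTEGER NOT NULL)"
        )


def insert_label(conn: sqlite3.Connection, task_id: int,
                 resource_id: int, customer_id: int) -> int:
    # The application, not the database, picks the target table.
    cur = conn.execute(
        f"INSERT INTO {shard_table(task_id)} (resourceId, customerId) "
        "VALUES (?, ?)",
        (resource_id, customer_id),
    )
    return cur.lastrowid


def find_by_customer(conn: sqlite3.Connection, customer_id: int) -> list:
    # customerId says nothing about the task, so every shard is searched.
    rows = []
    for task_id in range(1, NUM_TASKS + 1):
        cur = conn.execute(
            f"SELECT id, resourceId FROM {shard_table(task_id)} "
            "WHERE customerId = ?",
            (customer_id,),
        )
        rows.extend((task_id, row_id, res_id) for row_id, res_id in cur)
    return rows


conn = sqlite3.connect(":memory:")
create_shards(conn)
first = insert_label(conn, task_id=1, resource_id=100, customer_id=7)
second = insert_label(conn, task_id=2, resource_id=200, customer_id=7)
matches = find_by_customer(conn, 7)
print(first, second)  # both shards hand out the same id
print(matches)
```

In this sketch the two inserted rows receive the same `id` because each shard table keeps its own sequence, and the search by `customerId` has to visit all ten tables. With a single partitioned table, the routing and the fan-out would be handled by MySQL rather than by code like this.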

Those are some practical differences. Pro tip: don't partition and don't shard your data unless you have really good reasons to do so.

Keep in mind that server hardware, disk hardware, and the MySQL software are under active development. If it takes several years for your data to grow very large, new hardware and new software releases may improve fast enough in the meantime that you don't have to worry too much about partitioning / sharding.

O. Jones