We want to store a very large table in MySQL. The workload's specification is as follows:

  1. Around 1 billion rows.
  2. Each row has just two fields: a long int as the key and a blob with an average size of 20 KB as the value.
  3. A high write rate: around 1,000 inserts per second, of which about 40% are updates. Only a few reads per second.

We have a single server with 2x4-core Xeon CPUs, 80 GB of RAM, and 6x3TB hard disks in RAID 0. What is the best schema design for performance? For example, there are several options (a rough DDL sketch follows the list):

  1. Using the long id as the key.
  2. Adding an auto-increment field as the PK and indexing the long id.
  3. Partitioning the table.
  4. Choosing the DYNAMIC row format.
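
Roughly, options 1-3 combined with the DYNAMIC row format (option 4) would look like the following, assuming InnoDB; the table and column names are just placeholders, and MEDIUMBLOB is assumed because some values may exceed the 64 KB BLOB limit:

    -- Option 1: the long id itself is the (clustered) primary key.
    CREATE TABLE blob_store (
        id  BIGINT      NOT NULL,
        val MEDIUMBLOB  NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

    -- Option 2: surrogate auto-increment PK plus a secondary unique index on the long id.
    CREATE TABLE blob_store_surrogate (
        pk  BIGINT      NOT NULL AUTO_INCREMENT,
        id  BIGINT      NOT NULL,
        val MEDIUMBLOB  NOT NULL,
        PRIMARY KEY (pk),
        UNIQUE KEY uk_id (id)
    ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

    -- Option 3: hash-partition on the key so each partition's clustered index stays smaller.
    CREATE TABLE blob_store_partitioned (
        id  BIGINT      NOT NULL,
        val MEDIUMBLOB  NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC
      PARTITION BY HASH(id) PARTITIONS 64;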

Other free, single-machine solutions such as Postgres are also welcome. We cannot accept distributed solutions like HBase or Cassandra because we want to dedicate just a single machine. However, if a solution that needs 3 machines reaches the same performance and also offers fault tolerance, we can accept it too.

Saeed Shahrivari
  • If that `long int` really is unique I don't see any benefit in adding _another_ unique column. –  Jul 21 '14 at 08:33
  • 2
    Consider that updating an index will often use a lot of time, but will save time when reading. Balance this. – deceze Jul 21 '14 at 08:36
  • Is Redis a solution designed to do just that? – ToBe Jul 21 '14 at 09:27
  • 3
    20 KB * 1 billion = 20TB, but your local storage is only 3*6=18TB. Any database has considerable overhead, so you should prepare array with 30-40TB free space available. Is your payload highly compressible? – mvp Jul 21 '14 at 10:04
  • @a_horse_with_no_name adding an auto-increment id can speed up insertion but slows down lookups. – Saeed Shahrivari Jul 21 '14 at 20:15
  • @ToBe the amount of data is too big to fit in main memory. – Saeed Shahrivari Jul 21 '14 at 20:16
  • @a_horse_with_no_name the data is already compressed. – Saeed Shahrivari Jul 21 '14 at 20:16
  • Why would adding *another* unique column and therefore *another* index speed up anything? If you define your unique column as the PK, you will have exactly the same thing as if you added an auto-increment. A lookup based on a unique index is always equally fast, regardless of how the indexed column was populated. –  Jul 21 '14 at 20:26
  • @a_horse_with_no_name random inserts on the PK are much slower than sequential inserts because the PK is always clustered, while unique indexes are not clustered in MySQL. Therefore, an auto-increment PK speeds up insertions. – Saeed Shahrivari Jul 21 '14 at 21:06
  • As I said: just declare your unique column as the PK (if it is *really* unique). If you are maintaining *two* unique indexes, things will definitely be slower than having just one index, regardless of whether they are clustered or not. –  Jul 21 '14 at 22:09
  • @a_horse_with_no_name No, test it yourself and see. Insertions are slower on a clustered index. – Saeed Shahrivari Jul 22 '14 at 05:45
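
The disagreement above (random-key inserts into a clustered PK versus appending to an auto-increment PK) can be measured directly. Below is a minimal sketch of such a test against the option 1 table from the question; the procedure name, the 20 KB dummy payload, and the row count are made up for illustration:

    -- Hypothetical micro-benchmark: upsert n rows with random keys into the
    -- option 1 table (clustered on the long id). Key collisions turn into
    -- updates, which roughly matches the real upsert-heavy workload.
    DELIMITER //
    CREATE PROCEDURE bench_random_inserts(IN n INT)
    BEGIN
      DECLARE i INT DEFAULT 0;
      WHILE i < n DO
        INSERT INTO blob_store (id, val)
        VALUES (FLOOR(RAND() * 1000000000), REPEAT('x', 20480))
        ON DUPLICATE KEY UPDATE val = VALUES(val);
        SET i = i + 1;
      END WHILE;
    END //
    DELIMITER ;

    CALL bench_random_inserts(100000);

Running the same loop against the option 2 table (auto-increment PK plus a secondary unique index on the long id) and comparing wall-clock times shows which layout handles this insert pattern better on the given hardware.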

0 Answers