
I have two tables in my database that now have millions of rows, and both selects and inserts are getting slower and slower.

I am using Spring + Hibernate + MySQL 5.5. I have read about sharding as well as table partitioning, and I like the idea of partitioning my tables.

My current DB structure looks like this:

CREATE TABLE `user` (
  `id` BIGINT(20) NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(255) DEFAULT NULL,
  `location_id` BIGINT(20) DEFAULT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `FK3DC99772C476E06B` (`location_id`),
  CONSTRAINT `FK3DC99772C476E06B` FOREIGN KEY (`location_id`) REFERENCES `places` (`id`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8


CREATE TABLE `friends` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `user_id` BIGINT(20) DEFAULT NULL,
  `friend_id` BIGINT(20) DEFAULT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `unique_friend` (`user_id`,`friend_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8

Now I am testing how to best use partitioning. For the user table, the following seemed like a good fit based on my usage:

CREATE TABLE `user_partition` (
  `id` BIGINT(20) NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(255) DEFAULT NULL,
  `location_id` BIGINT(20) DEFAULT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `FK3DC99772C476E06B` (`location_id`) 
) ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY HASH(id DIV 100000)
PARTITIONS 30;

I created two procedures to load data into both tables and compare their performance:

DELIMITER //
CREATE PROCEDURE load_partition_table()
BEGIN
DECLARE v INT DEFAULT 0;
    WHILE v < 1000000
    DO
    INSERT INTO user_partition (id,NAME,email)
    VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
    (v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
    (v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
    (v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
    (v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
    (v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
    (v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
    (v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
    (v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
    (v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
    ;
    SET v = v + 10;
    END WHILE;
END
//

CREATE PROCEDURE load_table()
BEGIN
DECLARE v INT DEFAULT 0;
    WHILE v < 1000000
    DO
    INSERT INTO user (id,NAME,email)
    VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
    (v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
    (v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
    (v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
    (v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
    (v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
    (v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
    (v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
    (v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
    (v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
    ;
    SET v = v + 10;
    END WHILE;
END
//
DELIMITER ;

The results were surprising: inserts and selects on the non-partitioned table gave better results.

mysql> select count(*) from user_partition;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.40 sec)

mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.00 sec)


mysql> call load_table();
Query OK, 10 rows affected (20.31 sec)

mysql> call load_partition_table();
Query OK, 10 rows affected (21.22 sec)

mysql> select * from user where id = 999999;
+--------+-------------+------------------+---------------------+
| id     | name        | email            | updated_time        |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999@yahoo.com | 2012-11-27 08:06:54 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)

mysql> select * from user_no_part where id = 999999;
+--------+-------------+------------------+---------------------+
| id     | name        | email            | updated_time        |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999@yahoo.com | 2012-11-27 08:03:14 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)

So, two questions:

1) What's the best way to partition the user table so that inserts and selects also become fast, and is removing the FOREIGN KEY on location_id correct? I know partitioning can help only if we access the table by the partition key; in my case I only want to read the table by id. Why are inserts slower on the partitioned table?

2) What's the best way to partition the friends table? I want to partition friends by user_id so that all of a user's friends are placed in the same partition, and I always access it using a user_id. Should I drop the primary key on friends.id, or add user_id to the primary key?

maaz
  • The post doesn't really show any appreciable performance difference; a single second difference of inserting a million records is pretty insignificant (~ 0.5%). – Ja͢ck Dec 09 '12 at 05:29
  • AFAIK to speed up SELECT's, [indexing](http://dev.mysql.com/doc/refman/5.5/en/mysql-indexes.html) is the proper solution. There is [some advice on INSERT speed-up](http://dev.mysql.com/doc/refman/5.5/en/insert-speed.html), too, the main line being to do bulk inserts. – ivan_pozdeev Dec 09 '12 at 11:13

3 Answers


First, I would recommend, if possible, that you upgrade to MySQL 5.6.5 or later to make sure you are taking advantage of partitioning properly and with the best performance. This is not always possible due to GA concerns, but my experience is that there was a difference in performance between 5.5 and 5.6, and 5.6 offers some other types of partitioning as well.

1) My experience is that inserts and updates ARE faster on partitioned sets, as are selects, AS LONG AS YOU INCLUDE THE COLUMN YOU ARE PARTITIONING ON IN THE QUERY. If I ask for a count of all records across all partitions, I see slower responses. That is to be expected, because the partitions function LIKE separate tables, so if you have 30 partitions it is like reading 30 tables and not just one.

You must include the value you are partitioning on in the primary key AND it must remain stable during the life of the record.

2) I would include user_id and id in the primary key - assuming that your friends table's user_id and id do not change at all once the record is established (i.e. any change would be a delete/insert). In my case it was "redundant" but more than worth the access. Whether you choose user_id/id or id/user_id depends on your most frequent access pattern.
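To make this concrete, here is a minimal sketch (hypothetical DDL, not taken from the original answer) of a friends table with user_id leading the composite primary key and also serving as the partition key. The extra KEY on id is only there because InnoDB requires an AUTO_INCREMENT column to be the first column of some index, and every unique key on a partitioned table must include the partitioning column:

CREATE TABLE `friends_partitioned` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `user_id` BIGINT(20) NOT NULL,
  `friend_id` BIGINT(20) NOT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  -- partition column first, so lookups by user_id stay within one partition
  PRIMARY KEY (`user_id`, `id`),
  UNIQUE KEY `unique_friend` (`user_id`, `friend_id`),
  -- InnoDB needs the AUTO_INCREMENT column to lead some index
  KEY `idx_id` (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY HASH(user_id)
PARTITIONS 10;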

A final note. I tried to create LOTS of partitions when I first started breaking my data into partitions, and found that just a few seemed to hit the sweet spot - 6-12 partitions seemed to work best for me. YMMV.

TJChambers
  • Thanks, but then why did inserts into the partitioned table take more time in my test case? Am I doing something wrong? I usually select friends based on some user_id and never use the id field – maaz Dec 06 '12 at 10:39
  • I would have to see what your insert looks like, but I concur with you it should take less time, not more time. I assume that you have tried these on latest MySQL version? – TJChambers Dec 06 '12 at 16:31
  • I have used MySQL version 5.5 – maaz Dec 07 '12 at 05:18
  • In 5.5, "pruning" was not done on `INSERT`, hence the sluggishness. – Rick James Nov 26 '15 at 05:25

1. Here is my answer for what you need on the user table:

I suggest you remove the FOREIGN KEY and the PRIMARY KEY.

I know this is crazy, but with an auto-generated key the database has to work out the current id, the last id, and the next id, and that takes longer than creating the id manually. Alternatively, you can create an int id manually in Java.

Use a query like this to insert quickly:

INSERT INTO user (id,NAME,email)
VALUES ('CREATE ID WITH JAVA', 'NAME', 'EMAIL@YAHOO.COM')

I can't say for sure whether my query will work faster or not, because it all depends on your machine's performance; make sure you run it on the server, because the server can finish all tasks faster.

As for selects: on the page where the profile info is shown, you only need one row for the one user identified by the profile id.

Use MySQL's LIMIT if you only need one row, and just change the limit value if you need more than one. Like this, for one row:

select * from user where id = 999999 limit 1;

and for seven rows:

select * from user where id = 999999 limit 7;

I think these queries will work faster than without LIMIT, and remember that LIMIT can be used with INSERT too.

2. For the friends partition: my answer is to drop the primary key.

A table with no primary key is not a problem.

Once again, create the id with Java... Java is designed to be fast on the application side, your code already includes the while loop, and Java can handle it. For example, if you need to retrieve all of a user's friend data, use this query to perform faster:

select fr.friend_id, usr.* from friends as fr INNER JOIN user as usr 
ON fr.friend_id = usr.id
where fr.user_id = 999999 LIMIT 10;

I think this is enough. Sorry, I can only explain the MySQL side and not the Java side, because I'm not an expert in Java, though I understand it.

Michael Antonius
  • It definitely helps, but it does not answer completely; can you update your answer for 1 and 2? – maaz Dec 07 '12 at 09:35
  • First, what programming language have you used? – Michael Antonius Dec 07 '12 at 09:53
  • Java, but you can use any language, as it is a MySQL question – maaz Dec 07 '12 at 11:06
  • Can you update the answer for 1 & 2 so that I can mark the answer and give you the bounty? – maaz Dec 07 '12 at 12:17
  • Drop the primary key?? This is some kind of heresy. Without the index, selects will be drastically slower, and if you intend to generate ids in Java it may happen that two users try to insert rows with the same id; MySQL keeps the value for the next id in memory at all times, so it is fast enough to leave this task to the DBMS. – Gustek Dec 09 '12 at 05:45
  • Dropping the `AUTO_INCREMENT` PK on `friends` will promote the `UNIQUE` index to be the PK. This is much more efficient, both for `INSERTs` and `SELECTs`. `INSERT`: Your code has 2 BTrees to update on `INSERT`; without `id`, there is only 1. `SELECT`: fetching via a secondary index requires then reaching for the PK. – Rick James Nov 26 '15 at 05:31

1) If you always (or mostly) use only id to select data, it is the obvious choice as the basis for the partitioning condition. Since it is a number, there is no need for a hash function; simply use range partitioning. You will have to work out yourself how many partitions to create (which numbers to choose as boundaries), but as @TJChambers mentioned before, around 8-10 should be efficient enough.
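As an illustration only (the table name and boundary values here are hypothetical and not from the original answer), a range-partitioned version of the user table could look roughly like this, with the boundaries adjusted to match your actual id distribution:

CREATE TABLE `user_range` (
  `id` BIGINT(20) NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(255) DEFAULT NULL,
  `location_id` BIGINT(20) DEFAULT NULL,
  `updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_location_id` (`location_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
-- RANGE on id: each partition holds a contiguous block of ids,
-- so a lookup by id touches exactly one partition
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (125000),
    PARTITION p1 VALUES LESS THAN (250000),
    PARTITION p2 VALUES LESS THAN (375000),
    PARTITION p3 VALUES LESS THAN (500000),
    PARTITION p4 VALUES LESS THAN (625000),
    PARTITION p5 VALUES LESS THAN (750000),
    PARTITION p6 VALUES LESS THAN (875000),
    PARTITION p7 VALUES LESS THAN MAXVALUE
);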

Inserts are slower because you tested it wrong. You simply insert 1,000,000 rows one after another, without any randomness, and the only difference is that for the partitioned table MySQL needs to calculate the hash, which takes extra time. And since in your case id is the basis of the partitioning condition, you will never gain anything on inserts, because all new rows simply go to the end of the table.

If you had, for example, a table of GPS locations partitioned by lat and lon, you could see a difference on inserts, for example if each partition covered a different continent. In general, you would only see a difference with a table holding some random (real) data and random, non-sequential inserted values.

Your select on the partitioned table is slower because, again, you tested it wrong.

@TJChambers wrote about this before me: your query needs to touch all the partitions (it is like working with many tables), so it takes longer. Use a WHERE clause that restricts the work to data from just one partition to see a difference.

For example, run:

select count(*) from user_partition where id<99999;

and

select count(*) from user where id<99999;

You will see a difference.
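To check whether partition pruning is actually kicking in, EXPLAIN PARTITIONS (available in MySQL 5.5 and 5.6) lists the partitions a query will touch, and INFORMATION_SCHEMA.PARTITIONS shows how the rows are spread across partitions. For example (the schema name is an assumption; adjust it to your own database):

EXPLAIN PARTITIONS SELECT COUNT(*) FROM user_partition WHERE id < 99999;

SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'user_partition';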

2) This one is hard. There is no way to partition it without redundancy of data (at least none that comes to my mind), but if access time (select speed) is the most important thing, the best way may be to partition it the same way as the user table (a range on one of the ids) and insert two rows for each relationship, i.e. (a,b) and (b,a). It will double the number of rows, but if you partition into more than 4 parts you will still work on fewer records per query, and you will have just one condition to check, with no need for an OR.

I tested it with this schema:

CREATE TABLE `test`.`friends` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` ),
INDEX ( `b` )
) ENGINE = InnoDB;

CREATE TABLE `test`.`friends_part` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` , `b` )
) ENGINE = InnoDB
PARTITION BY RANGE (a) (
    PARTITION p0 VALUES LESS THAN (1000),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN (3000),
    PARTITION p3 VALUES LESS THAN (4000),
    PARTITION p4 VALUES LESS THAN (5000),
    PARTITION p5 VALUES LESS THAN (6000),
    PARTITION p6 VALUES LESS THAN (7000),
    PARTITION p7 VALUES LESS THAN (8000),
    PARTITION p8 VALUES LESS THAN (9000),
    PARTITION p9 VALUES LESS THAN MAXVALUE
);

delimiter //
DROP procedure IF EXISTS fill_friends//
create procedure fill_friends()
begin
    declare i int default 0;
    declare a int;
    declare b int;
    while i<2000000
    do
    set a = rand()*10000;
    set b = rand()*10000;
    insert into friends values(a,b);
    set i = i + 1;
    end while;
end
//
delimiter ;

delimiter //
DROP procedure IF EXISTS fill_friends_part//
create procedure fill_friends_part()
begin
    insert into friends_part (select a,b from friends);
    insert into friends_part (select b as a, a as b from friends);
end
//
delimiter ;
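
The test tables are then populated by calling the two procedures:

call fill_friends();
call fill_friends_part();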

Queries I have run are:

select * from friends where a=317 or b=317;

result set: 475 rows; three runs: 1.43 sec, 0.02 sec, 0.01 sec

select * from friends_part where a=317;

result set: 475 rows; three runs: 0.10 sec, 0.00 sec, 0.00 sec

select * from friends where a=4887 or b=4887;

result set: 483 rows; three runs: 1.33 sec, 0.01 sec, 0.01 sec

select * from friends_part where a=4887;

result set: 483 rows; three runs: 0.06 sec, 0.01 sec, 0.00 sec

I didn't bother about the uniqueness of the data, but in your case you could use a unique index.

I used the InnoDB engine, but MyISAM is better if most of the queries are selects and you are not going to do many writes.

There is no big difference between the 2nd and 3rd runs, probably because of caching, but there is a visible difference for the 1st run. The duplicated-row approach is faster because we are breaking one of the prime rules of database design, but the end justifies the means, so it may be a good solution for really big tables. If you are going to have fewer than 1M records, I think you can survive without partitioning.

Gustek