0

I have the following table in MySQL:

CREATE TABLE tweetdb(
       tweetid BIGINT(18) UNSIGNED NOT NULL, 
       userid INT(10) UNSIGNED NOT NULL, 
       timestamp CHAR(14), 
       tweet TEXT, 
       score TINYINT, 
  PRIMARY KEY(tweetid, userid)
) ENGINE=MYISAM PARTITION BY KEY(userid) PARTITIONS 101;

+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| tweetid   | bigint(18) unsigned | NO   | PRI | NULL    |       |
| userid    | int(10) unsigned    | NO   | PRI | NULL    |       |
| timestamp | char(14)            | YES  |     | NULL    |       |
| tweet     | text                | YES  |     | NULL    |       |
| score     | tinyint(4)          | YES  |     | NULL    |       |
+-----------+---------------------+------+-----+---------+-------+
5 rows in set (0.29 sec)

I have 210 million rows in this table. My Undertow server (java application) sends a GET with the following select query:

"SELECT test.tweetdb.tweetid, test.tweetdb.tweet, test.tweetdb.score FROM test.tweetdb WHERE test.tweetdb.userid = 287543000 AND test.tweetdb.timestamp = 20140420000829;"

I use the userid and timestamp to get the results as it is only data I have available to test the database. The database is for read only purposes, with no writes / updates.

I have also used an index on the table.

mysql> SHOW INDEX FROM tweetdb;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tweetdb |          1 | id_index |            1 | userid      | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |               |
| tweetdb |          1 | id_index |            2 | timestamp   | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

Now, even after using partitioning and applying a primary key, it takes almost 1 second to respond back with a correct response, which is very long. My application must have a throughput of atleast 6000 requests per second.

Hardware configurations:

I am running an Undertow server (front end) to query the Mysql server (backend) on an Amazon M1.large instance. To avoid latency, I am running both servers on the same instance.

Can anyone help me out? I am running out ideas. Thank you!

Updates

mysql> EXPLAIN SELECT * FROM test.tweetdb LIMIT 1;
+----+-------------+---------+------+---------------+------+---------+------+-----------+-------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows      | Extra |
+----+-------------+---------+------+---------------+------+---------+------+-----------+-------+
|  1 | SIMPLE      | tweetdb | ALL  | NULL          | NULL | NULL    | NULL | 270119913 |       |
+----+-------------+---------+------+---------------+------+---------+------+-----------+-------+
1 row in set (3.67 sec)


mysql> EXPLAIN SELECT * FROM test.tweetdb WHERE test.tweetdb.userid=287543000 AND test.tweetdb.timestamp=20140420000829;
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | tweetdb | ALL  | NULL          | NULL | NULL    | NULL | 2657601 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

Time from Undertow frontend server

The time it takes is 1.3 seconds

AngryPanda
  • 1,261
  • 2
  • 19
  • 42

1 Answers1

0

Your primary key is combination of tweetid and userid. And for mysql it is going for full search because your table is having primary key of combile column. You can create another key having only userid. For mysql if You are having two columns in key then they should be present in where otherwise it considers it for whole table search

Deepak
  • 47
  • 4
  • In my dataset, userid and timestamp combination is not unique. A twitterbot can create multiple tweets at the same time. I wanted to create a primary key on tweetid, userid and timestamp but then it takes very long to load the data into the table. Do you recommend I drop the primary key all together? – AngryPanda Mar 31 '15 at 10:57