
I have a table using InnoDB that stores all messages sent by my system. Currently the table has 40 million rows and grows by roughly 750,000 rows per month.

My query basically selects messages sent by a user within a date range. Here is a simplified CREATE TABLE:

CREATE TABLE `log` (
  `id` int(10) NOT NULL DEFAULT '0',
  `type` varchar(10) NOT NULL DEFAULT '',
  `timeLogged` int(11) NOT NULL DEFAULT '0',
  `orig` varchar(128) NOT NULL DEFAULT '',
  `rcpt` varchar(128) NOT NULL DEFAULT '',
  `user` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `timeLogged` (`timeLogged`),
  KEY `user` (`user`),
  KEY `user_timeLogged` (`user`,`timeLogged`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Note: I have individual indexes too because of other queries.

Query looks like this:

SELECT COUNT(*) FROM log WHERE timeLogged BETWEEN 1282878000 AND 1382878000 AND user = 20

The issue is that this query takes from 2 to 10 minutes, depending on the user and server load, which is too long to wait for a page to load. I have the MySQL query cache enabled, plus caching in the application, but the problem is that when a user searches a new range, it won't hit the cache.

My questions are:

  • Would changing the user_timeLogged index make any difference?
  • Is this a problem with MySQL and big databases? I mean, do Oracle or other DBs also suffer from this problem?

AFAIK, my indexes are correctly created and this query shouldn't take so long.
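One way to sanity-check that a composite (user, timeLogged) index can satisfy this query pattern is a small self-contained sketch. The example below uses SQLite (Python's built-in sqlite3 module) purely as a stand-in for MySQL to illustrate the idea; the table and data are made up, and EXPLAIN QUERY PLAN is SQLite syntax, not MySQL's EXPLAIN:

```python
import sqlite3

# Hypothetical demonstration: an equality-plus-range COUNT(*) query
# can be answered from a composite (user, timeLogged) index alone,
# without touching the table rows (a "covering index" scan).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        timeLogged INTEGER
    )
""")
conn.execute("CREATE INDEX user_timeLogged ON log (user, timeLogged)")

# Insert 1000 rows each for two made-up users.
conn.executemany(
    "INSERT INTO log (user, timeLogged) VALUES (?, ?)",
    [(u, 1282878000 + i) for u in (10, 20) for i in range(1000)],
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM log
    WHERE user = 20 AND timeLogged BETWEEN 1282878000 AND 1382878000
""").fetchall()

# The plan detail should mention a covering-index search on
# user_timeLogged rather than a full table scan.
print(plan[0][3])
```

If the plan in your real database shows a full scan instead, that points at a configuration or statistics problem rather than a missing index.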

Thanks to anyone who helps!

joe.dawley
Fernando
  • Post the output from the following: `EXPLAIN SELECT COUNT(*) FROM log WHERE timeLogged BETWEEN 1282878000 AND 1382878000 AND user = 20;` – Jason McCreary Dec 01 '10 at 21:29
  • I'll post this as a comment b/c it doesn't address query-optimization, but have you considered an archiving strategy instead of keeping all messages in a single table? 40m records at a rate of 750k/month implies over four years' worth of data. Unless it is truly the case that messages at any arbitrary age are queried with equal frequency, you may want to consider moving old messages to a separate table, and implementing logic that directs requests for old messages to that table. – Dan J Dec 01 '10 at 21:43
  • There isn't much to optimize in the query. What size is your *key_buffer*? – AndreKR Dec 01 '10 at 21:49
  • You mean `innodb_buffer_pool_size` since it is an InnoDB table. – johno Dec 01 '10 at 22:09

2 Answers


You're using InnoDB but not taking full advantage of your InnoDB clustered index (primary key), as it looks like your typical query is of the form:

select <fields> from <table> where user_id = x and <datefield> between y and z

not

select <fields> from <table> where id = x 

The following article should help you optimise your table design for your query:

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

If you follow the article, you should find yourself with something like the following:

drop table if exists user_log;
create table user_log
(
  user_id int unsigned not null,
  created_date datetime not null,
  log_type_id tinyint unsigned not null default 0, -- 1 byte vs varchar(10)
  ...
  ...
  primary key (user_id, created_date, log_type_id)
)
engine=innodb;

Here's some query performance stats from the above design:

Counts

select count(*) as counter from user_log

counter
=======
37770394

select count(*) as counter from user_log where 
 created_date between '2010-09-01 00:00:00' and '2010-11-30 00:00:00'

counter
=======
35547897

User- and date-based queries (all queries run with cold buffers)

select count(*) as counter from user_log where user_id = 4755

counter
=======
7624

runtime = 0.215 secs


select count(*) as counter from user_log where 
 user_id = 4755 and created_date between '2010-09-01 00:00:00' and '2010-11-30 00:00:00'

counter
=======
7404

runtime = 0.015 secs

select 
 user_id,
 created_date,
 count(*) as counter
from 
 user_log 
where 
 user_id = 4755 and created_date between '2010-09-01 00:00:00' and '2010-11-30 00:00:00'
group by
 user_id, created_date
order by
 counter desc
limit 10;

runtime = 0.031 secs

Hope this helps :)

Jon Black

COUNT(*) is not served from the table cache because you have a WHERE clause. Using EXPLAIN as @jason mentioned, try changing it to COUNT(id) and see if that helps.

I could be wrong, but I also think that your indexes have to be in the same order as your WHERE clause. Since your WHERE clause uses timeLogged before user, your index should be `KEY user_timeLogged (timeLogged, user)`.

Again, EXPLAIN will tell you whether this index change makes a difference.

Ben
  • Actually it makes no difference in which order the constraints in the same WHERE clause are written. It's the query optimizer's job to handle that. timeLogged is a range scan so it will be used as the last part. The indexes seem OK; maybe he is selecting too much from the table (resulting in a full table scan) or the DB configuration is wrong. – johno Dec 01 '10 at 22:08
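johno's point, that the textual order of predicates in the WHERE clause does not influence the plan, can be checked with a quick sketch. This again uses SQLite's optimizer (via Python's sqlite3) as a stand-in for MySQL's, and the table is made up; both optimizers normalize predicates before choosing an index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, user INTEGER, timeLogged INTEGER)"
)
conn.execute("CREATE INDEX user_timeLogged ON log (user, timeLogged)")

# The same query written with the two WHERE predicates swapped.
q1 = ("SELECT COUNT(*) FROM log "
      "WHERE timeLogged BETWEEN 1282878000 AND 1382878000 AND user = 20")
q2 = ("SELECT COUNT(*) FROM log "
      "WHERE user = 20 AND timeLogged BETWEEN 1282878000 AND 1382878000")

plan1 = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + q1)]
plan2 = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + q2)]

# Both orderings produce the identical index-based plan.
print(plan1 == plan2)
```

So reordering the WHERE clause (or the index to match it) is not the fix here; what matters for a composite index is putting the equality column (user) before the range column (timeLogged), which the existing `user_timeLogged (user, timeLogged)` index already does.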