
(edited) For more details about the app itself, please also see: Simple but heavy application consuming a lot of resources. How to Optimize? (The adopted solution was to use both joins and fulltext search.)

I have the following query running over roughly 500,000 rows in 25 seconds. If I remove the ORDER BY, it takes 0.5 seconds.

First test

Keeping the ORDER BY and removing all the t. and tu. columns, the query takes 7 seconds.

Second test

Whether I add or remove an INDEX on the i.created_at field, the response time remains the same.
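
For reference, the index change tested here amounts to statements like the following (just a sketch; the index name matches the table definition shown further down):

    ALTER TABLE `mtrt_items` DROP INDEX `created_at`;
    ALTER TABLE `mtrt_items` ADD INDEX `created_at` (`created_at`);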

QUERY:

**EDITED: I'VE NOTICED THAT BOTH GROUP BY AND ORDER BY SLOW DOWN THE QUERY (I also achieved a small gain by changing the joins; the query dropped to 10 seconds, but overall the problem remains). With the modification, the EXPLAIN no longer returns "Using filesort", but it still returns "Using temporary".**

SELECT SQL_NO_CACHE
        DISTINCT `i`.`id`, 
        `i`.`entity`, 
        `i`.`created_at`, 
        `i`.`collected_at`, 

        `t`.`status_id` AS  `twt_status_id`, 
        `t`.`user_id` AS `twt_user_id`, 
        `t`.`content` AS `twt_content`, 
        `tu`.`id` AS `twtu_id`, 
        `tu`.`screen_name` AS `twtu_screen_name`, 
        `tu`.`profile_image` AS `twtu_profile_image`


        FROM `mtrt_items` AS `i`

        LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id

        LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
        LEFT JOIN `twt_users` AS `tu` ON u.id = tu.id

        INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
        INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
        INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
        INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
        INNER JOIN `account_clients` AS `c` ON g.client_id =c.id                

    ORDER BY `i`.`created_at` DESC 
    LIMIT 100 OFFSET 0

Here is the EXPLAIN (EDITED):

+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| id | select_type | table | type   | possible_keys      | key       | key_len | ref                    | rows | Extra                        |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
|  1 | SIMPLE      | c     | index  | PRIMARY            | PRIMARY   | 4       | NULL                   |    1 | Using index; Using temporary |
|  1 | SIMPLE      | g     | ref    | PRIMARY,client_id  | client_id | 4       | clubr_new.c.id         |    3 | Using index                  |
|  1 | SIMPLE      | sg    | ref    | group_id,search_id | group_id  | 4       | clubr_new.g.id         |    1 | Using index                  |
|  1 | SIMPLE      | s     | eq_ref | PRIMARY            | PRIMARY   | 4       | clubr_new.sg.search_id |    1 | Using index                  |
|  1 | SIMPLE      | r     | ref    | search_id,item_id  | search_id | 4       | clubr_new.s.id         | 4359 | Using where                  |
|  1 | SIMPLE      | i     | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.r.item_id    |    1 |                              |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.i.user_id    |    1 | Using index                  |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY            | PRIMARY   | 4       | clubr_new.i.id         |    1 |                              |
|  1 | SIMPLE      | tu    | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.u.id         |    1 |                              |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+

Here is the mtrt_items table:

+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| Field        | Type                                                  | Null | Key | Default | Extra          |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| id           | bigint(20)                                            | NO   | PRI | NULL    | auto_increment |
| entity       | enum('twitter','facebook','youtube','flickr','orkut') | NO   | MUL | NULL    |                |
| user_id      | bigint(20)                                            | NO   | MUL | NULL    |                |
| created_at   | datetime                                              | NO   | MUL | NULL    |                |
| collected_at | datetime                                              | NO   |     | NULL    |                |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+

 CREATE TABLE `mtrt_items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `entity` enum('twitter','facebook','youtube','flickr','orkut') COLLATE utf8_unicode_ci NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `created_at` datetime NOT NULL,
  `collected_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `mtrt_user_id` (`user_id`),
  KEY `entity` (`entity`),
  KEY `created_at` (`created_at`),
  CONSTRAINT `mtrt_items_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `mtrt_users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=309650 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The twt_tweets_content table is MyISAM and is also used for fulltext searches:

+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id        | int(11)      | NO   | PRI | NULL    |       |
| user_id   | int(11)      | NO   | MUL | NULL    |       |
| status_id | varchar(100) | NO   | MUL | NULL    |       |
| content   | varchar(200) | NO   | MUL | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
Keyne Viana
  • Just to be sure, you're doing RESET QUERY CACHE before each benchmark, right? – dfb Aug 11 '11 at 02:20
  • what happens when you try `ORDER BY `i`.`id` DESC` ? Generally it would give you the exact same result. id is auto increment, so an older id means that the record was created further in the past. – Sabeen Malik Aug 11 '11 at 02:21
  • @spinning_plate Actually, I use `SELECT SQL_NO_CACHE...`. – Keyne Viana Aug 11 '11 at 02:22
  • @Sabeen-Malik The same time actually, it's strange. I've noticed that, using the EXPLAIN on phpMyAdmin, the first row of the result posted in this question has the following: `Using index; Using temporary; Using filesort`, not just `Using index`. Do you know why? – Keyne Viana Aug 11 '11 at 02:28
  • Stupid idea, but if you wrap the whole thing in `SELECT * FROM (your query without ordering) alias ORDER BY alias.id` do you get the 0.05 second run time? – Dan Grossman Aug 11 '11 at 02:33
  • I am not sure about the table structure here. Why do we see `MUL` in three columns? Can you also post the output of `show create table mtrt_items` – Sabeen Malik Aug 11 '11 at 02:36
  • @Dan-Grossman Same time... thanks any way – Keyne Viana Aug 11 '11 at 02:36
  • @Sabeen-Malik Please, check again. Edited. – Keyne Viana Aug 11 '11 at 02:40
  • Not a stupid idea Dan... this is a common problem as the parser will want to place the order as close to the table selection as it can unless you deliberately make it perform otherwise. – Michael Hays Aug 11 '11 at 02:47
  • @Sabeen-Malik Check the explain again and see the extra column: Using index; Using temporary; Using filesort – Keyne Viana Aug 11 '11 at 03:12

2 Answers


Instead of placing the ORDER BY into the main query, wrap the query, like so (MySQL requires an alias on the derived table):

SELECT * FROM (   
  ... your query, without its ORDER BY ...
) AS `sub` ORDER BY `sub`.`created_at`

Take a look at the query plan. You will find that in your case, the sort is performed on your table mtrt_items before the outer join is performed. In the rewrite I've partially provided, the sort is applied after the outer joins, and is applied on a much smaller set.
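
Applied to the query from the question, the wrapped form would look roughly like this (a sketch only; the derived-table alias `sub` is invented for the example):

SELECT * FROM (
    SELECT SQL_NO_CACHE
        DISTINCT `i`.`id`, `i`.`entity`, `i`.`created_at`, `i`.`collected_at`,
        `t`.`status_id` AS `twt_status_id`, `t`.`user_id` AS `twt_user_id`, `t`.`content` AS `twt_content`,
        `tu`.`id` AS `twtu_id`, `tu`.`screen_name` AS `twtu_screen_name`, `tu`.`profile_image` AS `twtu_profile_image`

        FROM `mtrt_items` AS `i`

        LEFT JOIN `mtrt_users` AS `u` ON i.user_id = u.id
        LEFT JOIN `twt_tweets_content` AS `t` ON t.id = i.id
        LEFT JOIN `twt_users` AS `tu` ON u.id = tu.id

        INNER JOIN `mtrt_items_searches` AS `r` ON i.id = r.item_id
        INNER JOIN `mtrt_searches` AS `s` ON s.id = r.search_id
        INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id = s.id
        INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id = g.id
        INNER JOIN `account_clients` AS `c` ON g.client_id = c.id
) AS `sub`
ORDER BY `sub`.`created_at` DESC
LIMIT 100 OFFSET 0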

UPDATE

Assuming that the LIMIT is being applied to a large set (500,000?), it looks like you can take the top 100 rows before doing any of the joins.

SELECT * from (
    SELECT 
    `i`.`id`, ... `i`.`created_at`, ...
    FROM `mtrt_items` AS `i`
    ORDER BY `i`.`created_at` DESC 
    LIMIT 100 OFFSET 0) as i

    LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id

    LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
    LEFT JOIN `twt_users` AS `tu` ON t.user_id = tu.id

    INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
    INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
    INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
    INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
    INNER JOIN `account_clients` AS `c` ON g.client_id =c.id                

GROUP BY i.id
Michael Hays
  • As I said in the other comment... same time. – Keyne Viana Aug 11 '11 at 02:52
  • Also, note that the EXPLAIN returns "Using index; Using temporary; Using filesort". Maybe that's the problem? – Keyne Viana Aug 11 '11 at 03:13
  • All filesort means is that it's sorting. It's a bad name. It's effectively a quicksort. What's irritating about this is that, if it is as you say, then the sort is really working to sort a huge set from which it will only return 100 items. If the table from which it pulls 100 items is, say, 500,000 items, then this is why everything slows down on you. – Michael Hays Aug 11 '11 at 04:02
  • Not sure if I understood you. But in this case I have a result of ~300k items and ~150k users, totaling ~450k. – Keyne Viana Aug 11 '11 at 04:10
  • Unfortunately, the large set is from one test account; there will be more than one, each with approximately ~100k+ per month. Then the large set will be ~5M+ in just one semester. – Keyne Viana Aug 11 '11 at 04:40
  • I've tested your solution and it worked in the test database with just one account: less than 0.5 secs. Thanks anyway. I will start again from here. If you have any ideas on how to do the same with more than one account with the same amount of items, I will appreciate it. – Keyne Viana Aug 11 '11 at 04:52
  • If you find yourself at the end of your rope, and don't mind replicating your database for me, drop me an email and I can work with you. Otherwise, best of luck. I'm very curious with what you find out. – Michael Hays Aug 11 '11 at 17:03
  • Thank you very much for your time. I'd appreciate your opinion over my current database model and queries performance. Sure you can send your e-mail. – Keyne Viana Aug 13 '11 at 01:32
  • @MichaelHays To `ORDER BY` *before* joining to the other tables is a genius solution. Thank you so much for helping me reduce query execution time from 1.4 seconds to 0.0002 sec! – silkfire Sep 18 '13 at 08:42

Don't include the VARCHAR/TEXT fields in your initial query. That way the temporary table required for the sorting can use the MEMORY engine, which increases efficiency dramatically. You can collect the text fields later with a second query, without any sorting, simply with a condition on the PRIMARY KEY field, and merge the data in your script (assuming you are using one).
Also get rid of any JOINs (INNER or OUTER) from which you don't actually take any data.
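
A rough sketch of that two-step approach, assuming the tables from the question (the ID list for the second query would be filled in from the first query's results in application code):

    -- Step 1: sort and limit without any VARCHAR/TEXT columns,
    -- so a temporary table (if needed) can use the MEMORY engine
    SELECT `i`.`id`, `i`.`created_at`
        FROM `mtrt_items` AS `i`
        INNER JOIN `mtrt_items_searches` AS `r` ON i.id = r.item_id
        ORDER BY `i`.`created_at` DESC
        LIMIT 100 OFFSET 0;

    -- Step 2: fetch the text columns for just those 100 ids,
    -- no sorting required, filtering on the PRIMARY KEY
    SELECT `t`.`id`, `t`.`status_id`, `t`.`content`
        FROM `twt_tweets_content` AS `t`
        WHERE `t`.`id` IN ( ...ids from step 1... );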

georgepsarakis