0

Schema is like this visits_table:

+---------------------------+----------------------+------+-----+---------+----------------+
| Field                     | Type                 | Null | Key | Default | Extra          |
+---------------------------+----------------------+------+-----+---------+----------------+
| idvisit                   | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| idsite                    | int(10) unsigned     | NO   | MUL | NULL    |                |
| idvisitor                 | binary(8)            | NO   |     | NULL    |                |
| visit_time                | datetime             | NO   |     | NULL    |                |
| user_id                   | varchar(200)         | YES  |     | NULL    |                |
| config_cookie             | tinyint(1)           | NO   |     | NULL    |                |
| custom_var_k1             | varchar(200)         | YES  |     | NULL    |                |
| custom_var_v1             | varchar(200)         | YES  |     | NULL    |                |
+---------------------------+----------------------+------+-----+---------+----------------+

Indexes:

+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                | Non_unique | Key_name                     | Seq_in_index | Column_name            | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_table         |          0 | PRIMARY                      |            1 | idvisit                | A         |        1502 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_datetime        |            1 | idsite                 | A         |           5 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_datetime        |            2 | visit_time             | A         |        1502 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_idvisitor       |            1 | idsite                 | A         |           1 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_idvisitor       |            2 | idvisitor              | A         |         500 | NULL     | NULL   |      | BTREE      |         |               |
+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

and I've prepared two queries:

SELECT 
    COUNT(`idvisit`) AS `visits_count`, 
    DATE(`visit_time`) AS `date` 
FROM (
    SELECT * 
    FROM 
        `visits_table` 
    WHERE 
        `idsite` = 2 
        AND `visit_time` >= '2015-04-01 00:00:00' 
        AND `visit_time` <= '2015-04-30 23:59:59'
) AS `visits`
WHERE 1 
GROUP BY 
    DATE(`visit_time`); 



+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+
| id | select_type | table                | type | possible_keys                                | key  | key_len | ref  | rows | Extra                           |
+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>           | ALL  | NULL                                         | NULL | NULL    | NULL | 1469 | Using temporary; Using filesort |
|  2 | DERIVED     | visits_table         | ALL  | index_idsite_datetime,index_idsite_idvisitor | NULL | NULL    | NULL | 1502 | Using where                     |
+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+

in MySQL 5.6 in row 2 type = ref, key = index_idsite_datetime, key_len = 4, ref = const, Extra = Using index

    SELECT 
        COUNT(`idvisit`) AS `visits_count`, 
        DATE(`visit_time`) AS `date` 
    FROM 
        `visits_table` 
    WHERE 
        `idsite` = 2 
        AND `visit_time` >= '2015-04-01 00:00:00' 
        AND `visit_time` <= '2015-04-30 23:59:59'
    GROUP BY 
        DATE(`visit_time`);

+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table                | type  | possible_keys                                | key                   | key_len | ref  | rows | Extra                                                     |
+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | visits_table         | range | index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 12      | NULL | 1468 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+

I have table with 86M rows and both queries take about 2 hours to execute. Is there anything I can do to speed up these queries?

Sebastian Piskorski
  • 4,026
  • 3
  • 23
  • 29
  • I think the problem is the GROUP BY DATE(...) which force MYSQL to sort the result (filesort). If you can separate the Visit_time to Visit_Date and Visit_Time, your query can be much faster. How many RAM do you have and how many result matching the criteria (before group by)? – Tim3880 May 28 '15 at 23:00
  • Try running [`ANALYZE TABLE VISITS_TABLE`](https://dev.mysql.com/doc/refman/5.0/en/analyze-table.html), then retry the 2nd query. Also, change the syntax to `AND visit_time BETWEEN '2015-04-01 00:00:00' AND '2015-04-30 23:59:59'` – Bohemian May 28 '15 at 23:24

1 Answers1

1

I would suggest writing the query as:

SELECT COUNT(*) AS `visits_count`, 
       DATE(`visit_time`) AS `date` 
FROM  `visits_table` 
WHERE  `idsite` = 2 AND
       `visit_time` >= '2015-04-01' AND
       `visit_time` < '2015-05-01'
GROUP BY DATE(`visit_time`);

This might save an iota of time, because the index is now a covering index.

One way I can think to improve the query is to get rid of the group by. Try a query like this:

select dte,
       (select count(*)
        from visits_table
        where idsite = 2 and
              visit_time >= dates.dte AND visit_time < dates.dte + interval 1 day
from (select date('2015-04-01') as dte union all
      select date('2015-04-02') as dte
     ) dates;

MySQL is much better about using indexes for correlated subqueries than it is about using indexes for aggregation. The downside to this approach is that time will increase linearly with the number of days in the result set.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Do you mean to union all days like day by day? Like this select?(select date('2015-04-01') as dte union all select date('2015-04-02') as dte (...) union all select date('2015-04-30')). So I will have union of all days? It will be 29 unions wouln't that be slow? – Sebastian Piskorski May 29 '15 at 05:52
  • @SebastianPiskorski . . . Yes. I would just starting with two days. Doing a union of 30 constant records basically takes no time in the context of your query. The issue is the correlated subquery, which you can test on a smaller number of days. – Gordon Linoff May 29 '15 at 14:13