1

I am using Mysql 5.7 I have table which having 7006500 rows. My query performing group by and fetching row which has maximum count with each group on column which is already indexed but still takes time for execution. Below is my query,execution plan and table schema.

Table Schema

        CREATE TABLE templog (
          id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
          userid bigint(12) unsigned NOT NULL,
          type tinyint(3) NOT NULL DEFAULT '0',
          os tinyint(4) NOT NULL DEFAULT '0',
          day date DEFAULT NULL,
          activetime smallint(5) unsigned NOT NULL DEFAULT '0',
          createdat datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
          timegroupid tinyint(4) NOT NULL DEFAULT '0',
          PRIMARY KEY (`id`),
          KEY templog_type_IDX (`type`,`day`,`userid`,`timegroupid`) USING BTREE
        ) ENGINE=InnoDB AUTO_INCREMENT=7006500 DEFAULT CHARSET=utf8;

My Query:-

SELECT  SQL_NO_CACHE y.userid, y.timegroupid as besttime,y.cnt
    FROM (
        SELECT  @row_number := CASE WHEN @userid=x.userid THEN @row_number+1 ELSE 1 END AS row_number ,
                @userid := x.userid AS userid ,x.cnt,x.timegroupid
            FROM (
                SELECT  userid, timegroupid ,COUNT(userid) as cnt
                    from  templog
                    where  type = 3
                      AND  day BETWEEN '2020-01-01' AND '2020-01-20'
                      AND  userid < 771267
                    GROUP by  userid, timegroupid
                    ORDER by  userid DESC ,cnt DESC 
                  ) x,
            ( SELECT  @row_number:=0, @userid:='') AS t 
          ) y
    where  y.row_number = 1
    ORDER by  y.userid DESC
    LIMIT  1000;

Query Explain format:

                {
              "query_block": {
                "select_id": 1,
                "cost_info": {
                  "query_cost": "12.00"
                },
                "ordering_operation": {
                  "using_filesort": true,
                  "table": {
                    "table_name": "y",
                    "access_type": "ref",
                    "possible_keys": [
                      "<auto_key0>"
                    ],
                    "key": "<auto_key0>",
                    "used_key_parts": [
                      "row_number"
                    ],
                    "key_length": "9",
                    "ref": [
                      "const"
                    ],
                    "rows_examined_per_scan": 10,
                    "rows_produced_per_join": 10,
                    "filtered": "100.00",
                    "cost_info": {
                      "read_cost": "10.00",
                      "eval_cost": "2.00",
                      "prefix_cost": "12.00",
                      "data_read_per_join": "320"
                    },
                    "used_columns": [
                      "row_number",
                      "userid",
                      "cnt",
                      "timegroupid"
                    ],
                    "attached_condition": "((`y`.`row_number` <=> 1))",
                    "materialized_from_subquery": {
                      "using_temporary_table": true,
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 2,
                        "cost_info": {
                          "query_cost": "6441.25"
                        },
                        "nested_loop": [
                          {
                            "table": {
                              "table_name": "t",
                              "access_type": "system",
                              "rows_examined_per_scan": 1,
                              "rows_produced_per_join": 1,
                              "filtered": "100.00",
                              "cost_info": {
                                "read_cost": "0.00",
                                "eval_cost": "0.20",
                                "prefix_cost": "0.00",
                                "data_read_per_join": "16"
                              },
                              "used_columns": [
                                "@row_number:=0",
                                "@userid:=''"
                              ],
                              "materialized_from_subquery": {
                                "using_temporary_table": true,
                                "dependent": false,
                                "cacheable": true,
                                "query_block": {
                                  "select_id": 4,
                                  "message": "No tables used"
                                }
                              }
                            }
                          },
                          {
                            "table": {
                              "table_name": "x",
                              "access_type": "ALL",
                              "rows_examined_per_scan": 25725,
                              "rows_produced_per_join": 25725,
                              "filtered": "100.00",
                              "cost_info": {
                                "read_cost": "1296.25",
                                "eval_cost": "5145.00",
                                "prefix_cost": "6441.25",
                                "data_read_per_join": "602K"
                              },
                              "used_columns": [
                                "userid",
                                "timegroupid",
                                "cnt"
                              ],
                              "materialized_from_subquery": {
                                "using_temporary_table": true,
                                "dependent": false,
                                "cacheable": true,
                                "query_block": {
                                  "select_id": 3,
                                  "cost_info": {
                                    "query_cost": "140807.11"
                                  },
                                  "ordering_operation": {
                                    "using_filesort": true,
                                    "grouping_operation": {
                                      "using_temporary_table": true,
                                      "using_filesort": false,
                                      "table": {
                                        "table_name": "templog",
                                        "access_type": "range",
                                        "possible_keys": [
                                          "templog_type_IDX"
                                        ],
                                        "key": "templog_type_IDX",
                                        "used_key_parts": [
                                          "type",
                                          "day"
                                        ],
                                        "key_length": "13",
                                        "rows_examined_per_scan": 694718,
                                        "rows_pr
            oduced_per_join": 25725,
                                        "filtered": "33.33",
                                        "using_index": true,
                                        "cost_info": {
                                          "read_cost": "1863.51",
                                          "eval_cost": "5145.03",
                                          "prefix_cost": "140807.11",
                                          "data_read_per_join": "803K"
                                        },
                                        "used_columns": [
                                          "id",
                                          "userid",
                                          "type",
                                          "day",
                                          "timegroupid"
                                        ],
                                        "attached_condition": "((`templog`.`type` = 3) and (`templog`.`day` between '2020-01-01' and '2020-01-20') and (`templog`.`userid` < 771267))"
                                      }
                                    }
                                  }
                                }
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              }
            }

Is there any other to optimize query or change index order or rewrite query in another way for better performance?

Rick James
  • 135,179
  • 13
  • 127
  • 222
5a01d01P
  • 663
  • 2
  • 9
  • 20

1 Answers1

0
  1. Do not count on @variables working like you would expect them to. I think the next version is beginning to disallow them.

  2. The optimizer is free to throw away the ORDER BY in the derived table. This will lead to wrong results. Tacking on a large LIMIT to the subquery may prevent that.

  3. Build and maintain a "summary table". This can significantly speed up this and similar queries.

    CREATE TABLE Summary (
        userid ...,
        timegroupid ...,
        type ...,
        day ...,
        cnt SMALLINT UNSIGNED NOT NULL,  -- COUNT(*)
        tottime INT UNSIGNED NOT NULL,   -- SUM(activetime)
        PRIMARY KEY(timegroupid, userid, type, day)
    

However, without understanding the data better, I cannot predict whether this table will be noticeably smaller than the original. If it is significantly smaller, this summary table will not be practical.

  1. I added another tag -- follow it for more discussion of groupwise-max.
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks for suggesting summary table. I already have summary table. My table `templog` is detail log table, after performing query when i fetch results that records i have to update with my summary table.So first i need to optimize above query. – 5a01d01P Apr 21 '19 at 06:10
  • @5a01d01P - by "maintain", I mean `SELECT` only the rows that were inserted into `templog` yesterday (or during the last hour, or since you last updated the summary table). Your query does not sound like augmenting the summary table since it is limited ot an odd time range, not all users, and only one type. More details, please. – Rick James Apr 21 '19 at 18:35