MySQL query SLOW Don't know how to optimize

Question

MySQL query SLOW don’t know how to optimize

I think I m fine with hardware 60GB RAM 10 Cores SSD

Hi I m having a big issue with this query running slow on Mysql they query is below:

# Thread_id: 1165100  Schema: back-Alvo-11-07-19  QC_hit: No
# Query_time: 9.015205  Lock_time: 0.000188  Rows_sent: 1  Rows_examined: 2616880
# Rows_affected: 0
SET timestamp=1568549358;
SELECT count(*) as total_rows FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category AS mc
    ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd
    ON(mcd.video_id = m.video_id)

WHERE m.in_process = 0 AND m.view_id = 0 
     AND m.module_id = 'videochannel' 
    AND m.item_id = 0 AND m.privacy IN(0) 
    AND mcd.category_id = 17
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
LIMIT 12

)) AS m
JOIN phpfox_user AS u
    ON(u.user_id = m.user_id);

This query is running very slow as you can see 9 seconds

When looking for online help to optimize queries always talk about adding indexes,

as you can see below for EXPLAIN statment I already have indexes

Do you guys have any Idea where I should look to improve speed os this query? I m not DB guy having hard time with this. This is a website and have 400,000 videos.

Thanks

*"MySQL query SLOW Don't know how to optimize"* We also don't have noooo idea without `SHOW CREATE TABLE ` foreach table and a `EXPLAIN ` outputs.. — Raymond Nijland, Nov 13 '19 at 18:09
your picture is barely readable, try to add a text version instead or a better image — nbk, Nov 13 '19 at 18:13
Also `SELECT m.* ... GROUP BY m.video_id ORDER BY m.time_stamp DESC` might be used wrong, check the manual how MySQL handles detection off [functional dependency](https://dev.mysql.com/doc/refman/5.7/en/group-by-functional-dependence.html) when using [GROUP BY](https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html) and check if you are using it correctly .. With correctly i mean atleast running MySQL 5.7.5+ and a running `ONLY_FULL_GROUP_BY ` sql_mode otherwise the query can not generated trustworthy results.. — Raymond Nijland, Nov 13 '19 at 18:13

score 1 · Answer 1 · edited Nov 13 '19 at 21:48

1

The explain shows that you are not using an index on table phpfox_channel_video as m, and that it is using a temporary index on table phpfox_channel_category AS mc, which means it is not using an index, but is building an index first, which takes considerable time.

Also, the index for table phpfox_channel_category_data AS mcd could be better.

The indexes you need are:

CREATE INDEX idx_cat_data_video_id ON phpfox_channel_category_data
  (category_id, video_id);
CREATE INDEX idx_channel_cat_id ON phpfox_channel_category (category_id);
CREATE INDEX idx_video_mult ON phpfox_channel_video
  (in_process, view_id, module_id, item_id, privacy, video_id, time_stamp);

edited Nov 13 '19 at 21:48

spencer7593

106,611
15
112
140

answered Nov 13 '19 at 18:22

Marlin Pierce

9,931
4
30
52

Hi Marlin could you check this one? – FerPaqui Nov 13 '19 at 19:37
The same indexes should work for that query too. There is nothing you can do to index the `LIKE` expressions. – Marlin Pierce Nov 13 '19 at 19:55
Thanks Marlin I already applied your answer, did not see much improvement for performance do not know where to look at anymore, if you have any guidance where should I look to fix this would help. – FerPaqui Nov 13 '19 at 20:17
Then I wonder if it is the network connection between your client and the server. – Marlin Pierce Nov 14 '19 at 16:40
i think this answer misses the mark. before we start throwing around suggestions for indexes, lets review the SQL statement for signs that what we are trying to tune might *not* be returning the result we expect, and for signs that it may be forcing inefficient operations. – spencer7593 Nov 14 '19 at 16:58
@spencer7593 I somewhat agree. Some of the other suggestions on re-writing the SQL have some merit and should be considered, but I'm not sure they will improve performance, and may need the same indexes I'm suggesting. A sub-query for exist vs the group by is the best suggestion I've seen here, and sounds like it could save time, although if there is an index to do the group by it might not be much. – Marlin Pierce Nov 14 '19 at 17:52
if `mcd` is a junction table, then likely there are foreign key constraints, so we likely already have two indexes, one with leading column `category_id` and the other with leading column `video_id`. It's also likely that these columns are non-NULL, and the combination of the two is unique. If so, I'd opt for `UNIQUE` indexes, `ON mcd (category_id, video_id)` *and* `ON mcd (video_id, category_id)` ... not specifically for improving performance of this one query, but to give a variety of statements a chance to make effective use of an index ... starting either from video or from category – spencer7593 Nov 14 '19 at 19:38

score 1 · Answer 2 · answered Nov 14 '19 at 01:42

Don't fetch m.* if you are only going to do COUNT(*).
If phpfox_channel_category is a many-to-many mapping table, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
m needs INDEX(in_process, view_id, module_id, item_id, privacy) in any order.

Avoid the GROUP BY:

         INNER JOIN  phpfox_channel_category AS mc ON(mc.category_id = mc.category_id)
         INNER JOIN  phpfox_channel_category_data AS mcd ON(mcd.video_id = m.video_id)
         AND  mcd.category_id = 17
         GROUP BY  m.video_id

--> (something like)

    AND EXISTS(
                 SELECT 1
                     FROM phpfox_channel_category      AS mc
                     JOIN phpfox_channel_category_data AS mcd
                             ON mcd.video_id = mc.video_id
                     WHERE mcd.video_id = 17
                       AND mc.video_id = m.video_id
              )

yes. avoid generating an intermediate set and then a GROUP BY to collapse the rows, if we know `video_id` is the PRIMARY KEY of `m`. I think another big performance problem is the condition mc.category_id` equals itself... looks like `mcd` is meant to function as a junction table, resolving the many-to-many between `video` and `category`. But we're just guessing. I totally agree; before we start throwing out suggestions for indexes, lets make sure we are running the statement we want to run. — spencer7593, Nov 14 '19 at 16:48

spencer7593 · Answer 3 · 2019-11-14T16:40:59.260

Let's make sure that we are optimizing the right query. I suggest we check this condition in the ON clause:

 mc.category_id = mc.category_id

We know that's going to be TRUE for every row in mc with a non-NULL value of category_id. We could express that condition as:

 mc.category_id IS NOT NULL

This means the join is almost a cross join; every row returned from m matched with every row from mc. That is, we could get an equivalent result writing:

   FROM phpfox_channel_video    m
   JOIN phpfox_channel_category    mc
     ON mc.category_id IS NOT NULL

I suspect that's not actually the result we're after. I think we were meaning to match to m.category_id. But that's just a guess.

If video_id column is PRIMARY KEY or UNIQUE KEY on m, we can avoid the potentially expensive GROUP BY operation by avoiding the joins that create duplicated rows, by using EXISTS with correlated subqueries. If we can avoid generating an intermediate result with duplicate values of video_id, then we can avoid the need to do the GROUP BY.

Also, for the inline view query, rather than return all columns * we can return just the expressions that we need. In the outer query, the only column referenced is user_id.

So we could write something like this:

SELECT COUNT(*) AS total_rows
  FROM (
         SELECT m.user_id
           FROM phpfox_channel_video m
          WHERE EXISTS ( SELECT 1
                           FROM phpfox_channel_category mc
                          WHERE mc.category_id  = m.category_id
                      --        mc.category_id = mc.category_id  -- <original
                       )
            AND EXISTS ( SELECT 1
                           FROM phpfox_channel_category_data mcd
                          WHERE mcd.video_id     = m.video_id
                            AND mcd.category_id  = 17
                       )
            AND m.in_process  = 0
            AND m.view_id     = 0
            AND m.module_id   = 'videochannel'
            AND m.item_id     = 0
            AND m.privacy   IN (0)
          ORDER BY m.time_stamp DESC
          LIMIT 12
       ) d
    JOIN phpfox_user u
      ON u.user_id = d.user_id

For tuning, optimal index for m will have leading columns that have equality predicates, followed by the time_stamp column so that we can avoid a "Using filesort" operation, the ORDER BY can be satisfied by returning rows in index order. It looks like the reason we need the rows ordered is for the LIMIT clause.

... ON phpfox_channel_video (in_process, view_id, item_id, module_id
          , time_stamp, video_id, ... )

The other two tables, we want indexes with leading columns that have equality predicates

... ON phpfox_channel_category_data (video_id, category_id, ...)

... ON phpfox_channel_category ( category_id, ... )

NOTES:

(It's not entirely clear why we need an inline view, and we are delaying the join from the user_id reference. Then again, the point of the entire query isn't really obvious to me; I'm just providing a re-write, given the provided SQL, with the change to the condition category_id.)

The above assumed that category_id column exists in m, and that it's a one-to-many relationship.

But if that's not true... if the mcd table is actually junction table, resolving a many-to-many relationship between video and category, such that the join condition was meant to be

 mcd.category_id = mc.category_id 
   ^

Then we would want to replace the WHERE EXISTS and AND EXISTS in the query above, into a single correlated subquery. Something like this:

SELECT COUNT(*) AS total_rows
  FROM (
         SELECT m.user_id
           FROM phpfox_channel_video m
          WHERE EXISTS ( SELECT 1
                           FROM phpfox_channel_category mc
                           JOIN phpfox_channel_category_data mcd
                             ON mcd.category_id  = mc.category_id 
                          WHERE mcd.video_id     = m.video_id
                            AND mcd.category_id  = 17
                       )
            AND m.in_process  = 0
            AND m.view_id     = 0
            AND m.module_id   = 'videochannel'
            AND m.item_id     = 0
            AND m.privacy   IN (0)
          ORDER BY m.time_stamp DESC
          LIMIT 12
       ) d
    JOIN phpfox_user u
      ON u.user_id = d.user_id

Thanks for the help I appreciated it. I tried to run yout suggested query and got the error MySQL said: #1054 - Unknown column 'mcd.category' in 'on clause' — FerPaqui, Nov 14 '19 at 01:35
looks like column reference might be `mcd.category_id` . i did not mean to provide a SQL solution for you; my intent was to get you to think a little more clearly about what result we actually want to return. And in terms of a specification, I have nothing to go on in except the SQL that was provided, and the matching `mc.category_id = mc.category_id`. As far as which other table actually contains a column that should be matched to `mc.category_id`, I'm just guessing. The last query is based on `mcd` being a junction table, a many-to-many relationship between `m` and `mc`. Just guessing. — spencer7593, Nov 14 '19 at 16:36
To echo the beginning of the answer... lets be sure that we are tuning the statement, be sure that the statement is returning the result we actually want to return, and be sure that we are using patterns that give us the best shot at good performance (e.g. avoid creating a huge intermediate resultset and forcing a Using filesort operation to collapse rows.) Yes, adding indexes can improve performance, but only so much; often times the biggest performance bang comes from re-writing the query. — spencer7593, Nov 14 '19 at 16:38
If `mcd` is a junction table, then likely we already have foreign keys defined, and for InnoDB, we already have indexes to support the constraints. Before we make recommendations about adding indexes, we really need to know what indexes already exist. For example, if the `(video_id,category_id)` tuple is unique, then we probably want UNIQUE index on both `(video_id,category_id)` and on `(video_id,category_id)` and we would want to drop any singleton indexes on just `video_id` and just `category_id`. Creating suitable indexes is an important part of performance tuning, but its just one part. — spencer7593, Nov 14 '19 at 19:43
Thanks for your time spencer. I thought this performance issue could be something more basic that I would be missing, but as I am not DB guy to apply all your reasoning and thinking is not possible for me. I would have to be advance level on Database related stuff or at least have a better knowledge than I have. At the end of all this. What I have done is fine tune my.CNF file and looked at indexes and it did not help very much. It seems more about structure and development itself. Thank you very much. — FerPaqui, Nov 14 '19 at 20:32

MySQL query SLOW Don't know how to optimize

3 Answers3