Let's make sure that we are optimizing the right query. I suggest we check this condition in the ON
clause:
mc.category_id = mc.category_id
We know that's going to be TRUE for every row in mc
with a non-NULL value of category_id
. We could express that condition as:
mc.category_id IS NOT NULL
This means the join is almost a cross join; every row returned from m
matched with every row from mc
. That is, we could get an equivalent result writing:
FROM phpfox_channel_video m
JOIN phpfox_channel_category mc
ON mc.category_id IS NOT NULL
I suspect that's not actually the result we're after. I think we were meaning to match to m.category_id
. But that's just a guess.
If video_id
column is PRIMARY KEY or UNIQUE KEY on m
, we can avoid the potentially expensive GROUP BY
operation by avoiding the joins that create duplicated rows, by using EXISTS with correlated subqueries. If we can avoid generating an intermediate result with duplicate values of video_id
, then we can avoid the need to do the GROUP BY
.
Also, for the inline view query, rather than return all columns *
we can return just the expressions that we need. In the outer query, the only column referenced is user_id
.
So we could write something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
WHERE mc.category_id = m.category_id
-- mc.category_id = mc.category_id -- <original
)
AND EXISTS ( SELECT 1
FROM phpfox_channel_category_data mcd
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id
For tuning, optimal index for m
will have leading columns that have equality predicates, followed by the time_stamp
column so that we can avoid a "Using filesort" operation, the ORDER BY
can be satisfied by returning rows in index order. It looks like the reason we need the rows ordered is for the LIMIT clause.
... ON phpfox_channel_video (in_process, view_id, item_id, module_id
, time_stamp, video_id, ... )
The other two tables, we want indexes with leading columns that have equality predicates
... ON phpfox_channel_category_data (video_id, category_id, ...)
... ON phpfox_channel_category ( category_id, ... )
NOTES:
(It's not entirely clear why we need an inline view, and we are delaying the join from the user_id
reference. Then again, the point of the entire query isn't really obvious to me; I'm just providing a re-write, given the provided SQL, with the change to the condition category_id
.)
The above assumed that category_id
column exists in m
, and that it's a one-to-many relationship.
But if that's not true... if the mcd
table is actually junction table, resolving a many-to-many relationship between video and category, such that the join condition was meant to be
mcd.category_id = mc.category_id
^
Then we would want to replace the WHERE EXISTS
and AND EXISTS
in the query above, into a single correlated subquery. Something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
JOIN phpfox_channel_category_data mcd
ON mcd.category_id = mc.category_id
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id