Simplifying, I have four tables.
ref_TagGroup (top-level descriptive containers for various tags)
ref_Tag (tags with name and unique tagIDs)
ref_Product
ref_TagMap (TagID,Container,ContainerType)
A fifth table, ref_ProductFamily exists but is not directly part of this query.
I use the ref_TagMap table to map tags to products, but also to map Tags to TagGroups and also to product families. The ContainerType is set to PROD/TAGGROUP/PRODFAM accordingly.
So, I want to return the tag group, tagname and the number of products AND product families that the tag is mapped to...so results like:
GroupName | TagName | TagHitCnt
My question is, why does the first query come back in milliseconds, the second query comes back in milliseconds but the third query (which is just adding an "OR" condition to include both tag to product and tag to family mappings) takes forever (well, over ten minutes anyway...I haven't let it run all night yet.)
QUERY 1:
SELECT ref_taggroup.groupname,ref_tag.tagname,COUNT(DISTINCT IFNULL(ref_product.familyid,ref_product.id + 100000000),ref_product.name) AS 'taghitcnt'
FROM (ref_taggroup,ref_tag,ref_product)
LEFT JOIN ref_tagmap GROUPMAP ON GROUPMAP.containerid=ref_taggroup.groupid
LEFT JOIN ref_tagmap PRODMAP ON PRODMAP.containerid=ref_product.id
WHERE
GROUPMAP.tagid=ref_tag.tagid AND GROUPMAP.containertype='TAGGROUP'
AND
PRODMAP.tagid=ref_tag.tagid AND PRODMAP.containertype='PROD'
GROUP BY tagname
ORDER BY groupname,tagname ;
QUERY 2:
SELECT ref_taggroup.groupname,ref_tag.tagname,COUNT(DISTINCT IFNULL(ref_product.familyid,ref_product.id + 100000000),ref_product.name) AS 'taghitcnt'
FROM (ref_taggroup,ref_tag,ref_product)
LEFT JOIN ref_tagmap GROUPMAP ON GROUPMAP.containerid=ref_taggroup.groupid
LEFT JOIN ref_tagmap PRODFAMMAP ON PRODFAMMAP.containerid=ref_product.familyid
WHERE
GROUPMAP.tagid=ref_tag.tagid AND GROUPMAP.containertype='TAGGROUP'
AND
PRODFAMMAP.tagid=ref_tag.tagid AND PRODFAMMAP.containertype='PRODFAM'
GROUP BY tagname
ORDER BY groupname,tagname ;
QUERY 3:
SELECT ref_taggroup.groupname,ref_tag.tagname,COUNT(DISTINCT IFNULL(ref_product.familyid,ref_product.id + 100000000),ref_product.name) AS 'taghitcnt'
FROM (ref_taggroup,ref_tag,ref_product)
LEFT JOIN ref_tagmap GROUPMAP ON GROUPMAP.containerid=ref_taggroup.groupid
JOIN ref_tagmap PRODMAP ON PRODMAP.containerid=ref_product.id
JOIN ref_tagmap PRODFAMMAP ON PRODFAMMAP.containerid=ref_product.familyid
WHERE
GROUPMAP.tagid=ref_tag.tagid AND GROUPMAP.containertype='TAGGROUP'
AND
((PRODMAP.tagid=ref_tag.tagid AND PRODMAP.containertype='PROD')
OR
(PRODFAMMAP.tagid=ref_tag.tagid AND PRODFAMMAP.containertype='PRODFAM' ))
GROUP BY tagname
ORDER BY groupname,tagname ;
-- To answer a question that may come up, the COUNT Distinct ifnull in the select is designed to return one record for large numbers of products that are grouped into families and one record for each 'standalone' product that isn't in a family as well. This code works well in other queries.
I've tried doing a UNION on the first two queries, and that works and comes back very quickly, but it's not practical for other reasons that I won't go into here.
What is the best way to do this? What am I doing wrong?
Thanks!
ADDING EXPLAIN OUTPUT
QUERY1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE GROUPMAP ALL 5640 Using where; Using temporary; Using filesort
1 SIMPLE ref_tag ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.tagid 1 Using index
1 SIMPLE ref_taggroup ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.containerid 3 Using index
1 SIMPLE PRODMAP ALL 5640 Using where; Using join buffer
1 SIMPLE ref_product eq_ref PRIMARY PRIMARY 4 lsslave01.PRODMAP.containerid 1
QUERY2
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE GROUPMAP ALL 5640 Using where; Using temporary; Using filesort
1 SIMPLE ref_tag ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.tagid 1 Using index
1 SIMPLE ref_taggroup ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.containerid 3 Using index
1 SIMPLE PRODFAMMAP ALL 5640 Using where; Using join buffer
1 SIMPLE ref_product ref FixtureType FixtureType 5 lsslave01.PRODFAMMAP.containerid 39 Using where
QUERY3
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE GROUPMAP ALL 5640 Using where; Using temporary; Using filesort
1 SIMPLE ref_tag ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.tagid 1 Using index
1 SIMPLE ref_taggroup ref PRIMARY PRIMARY 4 lsslave01.GROUPMAP.containerid 3 Using index
1 SIMPLE PRODMAP ALL 5640 Using join buffer
1 SIMPLE PRODFAMMAP ALL 5640 Using where; Using join buffer
1 SIMPLE ref_product eq_ref PRIMARY,FixtureType PRIMARY 4 lsslave01.PRODMAP.containerid 1 Using where
enter code here
One more update for anyone who is interested: I finally let the third query above run to completion. It took right around 1000 seconds. Dividing this time by the time it takes each of the queries (1 or 2) to run, we get a number around 6000...which is very close to the size of the ref_tagmap table that we're using in our dev environment (much larger in production). So, it looks like we're running one query against each record in that table...but I still can't see why.
Any help would be much appreciated...and I mean seriously, seriously appreciated.