0
select COUNT(p.id) AS `num`, cat.id, cat.name, cat.parent_id AS `parent_id` 
from products p 
INNER JOIN `products_categories` AS `pc` ON p.id=pc.products_id 
INNER JOIN `categories` AS `cat` ON pc.categories_id=cat.id 
WHERE p.status = 1 AND p.gender IN ('female','neutral') 
group by cat.id

Explain query:

1   SIMPLE  p   ref PRIMARY,gender,status   status  1   const   139107  Using where; Using temporary; Using filesort
1   SIMPLE  pc  ref products_id,categories  products_id 4   mydb.p.id   1   Using index
1   SIMPLE  cat eq_ref  PRIMARY,categoryname    PRIMARY 4   mydb.pc.categories_id   1   Using where

Relevant indexes:

products    0   PRIMARY 1   id  A   299339              BTREE       
products    1   title   1   title   A   299339              BTREE       
products    1   sku 1   sku A   299339              BTREE       
products    1   body    1   body    A   299339  200         BTREE       
products    1   short_description   1   short_description   A   299339  200     YES BTREE       
products    1   keywords    1   keywords    A   2   200         BTREE       
products    1   gender  1   gender  A   10              BTREE       
products    1   status  1   status  A   2               BTREE       
products    1   brand_id    1   brand_id    A   3741            YES BTREE       
products    1   merchant    1   merchant_id A   52              BTREE       
products    1   title_2 1   title,body,keywords     299339              FULLTEXT        
products    1   title_3 1   title       299339              FULLTEXT        
products    1   body_2  1   body        299339              FULLTEXT        

products_categories 0   PRIMARY 1   id  A   514054              BTREE       
products_categories 1   products_id 1   products_id, categories_id  A   514054              BTREE           
products_categories 1   categories  1   categories_id   A   266             BTREE       

categories  0   PRIMARY 1   id  A   154             BTREE       
categories  1   categoryname    1   name    A   154             BTREE       

This is a DB with products, categories and an N:N relationship between them. Products can be in 1 or more category.

I basically need a query that tells me, for the current product filters I have (in this case the status and gender) if the category has any product or not (so I can hide categories without products). Currently I count the products inside each category to know this.

The query WHERE parameters will change according to the user selected filters so that part is not very important in this optimization.

I don't need the exact number of products a category, just if they have products or not. Products table has quite a few indices, has the products_categories and categories table. Products table has about 400k products, 150 categories and 500k products_categories.

MySQL 5.6.22 hosted on AWS RDS, all tables in InnoDB.

I understand my explain query shows why this is slow (going through a lot of products) but I don't have any ideas how to optimize this... maybe a different way to think about this?

AlfaTeK
  • 7,487
  • 14
  • 49
  • 90

2 Answers2

0

For this query:

select COUNT(p.id) AS `num`, cat.id, cat.name, cat.parent_id AS `parent_id` 
from products p INNER JOIN
     products_categories `pc`
     ON p.id = pc.products_id INNER JOIN
     categories cat
      ON pc.categories_id = cat.id 
WHERE p.status = 1 AND p.gender IN ('female', 'neutral') 
group by cat.id;

You need indexes on all the join keys. I would recommend products(status, gender, id), products_categories(products_id, categories_id) and categories(id).

Sometimes, in MySQL, it can be faster to use a correlated subquery than a group by:

select c.*,
       (select count(*)
        from products_categories `pc` INNER JOIN
             products p
             ON p.id = pc.products_id
        where pc.categories_id = cat.id AND
              p.status = 1 AND p.gender IN ('female', 'neutral') 
       ) as cnt
from categories c;

This version wants indexes on products_categories(categories_id, products_id) and products(id, status, gender).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I have followed your answers lot of times. I have seen you have suggested indexes in the answers for better performance. If you create so many indexes for particular table that will affect performance during insert, update and delete operations. How to create effective indexes? – RGS Feb 22 '16 at 04:41
  • @RGS . . . Yes, indexes affect the performance of data modification operations. Whether or not this is important depends on the relative importance of data modifications to queries. – Gordon Linoff Feb 23 '16 at 04:03
  • When I put trace file of slow running query in SQL Query Tuning Advisor, it suggests too many indexes. Whether we can blindly create those missing indexes or not? – RGS Feb 23 '16 at 04:25
0

Your query returns 139107 matching records because you use filter criteria that not very limiting (status=1, gender =female or neutral).Try this

SELECT cat.id, cat.name, cat.parent_id AS `parent_id`,
      COUNT(p.id) AS `num` 
FROM `categories` AS `cat`
INNER JOIN `products_categories` AS `pc` ON pc.categories_id=cat.id 
INNER JOIN products AS p ON p.id=pc.products_id 
WHERE p.status = 1 AND p.gender IN ('female','neutral') 
GROUP BY cat.id
HAVING COUNT(p.id)>0

Adding HAVING does not automatically improve query. The problem is your filter criteria returns many matched products. Filtering records by gender or boolean status (true/false) may cause table scan to occured because of many duplicated value, even if status and gender is index, MySQL may still think it's cheaper to run table scan than using index.

HAVING is used for filtering any category without product. Try this

SELECT cat.id, cat.name, cat.parent_id AS `parent_id`,
      COUNT(pc._products_id) AS `num` 
FROM `categories` AS `cat`
INNER JOIN `products_categories` AS `pc` ON pc.categories_id=cat.id 
GROUP BY cat.id
HAVING COUNT(pc.products_id)>0

Above query will not join with products table. It just look at product_categories if they have product associated.

Zamrony P. Juhara
  • 5,222
  • 2
  • 24
  • 40
  • A very quick test seems that adding HAVING doesn't improve performance and actually seems it's a bit slower to run – AlfaTeK Feb 22 '16 at 03:21
  • your second suggestion is very fast but I need to join with products table so I can filter my products with status = 1 and gender which is a requirement of my query :( – AlfaTeK Feb 22 '16 at 03:38
  • COUNT() on InnoDB is already slow. It get worse if you do not filter records as much as possible. My suggestion use output of second query, you said in exact value would not be a problem. or you can add status and gender in product_categories table. This will cause redundant data with products table but you remove the need to join with products table. Less table to join. – Zamrony P. Juhara Feb 22 '16 at 03:49