3

Given is a mySQL table named "orders_products" with the following relevant fields:

  • products_id
  • orders_id

Both fields are indexed.

I am running the following query:

SELECT products_id, count( products_id ) AS counter
FROM orders_products
WHERE orders_id
IN (
  SELECT DISTINCT orders_id
  FROM orders_products
  WHERE products_id = 85094
)
AND products_id != 85094
GROUP BY products_id
ORDER BY counter DESC
LIMIT 4

This query takes extremely long, around 20 seconds. The database is not very busy otherwise, and performs well on other queries.

I am wondering, what causes the query to be so slow?

The table is rather big (around 1,5 million rows, size around 210 mb), could this be a memory issue?

Is there a way to tell exactly what is taking mySQL so long?

Output of Explain:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY     orders_products     range   products_id     products_id     4   NULL    1577863     Using where; Using temporary; Using filesort
2   DEPENDENT SUBQUERY  orders_products     ref     orders_id,products_id   products_id     4   const   2   Using where; Using temporary
Bohemian
  • 412,405
  • 93
  • 575
  • 722
Majiy
  • 1,890
  • 2
  • 24
  • 32
  • [EXPLAIN](http://dev.mysql.com/doc/refman/5.0/en/explain.html)? Can you post the explain output? – Rup Jul 11 '13 at 10:12
  • do you have indexes on orders_id and products_id? – bizzehdee Jul 11 '13 at 10:13
  • Provide table structure, with indexes and purpose of the query or expected output. – Minesh Jul 11 '13 at 10:15
  • Both orders_id and products_id have an index. Added output of explain to question. – Majiy Jul 11 '13 at 10:17
  • Ah my old friend "Using filesort". Try removing the `ORDER` and see if that helps, at least as a way of isolating the problem. – Rup Jul 11 '13 at 10:17
  • In certain scenarios, table join is much more efficient than inner query. Try joining the two tables together and avoid inner query, see if you get a performance improvement. –  Jul 11 '13 at 10:18
  • Removing the "order by" removes the "using filesort" from explain. But the query still takes more then 20 seconds. – Majiy Jul 11 '13 at 10:23
  • Oh OK. Sorry, the last MySQL performance nightmare I had (that wasn't MyISAM locking :-/) was the filesort. – Rup Jul 11 '13 at 10:25

2 Answers2

5

Queries that use WHERE ID IN (subquery) perform notoriously badly with mysql.

With most cases of such queries however, it is possible to rewrite them as a JOIN, and this one is no exception:

SELECT
    t2.products_id,
    count(t2.products_id) AS counter
FROM orders_products t1
JOIN orders_products t2
    ON t2.orders_id = t1.orders_id
    AND t2.products_id != 85094 
WHERE t1.products_id = 85094
GROUP BY t2.products_id
ORDER BY counter DESC
LIMIT 4

If you want to return rows where there are no other products (and show a zero count for them), change the join to a LEFT JOIN.

Note how the first instance of the table has the WHERE products_id = X, which allows index look up and immediately reduces the number of rows, and the second instance of the table has the target data, but it looked up on the id field (again fast), but filtered in the join condition to count the other products.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 1
    Works like a charm, query now takes practically no time. – Majiy Jul 11 '13 at 10:26
  • No worries. btw, your edit to the `group by` and `order by` was unnecessary - it will execute correctly either way: The SQL standard provides for references to the column *position* (rather than the column *expression*) for these clauses. – Bohemian Jul 11 '13 at 10:59
  • 1
    I agree that *the performance of `WHERE ID IN (subquery)` is notoriously bad in practically all **MySQL** databases.* – ypercubeᵀᴹ Jul 11 '13 at 11:17
  • @ypercube Which databases does `WHERE IN ()` perform well in IYHO? – Bohemian Jul 11 '13 at 11:33
  • SQL-Server deals just fine with `IN` subqueries, for example. Not sure about Postgres and Oracle but I haven't seen anything to suggest bad performance, and "notoriously bad" sounds really bad description. – ypercubeᵀᴹ Jul 11 '13 at 11:35
  • 1
    @ypercube Duly edited to not tar other databases with the mysql brush – Bohemian Jul 11 '13 at 11:38
1

Give these a try:

  1. MySQL does not optimize IN with a subquery - join the tables together.
  2. Your query contains != condition, which is very difficult to deal with - can you narrow down products and use multiple lookups rather than inequity comparison?