Slow execution of a subquery when no matches

Question

Please note that I have asked this question on dba.stackexchange.com, but I thought I'd post it here too:

In MySQL, I have two basic tables - Posts and Followers:

CREATE TABLE Posts (
  id int(11) NOT NULL AUTO_INCREMENT,
  posted int(11) NOT NULL,
  body varchar(512) NOT NULL,
  authorId int(11) NOT NULL,
  PRIMARY KEY (id),
  KEY posted (posted),
  KEY authorId (authorId,posted)
) ENGINE=InnoDB;

CREATE TABLE Followers (
  userId int(11) NOT NULL,
  followerId int(11) NOT NULL,
  PRIMARY KEY (userId,followerId),
  KEY followerId (followerId)
) ENGINE=InnoDB;

I have the following query, which seems to be optimized enough:

  SELECT p.*
    FROM Posts p
   WHERE p.authorId IN (SELECT f.userId
                          FROM Followers f
                         WHERE f.followerId = 9
                      ORDER BY authorId)
ORDER BY posted
   LIMIT 0, 20

EXPLAIN output:

+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+
| id   | select_type        | table | type            | possible_keys      | key     | key_len | ref        | rows | Extra                    |
+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+
|    1 | PRIMARY            | p     | index           | NULL               | posted  | 4       | NULL       |   20 | Using where              |
|    2 | DEPENDENT SUBQUERY | f     | unique_subquery | PRIMARY,followerId | PRIMARY | 8       | func,const |    1 | Using index; Using where |
+------+--------------------+-------+-----------------+--------------------+---------+---------+------------+------+--------------------------+

When followerId is a valid id (meaning, it actually exists in both tables), the query execution is almost immediate. However, when the id is not present in the tables, the query only returns results (empty set) after a 7 second delay.

Why is this happening? Is there some way to speed up this query for cases where there are no matches (without having to do a check ahead of time)?

score 0 · Answer 1 · edited May 23 '17 at 12:17

Is there some way to speed up this query ...???

Yes. You should do two things.

First, you should use EXISTS instead of IN (cross reference SQL Server IN vs. EXISTS Performance). It'll speed up the instances where there is a match, which will come in handy as your data set grows (it's may be fast enough now, but that doesn't mean you shouldn't follow best practices, and in this case EXISTS is a better practice than IN)

Second, you should modify the keys on your second table just a little bit. You were off to a good start using the compound key on (userId,followerId), but in terms of optimizing this particular query, you need to keep in mind the "leftmost prefix" rule of MySQL indices, eg

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html

What your Query Execution Plan from EXPLAIN is telling you is that SQL thinks it makes more sense to join Followers to Posts (using the Primary Key on Posts) and filter the results for a given followerId off of that index. Think of it like saying "Show me all the possible matches, then reduce that down to just the ones that match followerId = {}"

If you replace your followerId key with a compound key (followerId,userId), you should be able to quickly zoom in to just the user ids associated with a given followerID and do the existence check against those.

I wish I knew how to explain this better... it's kind of a tough concept to grasp until you have a "Aha!" moment and it clicks. But if you look into the leftmost prefix rules on indices, and also change the key on followerId to be a key on (followerId,userId), I think it'll speed it up quite a bit. And if you use EXISTS instead of IN, that'll help you maintain that speed even as your data set grows.

Can you please clarify by providing a concrete example? An `EXISTS` query can't be used as a straight replacement for `IN`; furthermore, while I understand what you're saying, it would help if you could provide an actual query (along with any changes needed to make to tables). — Melllvar, Nov 13 '14 at 22:03
`SELECT p.* FROM Posts p WHERE EXISTS (SELECT * FROM Followers WHERE p.authorid =Followers.userId AND followers.followerId = 9) ORDER BY posted LIMIT 0,20` — Evan Volgas, Nov 13 '14 at 22:14
In terms of the actual change to the table: `ALTER TABLE Followers DROP KEY followerId;` `ALTER TABLE Followers ADD KEY follower_user (followerId,userId);` — Evan Volgas, Nov 13 '14 at 22:16

score 0 · Answer 2 · answered Nov 17 '14 at 07:47

0

try this one :

SELECT p.*
FROM Posts p
inner join Followers f On f.userId = p.authorId
WHERE f.followerId = 9
ORDER BY posted
   LIMIT 0, 20

answered Nov 17 '14 at 07:47

Asromi rOmi

197
1
7

Please add some explanation to make this answer valuable for other readers as well (e.g. pointing out what your solutions does differently compared to the OP's solution). – Johannes S. Nov 17 '14 at 08:10
i think, performance for executing query depend on some reason. the amount of data in the table, whether or not the db connection, efficient query whether etc. in this case we are discussing is the query efficiency. querypun efficiency varies between SQL Server, Oracle, or MySQL etc. i like 'evanv' query, it will runing well in Oracle but, in mySQL i never try. i just give example query, I normally use. usually my query will running well in SQL Server and mySQL. Just try. – Asromi rOmi Nov 18 '14 at 00:51

Slow execution of a subquery when no matches

2 Answers2