SELECT DISTINCT Extremely slow

Question

I have a query that is taking 48 seconds to execute as follows:

SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days 
FROM tmd_logins
join tmd_users on tmd_logins.userID = tmd_users.userID 
where tmd_users.isPatient = 1 AND loggedIn > '2011-03-25' 
and tmd_logins.userID in 
    (SELECT userID as accounts30Days FROM tmd_users
    where isPatient = 1 AND created > '2012-04-29' AND computerID is null)

When I remove the DISTINCT keyword it takes less than 1 second, so it seems the bottle neck lies within that.

The database adds an entry to the tmd_logins table every time a user logs into the system. I am trying to get a total count of all users that are patients that have been created and logged in within a given period of time, such as in the last 30 days.

I have tried removing the DISTINCT keyword and adding group by tmd_logins.userID to the statement but the performance issue remains.

Table tmd_logins has about 300,000 records, tmd_users has about 40,000

Is there a better way of doing this?

A silly question: did you try using `DISTINCT tmd_users.userID` instead? The result should be the same, but you might get better luck with the optimizer, because `tmd_users.userID` is indexed (assuming its the primary key of `tmd_users`, of course). — Sergey Kalinichenko, May 29 '12 at 23:33
Gordon's second solution worked excellently. I played with it a little more and found that if I took out the first `where tmd_users.isPatient` clause it fixed it as well. Thanks! — ToddBFisher, May 30 '12 at 14:00

score 4 · Accepted Answer · answered May 29 '12 at 23:46

The problem that you have is the execution plan. My guess is that the "in" clause might be confusing it. You might try:

SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days 
FROM tmd_logins join
     tmd_users
     on tmd_logins.userID = tmd_users.userID join
     (SELECT distinct userID as accounts30Days
      FROM tmd_users
      where isPatient = 1 AND
            created > '2012-04-29' AND
            computerID is null
     ) t
     on tmd_logins.userID = t.accounts30Days
where tmd_users.isPatient = 1 AND
      loggedIn > '2011-03-25'

That might or might not work. However, I'm wondering about the structure of the query itself. It would seem that UserID should be distinct in a table called tmd_users. If so, then you can wrap all your conditions into one:

SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days 
FROM tmd_logins join
     tmd_users
     on tmd_logins.userID = tmd_users.userID 
where tmd_users.isPatient = 1 AND
      loggedIn > '2011-03-25' and
      created > '2012-04-29' AND
      computerID is null

If my guess is true, then this should definitely run faster.

SELECT DISTINCT Extremely slow

1 Answers1