2

I'm running a website that allows users to scrape and analyse their Instagram competitors / personal accounts, to monitor their growth and see how things are going and if they're on track with their growth targets.

For this specific portion of the website, one of my MySQL queries has been running really slow (4-6s each time).

I have two tables : - myaccounts (stores all the accounts each user is tracking) - accounts (stores all the raw data. Each account is scanned every 2 hours and a new record is added in here)

Here is the query that is running really slow:

SELECT M.type, M.status, M.id as masterid, A.*
    FROM myaccounts as M
    INNER JOIN accounts as A on M.accountid = A.userid
    WHERE A.id = (
        SELECT MAX(id)
        FROM accounts 
        WHERE userid = M.accountid)
    AND M.userid = ?
    AND M.status = 'active'

Using an EXPLAIN reveals that the subquery is actually going through every single row in the table (about 180k so far).

The reason I'm using MAX(id) is I want to display the very latest data for the accounts on the home page and give the option for users to access all the raw data on a another page.

Is there any way to speed this up? Some sort of INDEX on MAX()?

Thanks! :)

Mark

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Pr4w
  • 103
  • 10

4 Answers4

1

Unfortunately you won't be able to create an index on MAX() of a field since it's a value that has to be calculated at runtime. The only way you could do that is to maintain a new field that stores the max value and put an index on that. However, then you would need to make sure that field is updated every time a query runs that may change it's value. This would solve your performance problem, but it's generally ill advised to duplicate data like this as there's always the risk that the new field will become out of sync with the rest of the data.

Instead of doing this, I'd recommend doing away with the sub-query and using ORDER BY to order the full results by accounts.id. Then, use LIMIT 1 at the end of the query just to pick the top result. To be more concrete, the query would look something like this:

SELECT M.type, M.status, M.id as masterid, A.*
FROM myaccounts as M
INNER JOIN accounts as A on M.accountid = A.userid
WHERE M.userid = ?
AND M.status = 'active'
ORDER BY A.id DESC
LIMIT 1
Rob Streeting
  • 1,675
  • 3
  • 16
  • 27
0

You can use the below query:

SELECT M.type, M.status, M.id as masterid, A.*
FROM myaccounts as M
INNER JOIN
(
 SELECT MAX(id), userid FROM accounts GROUP BY userid
)
AS A
on M.accountid = A.userid
WHERE
M.userid = ?
AND M.status = 'active'
Helper
  • 776
  • 7
  • 17
0

By introducing a link table lnk you can provide a "map" of max(id)s for each userid. This table will not be (re-)generated for each record.

SELECT M.type, M.status, M.id as masterid, A.*
    FROM myaccounts as M
    INNER JOIN
     ( SELECT userid uid, MAX(id) mid FROM  accounts GROUP BY userid ) lnk
    ON uid=accountid
    INNER JOIN accounts ON id=mid AND userid=accountid
    WHERE
    M.userid = ?
    M.status = 'active'
Carsten Massmann
  • 26,510
  • 2
  • 22
  • 43
0

So, if two 'users' are tracking the same 'account', this query will compute the same info twice?

Turn the solution inside-out. First find the "last" entry for each account, then apply those to where you want them.

Also

myaccounts: INDEX(status, userid)   -- in either order
accounts:   INDEX(userid, id)       -- in this order
Rick James
  • 135,179
  • 13
  • 127
  • 222