3

I have a query for a chat site I built ages ago and due to large volumes of traffic my poor query design has caught up with me. Here I have an example from my long query log:

SELECT  DISTINCT user.id 
FROM    user 
        STRAIGHT_JOIN user_pics 
            ON user.id=user_pics.uid      
        STRAIGHT_JOIN user_account 
            ON user_account.user_id=user.id 
WHERE   registered = 1  AND 
        user.id<>0 AND 
        user.id<>23847 AND 
        user.id<>12392... (IT HAS LITERALLY 1000 OF THESE)
        AND user_pics.main=1 AND 
        user_pics.approved=1 AND 
        user_pics.deleted<>1 AND 
        gender LIKE '%female%' AND 
        country LIKE '%United Kingdom%' AND 
        city LIKE '%birmingham%' AND 
        sexorientation     LIKE '%Straight%' 
ORDER   BY updatedate DESC 
LIMIT   20;

The query takes about 15 seconds to execute, I have Indexed all the columns of reference as well. Would replacing the 1000 "AND user.id<>0" marks with a lookup into a temp table improve the query. I thought I would ask before going and making the changes. If you can recommend any helpful changes with code I would be hugely grateful.

EDIT: The "user.id<>23847" marks are created in php by a simple select and then a foreach array loop adding them to the larger sql query.

EDIT 2: Thank you for all the help, by using "not in" they query was reduced from 13 seconds to 0.3 seconds.

David
  • 3,927
  • 6
  • 30
  • 48
  • You can change `user.id <>0 AND user.id <> 23847 AND ...` with `user.id NOT IN (0, 23847, ...)` this also executes faster. – Boynux Feb 26 '13 at 09:22
  • Why do you have all the user.id <> 2243 bits? – cja Feb 26 '13 at 09:23
  • It depends on number of items in `NOT IN` and complexity of REGEXP. Not realy a straight answer to this. – Boynux Feb 26 '13 at 09:28
  • 3
    Do you need `like` checks for things like gender and city? They would probably be better off as exact matches I'd have thought. – BenOfTheNorth Feb 26 '13 at 09:28
  • For gender and orientation, storing a string is waaaaay redundant too.. See my answer. – ppeterka Feb 26 '13 at 09:30
  • Using a leading wildcard in the LIKE clauses will prevent it using indexes on those columns (if there are loads of these parameters for a user they might be better off split into a separate table). Using a temp table will almost certainly improve the performance rather than having 1000s of ANDed clauses, but it does depend a bit on how efficiently you populate the temp table. – Kickstart Feb 26 '13 at 09:43
  • Is there any relationship between your ids? – artragis Feb 26 '13 at 09:44
  • @David I think I got an even better solution... – ppeterka Feb 26 '13 at 09:59

6 Answers6

5

Try yo use EXPLAIN
http://dev.mysql.com/doc/refman/5.0/en/explain.html
And you will see what can be better.

This can be replaced as user.id NOT IN(23847 ,0 , 23847 ,...) using NOT IN()

Oyeme
  • 11,088
  • 4
  • 42
  • 65
1

Yes, if you were to use,

user.id NOT IN (SELECT id FROM idExemptTable)

this would be much faster than individually checking each id

I've included a link to another answer that goes into further detail regarding the IN statement: -

SQL: SELECT IN faster and best practice?

I'm also unsure why you're using a wildcard match and LIKE for checking country name.

Community
  • 1
  • 1
ed_me
  • 3,338
  • 2
  • 24
  • 34
1

Instead of using the slow not-equals-to, use a trick: select those that are not to be selected, join it back to the original table with LEFT JOIN, and get only the rest via filtering:

SELECT DISTINCT user.id 
FROM user 
STRAIGHT_JOIN user_pics ON user.id=user_pics.uid
STRAIGHT_JOIN user_account ON user_account.user_id=user.id 
LEFT OUTER JOIN 
 (SELECT u.id from user u where u.id in (0,23847, 12397 ... ... ...)) as notToBeIncluded ON user.id=notToBeIncluded.id -- the users that are to be excluded
WHERE registered=1  
AND notToBeIncluded.id IS NULL --this is the important part.
AND user_pics.main=1 
AND user_pics.approved=1 
AND user_pics.deleted<>1 
AND gender LIKE '%female%' 
AND country LIKE '%United Kingdom%' 
AND city LIKE '%birmingham%' 
AND sexorientation     LIKE '%Straight%' 
ORDER BY updatedate DESC LIMIT 20;

EDIT How dumb I am... You even mentioned, that what the unwanted IDs you have are the results of another query! In this case, don't get the results from that in PHP, use it directly in your query! That will make it definitely quicker.

So:

  • remove other query along with the foreach processing its results
  • rewrite the main query

(there needs to be a string here for markdown to format things right)

SELECT DISTINCT user.id 
FROM user 
STRAIGHT_JOIN user_pics ON user.id=user_pics.uid
STRAIGHT_JOIN user_account ON user_account.user_id=user.id 
LEFT OUTER JOIN 
 (SELECT u.id from user <[ your other query here ]> ) as notToBeIncluded ON user.id=notToBeIncluded.id -- the users that are to be excluded
WHERE registered=1  
AND notToBeIncluded.id IS NULL --this is the important part.
--( conditions removed for brewity)
ORDER BY updatedate DESC LIMIT 20;

Other recommendations:

  • Using varchar for columns like gender, and orientation doesn't help. Use numeric values, that helps some.
  • also, review your indexes. Somebody suggested the use of execution plans: that is the ultimate way to go ehen trying to find out performance bottlenecks of a particular query.
ppeterka
  • 20,583
  • 6
  • 63
  • 78
0

Add an extra field to the users table and index it. Set the value to 1 for every user in

user.id<>0 AND 
user.id<>23847 AND 
user.id<>12392...

and to 0 for every other user.

Then filter by this field in your query above.

cja
  • 9,512
  • 21
  • 75
  • 129
0

Minor variation on ppeterkas solution, assuming the use of a temp table with the users who are not required in it. Also changing the LIKEs to straight equals (even better to use flags, or even bit strings depending on if the fields store combinations of values).

SELECT  DISTINCT user.id 
FROM    user 
        STRAIGHT_JOIN user_pics ON user.id=user_pics.uid      
        STRAIGHT_JOIN user_account ON user_account.user_id=user.id 
        LEFT OUTER JOIN tmp_users_to_ignore ON user.id = tmp_users_to_ignore.id
WHERE   registered = 1   
AND     tmp_users_to_ignore.id IS NULL 
AND     user_pics.main=1  
AND     user_pics.approved=1  
AND     user_pics.deleted<>1  
AND     gender = 'female'  
AND     country = 'United Kingdom'  
AND     city = 'birmingham'  
AND     sexorientation     = 'Straight' 
ORDER   BY updatedate DESC 
LIMIT   20;
Kickstart
  • 21,403
  • 2
  • 21
  • 33
  • I like the temp table approach, but I think OP wants the unwanted people list to be dynamic - he even mentioned in his question that he fetches them using a query and processing it in PHP... – ppeterka Feb 26 '13 at 10:00
  • Aye, but he also asked about a temp table. Temp table will be pretty dynamic (given that once the script finishes the temp table is lost) but is possibly better if the derivation of the list of users is fairly complex (ie, maybe several lists combined). Also means the list can be used in other possible queries in their script. – Kickstart Feb 26 '13 at 10:13
0

I see :

EDIT: The "user.id<>23847" marks are created in php by a simple select and then a foreach array loop adding them to the larger sql query.

So why just don't create a subquery?

Let's imagine your first query is SELECT * FROM user WHERE yourcondition. Do you need all the datas return? if no, just do it in your second query :

SELECT DISTINCT user.id 
FROM user 
STRAIGHT_JOIN user_pics ON user.id=user_pics.uid
STRAIGHT_JOIN user_account ON user_account.user_id=user.id 
WHERE registered=1  
AND user.id NOT IN(SELECT id FROM user WHERE yourcondition)
AND user_pics.main=1 
AND user_pics.approved=1 
AND user_pics.deleted<>1 
AND gender ='female' 
AND country LIKE '%United Kingdom%' 

AND city LIKE '%birmingham%' AND sexorientation LIKE '%Straight%' ORDER BY updatedate DESC LIMIT 20;

artragis
  • 3,677
  • 1
  • 18
  • 30