2

I am using

SELECT * 
  FROM picture 
 INNER JOIN user ON picture.fbid = user.fbid 
 WHERE hide = 0 
   AND userhide = 0 
 ORDER BY gender, RAND( ) LIMIT 0 , 1

This executes and gives me the link to a picture of a random row where the gender is female. It takes around 15 seconds to execute.

How could I change the SQL so that it is calculated fast? Should I add a gender column in the Picture table so that it doesn't combine both the table and then gets a random row or is there some other way to optimise the SQL?

Mihai Iorga
  • 39,330
  • 16
  • 106
  • 107
Yahoo
  • 4,093
  • 17
  • 59
  • 85
  • Why is this tagged `[mysql]` and `[sql-server]`? Which is it and what indexes do you have and what is the current explain plan? – Ben Aug 26 '12 at 10:15
  • 2
    BTW, there's no guarantee that the row returned will have the gender as female... You would need `and user.gender = 'female'` in the WHERE clause. – Ben Aug 26 '12 at 10:17
  • @Ben - Yeah , But since it orders it with females first so it kind of worked . Ya the SQL is not perfect and it takes a lot of time to execute – Yahoo Aug 26 '12 at 10:19
  • 1
    http://akinas.com/pages/en/blog/mysql_random_row/ – eggyal Aug 26 '12 at 10:35

2 Answers2

3

[EDIT] You can do the same with a MySQL stored procedure, I just coded it here: http://sqlfiddle.com/#!2/d0e6a/2

(Note: Not having the count done in every query would aliviate the load, otherwise in this particular case it can be even slower and I would recommend to keep your query, but with index optimization. Or check this out: Is cross-table indexing possible?)

[EDIT 2] Here's another example with index optimization using an average of 50000 random entries in each table: http://sqlfiddle.com/#!2/bfbe1/1 The query takes less than 100ms (Without indexes I left it running for several minutes and still no result!). You can create indexes on existing tables using CREATE INDEX

Here's my solution using 2 queries from a PHP script:

$sql="SELECT COUNT(*) 
  FROM picture 
 INNER JOIN user ON picture.fbid = user.fbid 
 WHERE hide = 0 
   AND userhide = 0
   AND gender = 'female' 
   GROUP BY gender";

$result=mysql_query($sql);
$row=mysql_fetch_array($result);

$sql="SELECT * 
      FROM picture 
      INNER JOIN user ON picture.fbid = user.fbid 
      WHERE hide = 0 
      AND userhide = 0
      AND gender = 'female' 
      LIMIT FLOOR(RAND()*$row['COUNT(*)']),1";

$result=mysql_query($sql);
$picture=mysql_fetch_array($result);

This should make you query a lot faster.

Also, if your tables are big (which I guess it's the case, since the query takes 15 secs) it would help if the conditional/join fields are indexed.

Community
  • 1
  • 1
NotGaeL
  • 8,344
  • 5
  • 40
  • 70
  • added a solution as an stored procedure so you can avoid sending more than one query, check that out ;) – NotGaeL Aug 26 '12 at 11:56
  • 50000 records in SQL Fiddle? Gah! It's not for raw performance testing (since that is subject to many different variables); SQL Fiddle is supposed to help you craft the structure of your query, as a test case. Relative performance can be judged by comparing execution plans. I think I'm going to need to set a hard limit in the MySQL database size, to prevent things like this - it's just too hard on the servers, if everyone does it this way. – Jake Feasel Aug 27 '12 at 16:01
  • You're right, I'll keep that in mind. Anyway I've also made some "actual" RAW tests with uniformely distributed random records (about 50 million on each table) on a virtualbox (1 GHz 1 core, 1GB of RAM) on my laptop. Then I discovered the `count(*)` query + `limit floor(rand()*count),1` query is not much of an optimization in this case: It takes between 200 and 400 seconds to process on this virtualbox computer versus an stable 200 and something for the `order by rand()` solution, which I guess depends on how many records match the query (average of 3125000 in this case). – NotGaeL Aug 28 '12 at 10:15
  • (Still I kept my answer for future edit where I will suggest to keep an estimated count to aliviate the load, but I don't know on which basis the first part of the stored procedure should be triggered to have the count updated. Any ideas?) – NotGaeL Aug 28 '12 at 10:20
2

I'd recommend creating an INDEX on picture.fbid = user.fbid this will help to speed the JOIN on the table.

More Mysql specific information here: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

CodeTalk
  • 3,571
  • 16
  • 57
  • 92