0

Working on a small data set of less than 50 rows, the following query below works absolutely fine. When scaling this up and running the same query on a data set of around 5000 rows, this query takes around 3-6 seconds to run which is way too slow on a live environment.

What can be done to improve the performance of this query?

SELECT table1.ID, table1.CompanyName, 
(SELECT CompanyIDBeingFollowed FROM table2 WHERE PersonID = ? ) 
FROM table1 
JOIN table3 ON table1.ID = table3.ID  
WHERE table1.Status = 'Live'  AND  ( MATCH(table3.Content) AGAINST( '+search +term' IN BOOLEAN MODE ) )  
GROUP BY table1.CompanyID 
LIMIT 10;

In essence, the query above is searching through a large data set to rank the results using MySQL FullText so that the most relevant results are returned, i.e. the ones with the highest MySQL FullText Score. Then getting a few bits of information from two other tables based on what has matched, which is used to populate the useful content for the user on the page.

Thoughts?

I'm conscious that the current data set of around 5000 rows is going to grow into the hundreds of thousands in a short period of time, so I'd rather look at how to best optimise this query now rather than later.

On the development environment, the query runs instantly as this is on a data set of around 50 rows.

Update

I've just updated the SQL above to make it a bit easier to read and understand. Table summary below for reference.

Table 1 - Companies
 - ID
 - CompanyName

Table 2 - People Following Companies
 - CompanyIDBeingFollowed
 - PersonID

Table 3 - Pages On Company Website
 - ID (Unique)
 - CompanyID (Non-Unique)
 - Content
Michael Cropper
  • 872
  • 1
  • 10
  • 28
  • can you please post an instance of an actually query without all those quotes and + signs where the paramters have been bounds – e4c5 Jan 06 '17 at 12:31
  • And do post the structure of table1 and table2 as well. Add the explain. – e4c5 Jan 06 '17 at 12:31
  • @e4c5 Updated question – Michael Cropper Jan 06 '17 at 14:04
  • "can you please post an instance of an actually query without all those quotes and + signs where the paramters have been bounds " and the result of `show create table` would have been a lot more usefull for us and less typing for you – e4c5 Jan 06 '17 at 14:05
  • There is a hell of a lot of useless information in the full table that isn't relevant for this question, hence I've not included that to keep things easier to understand (hopefully that is the case anyhow :-) ) – Michael Cropper Jan 06 '17 at 14:11

2 Answers2

0

Difficult to answer your question, but if I where on your place I will try to do this:

  1. Use explain to check what's going on there
  2. Ensure that I realy need group by
  3. Ensure that I realy need `(SELECT ID FROM table2 WHERE ID = ? ) as subquery
  4. Ensure that I realy need order by

And also, may be, it will be acceptable to use 'like' search instead of full text search in case, when user filter data by one word.

degr
  • 1,559
  • 1
  • 19
  • 37
  • 1) Explain didn't really give me any useful information. It just shows the query and how it is broken down, not why it is taking a while to run. 2) The reason I need GROUP BY currently is that because of the JOIN between Table1 and Table 3, Table 1 contains a single unique identifier, ID, and Table3 contains multiple references to this, ID in the form of CompanyID which is used in multiple rows, each row representing a page on the CompanyWebsite. I'm thinking out loud here,would I be able to get rid of the GROUP BY if I make the CompanyID in Table3 a Foreign Key of ID for Table 1? – Michael Cropper Jan 06 '17 at 14:08
  • 3) Yes I do unfortunately. 4) No I don't need that actually as MySQL FULLTEXT search automatically returns a prioritised list so I've deleted that. Interestingly, single word queries on a FULLTEXT search run slower than multiple word queries, so using LIKE would certainly speed that up, good idea. @degr – Michael Cropper Jan 06 '17 at 14:10
  • `(SELECT CompanyIDBeingFollowed FROM table2 WHERE PersonID = ? )` look like it should return same result for all rows, so I think you can do it in different query – degr Jan 06 '17 at 16:03
0

This was taking around 3-6 seconds to run (code from original question);

SELECT table1.ID, table1.CompanyName, 
(SELECT CompanyIDBeingFollowed FROM table2 WHERE PersonID = ? ) 
FROM table1 
JOIN table3 ON table1.ID = table3.ID  
WHERE table1.Status = 'Live'  AND  ( MATCH(table3.Content) AGAINST( '+search +term' IN BOOLEAN MODE ) )  
GROUP BY table1.CompanyID 
LIMIT 10;

Whereas this code takes less than 1 second to run (as would be expected) - Removed the GROUP BY and used DISTINCT instead;

SELECT DISTINCT(table1.ID), table1.CompanyName, 
(SELECT CompanyIDBeingFollowed FROM table2 WHERE PersonID = ? ) 
FROM table1 
JOIN table3 ON table1.ID = table3.ID  
WHERE table1.Status = 'Live'  AND  ( MATCH(table3.Content) AGAINST( '+search +term' IN BOOLEAN MODE ) )  
LIMIT 10;

I've no idea why this is the case in this scenario, but it works. Would be great if anyone knows the more technical aspects of WHY this is the case and I'll update the answer.

Thanks for pointers all.

Michael Cropper
  • 872
  • 1
  • 10
  • 28