2

Hi i am in need of help to optimize a query for large database records above 1 Millions . Current query is taking 27-30 seconds to execute.

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
owner_user.first_name AS ownerFirstName,
owner_user.last_name AS ownerLastName,
CONCAT(owner_user.last_name,
        owner_user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2 FROM
candidate
    LEFT JOIN
user AS owner_user ON candidate.owner = owner_user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id WHERE
is_active = 1 GROUP BY candidate.candidate_id ORDER BY    dateModifiedSort 
DESC LIMIT 0 , 15

is there any method to reduce the execution time of the query. I have also added the index in tables but it is not working fine.

Indexes

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Vipin Singh
  • 532
  • 1
  • 7
  • 24
  • show the index you have on the tables involved .. – ScaisEdge May 09 '16 at 14:35
  • i have added screen shot of indexes. – Vipin Singh May 09 '16 at 14:47
  • can you also give us the explain statement result for the query ..? @VipinS – Andrews B Anthony May 09 '16 at 15:03
  • 1
    It's hilarious that your candidate table has a "is_hot" column. – Yaron Idan May 09 '16 at 15:04
  • I'd say your problem is somewhere here: `LEFT JOIN saved_list_entry ON saved_list_entry.data_item_type = 100 AND saved_list_entry.data_item_id = candidate.candidate_id`. Reverse the ordering of the clause and make sure that data_item_type is included in the data_item_id index. ie: `LEFT JOIN saved_list_entry ON saved_list_entry.data_item_id = candidate.candidate_id AND saved_list_entry.data_item_type = 100 --Make sure that this is indexed with data_item_id ` – Max Sorin May 09 '16 at 15:10
  • @YaronIdan I'm sure that means **exactly** what we think it means. – Max Sorin May 09 '16 at 15:12
  • Did you index every single column you could? – Mjh May 09 '16 at 15:24
  • Can you answer ... 1. Does a candidate always have a user?, 2. Does a candidate always have a saved_list_entry?, 3 Do you have an index on `saved_list_entry` for `data_item_id` and `data_item_type`? It's difficult to give advice on indexes if we don't know what your data is like. Also what is the ratio of active candidates to inactive candidates? You don't seem to have an index on the active flag, but it also might not make sense to add one if the data doesn't support adding one. – Ryan-Neal Mes May 09 '16 at 15:27
  • 1. Candidate has always a user 2. yes 3. yes in saved_list_entry both fields indexed – Vipin Singh May 09 '16 at 16:02
  • what table is your "Is_Active" coming from, you have no alias table reference to it. – DRapp May 09 '16 at 18:46

4 Answers4

1

You're using the query pattern

     SELECT a vast bunch of stuff
       FROM a complex assembly of JOIN operations
      ORDER BY some variable DESC
      LIMIT 0,small number

This is inherently inefficient: to satisfy your query the MySQL server must construct a vast result set, then it must sort the whole thing, then it takes the first fifteen rows and discards the rest.

To make this more efficient, you need to sort less stuff. Here's a way to do that. It looks like you want to find the most recently modified fifteen candidates. This query will, fairly cheaply, retrieve the IDs of those candidates. It exploits one of your indexes.

                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15

Then, you can use that as a subquery in your main query. Add a clause like this:

  WHERE candidate.candidate_id IN (
                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15)

to your query in the appropriate place.

Notice also that you're using a nonstandard and potentially harmful MySQL specific extension to GROUP BY. Your query works, but if a candidate has more than one owner, it will return just one after selecting it randomly.

Finally, you seem to have placed single-column indexes on many of the columns in your large table. This is a notorious SQL antipattern: all those indexes slow down INSERT and UPDATE operations, and most of them probably serve no purpose in speeding up a query. Certainly, for this query, the only useful indexes are the one on date_modified and the primary key.

Many complex queries are satisfied best using specific multi-column indexes. A bunch of single-column indexes are no help to such queries.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
1

I've changed changed the table alias in the below query, use this This Must Solve Your Problem

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
user.first_name AS ownerFirstName,
user.last_name AS ownerLastName,
CONCAT(user.last_name,
        user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2 FROM
candidate
    LEFT JOIN
user ON candidate.owner = user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id WHERE
is_active = 1 GROUP BY candidate.candidate_id ORDER BY    dateModifiedSort 
DESC LIMIT 0 , 15

use the below queries to create indexes for join conditions

create index index_user user(user_id);

create index index_saved_list_entry saved_list_entry(data_item_type,data_item_id);

create index index_candidate candidate(is_active,candidate_id,dateModifiedSort);
Andrews B Anthony
  • 1,381
  • 9
  • 27
1

First, a candidate, I would suspect is always a single entry by the ID, so why you are doing a GROUP BY is beyond me, that can PROBABLY be removed and improve a little.

Second, you are doing a left-join to the "saved_list_entry" table, but not actually pulling any columns from it, so this could probably be removed completely.

Third, from consideration of the GROUP BY being no longer applicable, I would suggest updating your indexes as:

table             index
CANDIDATE         ( is_active, date_modified, candidate_id, owner )
user              ( user_id )
saved_list_entry  ( data_item_id, data_item_type )

Since your order is by the date modified in descending order, having IT in the second position to the is_active (Where condition), it will plow through your first 15 quickly. However, your SQL_CALC_FOUND_ROWS will still have to plow through all other qualifying, but the result set would be pre-ordered by the index to match.

SELECT SQL_CALC_FOUND_ROWS
      c.candidate_id AS candidateID,
      c.candidate_id AS exportID,
      c.is_hot AS isHot,
      c.date_modified AS dateModifiedSort,
      c.date_created AS dateCreatedSort,
      c.first_name AS firstName,
      c.last_name AS lastName,
      c.city AS city,
      c.state AS state,
      c.key_skills AS keySkills,
      u.first_name AS ownerFirstName,
      u.last_name AS ownerLastName,
      CONCAT(u.last_name, u.first_name) AS ownerSort,
      DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
      DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
      c.email2 AS email2 
   FROM
      candidate c
         LEFT JOIN user u
            ON c.owner = u.user_id
         LEFT JOIN saved_list_entry s
            ON c.candidate_id = s.data_item_id
            AND s.data_item_type = 100
   WHERE
      c.is_active = 1 
   GROUP BY 
      c.candidate_id 
   ORDER BY    
      c.date_modified DESC 
   LIMIT 
      0, 15
DRapp
  • 47,638
  • 12
  • 72
  • 142
1
  1. Get rid of saved_list_entry, it adds nothing.

  2. Delay joining to user. This will let you get rid of the GROUP BY, which is adding a bunch of time, and possibly inflating the value of FOUND_ROWS().

Something like:

SELECT  c2.*,
        ou.first_name AS ownerFirstName,
        ou.last_name AS ownerLastName,
        CONCAT(ou.last_name, ou.first_name) AS ownerSort,
    FROM  
      ( SELECT  SQL_CALC_FOUND_ROWS
                c.candidate_id AS candidateID, c.candidate_id AS exportID,
                c.is_hot AS isHot, c.date_modified AS dateModifiedSort,
                c.date_created AS dateCreatedSort, c.first_name AS firstName,
                c.last_name AS lastName, c.city AS city, c.state AS state,
                c.key_skills AS keySkills,
                DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
                DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
                c.email2 AS email2
            FROM  candidate AS c
            WHERE  is_active = 1
            GROUP BY  c.candidate_id
            ORDER BY  c.date_modified DESC  -- note change here
            LIMIT  0 , 15 
      ) AS c2
    LEFT JOIN  user AS ou  ON c2.owner = ou.user_id;

(I messed up the column order, but you can fix that.)

Index needed:

candidate:  INDEX(is_active, candidate_id, date_modified)
Rick James
  • 135,179
  • 13
  • 127
  • 222