1

I have two tables:

Restaurants and SurveyInvitation.

One restaurants has many survey invitation.

I want to select all the restaurants that have a survey invitation and the status of it is 'approved', 'completed', 'hidden_review' .

The restaurants table has ~1400 rows and the survey invitation ~2.4 milion rows.

This is my query

SELECT  `Restaurant`.`id`
FROM   `restaurants` AS `Restaurant`
RIGHT JOIN `survey_invitations` AS `SurveyInvitations`
           ON ( `SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
                AND `SurveyInvitations`.`status` 
                IN (
                    'approved', 'completed', 'hidden_review'
                   ) 
               )
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

This runs in 1.235 sec.

Running explain gives

https://jsfiddle.net/bjuepb9j/3

I also tried this but no luck still 1.2 sec

SELECT  `Restaurant`.`id`
FROM   `db_portal`.`restaurants` AS `Restaurant`
RIGHT JOIN  (
    select `restaurant_id` from `survey_invitations` AS `SurveyInvitations`
    where `SurveyInvitations`.`status` 
    IN ('approved', 'hidden_review', 'completed')
)  AS `SurveyInvitations`
ON (
`SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
)
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

Explain is the same.

In the fiddle there is also the result from show index on both of the tables.

1.2 sec for a ~2.4 million rows is to much I think. Maybe the indexes are wrong, I'm not that good at this kind of stuff.

Edit.1. https://jsfiddle.net/bjuepb9j/6/

Has show create table and show columns of survey_invitations

Fox Alex
  • 133
  • 2
  • 9
  • Can you share your table create statements with indexes and (primary/unique keys)? – Dániel Kis Nov 25 '17 at 17:19
  • Why are you using a `RIGHT JOIN`? With a right join, the table on the right 's join criteria will never be null (because null does not equal null), which means the `Restaurants` table is optional, but then you negate that in the `where` clause by doing equality checks against the `Restaraunts` table (country_id and area_id). I think an inner join would be simpler. Try that and see how it impacts performance. (inner joins offer more options for optimization) – Brandon Nov 25 '17 at 17:19

2 Answers2

1

Use exists:

SELECT r.id
FROM restaurants r
WHERE r.country_id = 53 AND
      r.area_id IN (1, 16, 27, 118, 219, 221, 222, 223,
                    224, 225, 230, 231, 235, 236, 237, 238,
                    239, 240, 248, 226, 241, 244, 246, 227,
                    245, 228, 229, 242, 243, 249
                   ) AND
      EXISTS (SELECT 1
              FROM survey_invitations si
              WHERE si.restaurant_id = r.id AND
                    si.status IN ('approved', 'completed', 'hidden_review') 
             );

Then, for this query you want indexes on restaurants(country_id, area_id, id) and survey_invitations(restaurant_id, status).

A right join is entirely unneeded for your query. The where clause turn it into an inner join anyway. In all likelihood, the expense of the query is in the group by. This version eliminates that.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

I would recommend replacing the join with an IN subquery, and not an EXISTS subquery. When writing the query with an IN subquery, you avoid the correlated EXISTS query, which can occasionally be slower (depending on the amount of results). Try this:

SELECT
        r.id 
    FROM
        restaurants r 
    WHERE
        r.country_id = 53 
        AND r.area_id IN (
            1, 16, 27, 118, 219, 221, 222, 223, 224, 225, 230, 231, 235, 236, 237, 238, 239, 240, 248, 226, 241, 244, 246, 227, 245, 228, 229, 242, 243, 249
        ) 
        AND r.id IN (
            (
                SELECT
                    si.restaurant_id 
                FROM
                    survey_invitations si 
                WHERE
                    1 = 1 
                    AND si.status IN (
                        'approved', 'completed', 'hidden_review'
                    )
            )
        )

For this query, add these indexes:

ALTER TABLE `restaurants` ADD INDEX `restaurants_index_1` (`country_id`, `area_id`, `id`); 
ALTER TABLE `survey_invitations` ADD INDEX `survey_invitations_index_1` (`restaurant_id`, `status`);
Tom Shir
  • 462
  • 1
  • 3
  • 14