2

I've written the following query:

WITH m2 AS (
    SELECT m.id, m.original_title, m.votes, l.name as lang
    FROM movies m
    JOIN movie_languages ml ON m.id = ml.movie_id 
    JOIN languages l ON l.id = ml.language_id
)
SELECT m.original_title 
FROM movies m
WHERE NOT EXISTS (
    SELECT 1
    FROM m2
    WHERE m.id = m2.id AND m2.lang <> 'English'
)

The results appear after 1.5 seconds.

After adding the following line at the end of the query, it takes at least 5 minutes to run it:

ORDER BY votes DESC;

It's not the size of the data, as ORDER BY on the entire table return results in notime.

What am I doing wrong? Why is the ORDER BY adds so much time? (The query SELECT * FROM movies ORDER BY votes DESC returns immediately).

Yam Mesicka
  • 6,243
  • 7
  • 45
  • 64

4 Answers4

4

The order by in the CTE is irrelevant. But I would suggest aggregation for this purpose:

SELECT m.original_title
FROM movies m JOIN
     movie_languages ml
     ON m.id = ml.movie_id JOIN
     languages l
     ON l.id = ml.language_id
GROUP BY m.original_title, m.id
HAVING SUM(lang = 'English') = 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

In order to examine your queries you may turn on the timer by entering .time on at the SQLite prompt. More importantly utilize the EXPLAIN function to see details on your query.

The query initially written does seem to be rather more complex than necessary as already pointed out above. It does not seem apparent what the necessity is for 'movie_languages' and 'languages' tables in general, but especially in this particular query. That would require more explanation on your part but I believe at least one could be removed thus speeding up your query.

The ORDER BY clause in SQLite is handled as described below.

SQLite attempts to use an index to satisfy the ORDER BY clause of a query when possible. When faced with the choice of using an index to satisfy WHERE clause constraints or satisfying an ORDER BY clause, SQLite does the same cost analysis described above and chooses the index that it believes will result in the fastest answer.

SQLite will also attempt to use indices to help satisfy GROUP BY clauses and the DISTINCT keyword. If the nested loops of the join can be arranged such that rows that are equivalent for the GROUP BY or for the DISTINCT are consecutive, then the GROUP BY or DISTINCT logic can determine if the current row is part of the same group or if the current row is distinct simply by comparing the current row to the previous row. This can be much faster than the alternative of comparing each row to all prior rows.

Since there is no index or type on votes stated and the above logic may be followed thus choosing 'the index that it believes will result in the fastest answer'. With the over-complicated query and no index on votes which is being used as ORDER BY then there is much more for it to figure out than necessary. Since the simple query with ORDER BY executes then the complexity of the query causing SQLite much more to compute than necessary.

Additionally the type of the column, most likely INTEGER, is important when sorting (and joining). Attempting to sort on a character type will not only get you wrong results in this case if votes end up above single digits it would be the wrong type to use (I'm not assuming you are just mentioning it).

So simplify the query, ensure your PRIMARY KEYS are properly set, and test it. If it is still not returning in time try an index on votes. This will give you much better insight into what is going on and how different changes affect your queries.

SQLite Documentation - check all and note 6. Sorting, Grouping and Compound SELECTs

SQLite Documentation - check 10. ORDER BY optimizations

sativay
  • 162
  • 11
1

You can do it with NOT EXISTS, without joins and aggregation (assuming that there is always at least 1 row for each movie in the table movie_languages):

SELECT m.*
FROM movies m
WHERE NOT EXISTS (
  SELECT 1 FROM movie_languages ml
  WHERE m.id = ml.movie_id
    AND ml.language_id <> (SELECT l.id FROM languages l WHERE l.lang = 'English')
)
ORDER BY m.votes DESC

or with a LEFT join to languages to get the unmatched rows:

SELECT m.*
FROM movies m 
INNER JOIN movie_languages ml ON m.id = ml.movie_id 
LEFT JOIN languages l ON l.id = ml.language_id AND l.lang <> 'English'
WHERE l.id IS NULL
ORDER BY m.votes DESC
forpas
  • 160,666
  • 10
  • 38
  • 76
0

Refer to this link for more information:

here

In a nutshell, When you include an order by clause, the database builds a list of the rows in the correct order and then returns the data in that order.

The creation of the list mentioned above takes a lot of extra processing, translating into a longer execution time.

Ivan
  • 399
  • 2
  • 5
  • 18