4

I have read that joins are better than subqueries.

But

EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
    FROM Queue
    INNER JOIN LastQueue
    ON Queue.Id=LastQueue.Id

gives

Array
(
    [0] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => SCAN TABLE Queue (~1000000 rows)
        )

    [1] => Array
        (
            [selectid] => 0
            [order] => 1
            [from] => 1
            [detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        )

)

while

EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
    FROM Queue
    WHERE (SELECT 1 FROM LastQueue WHERE Queue.Id=LastQueue.Id) IS NOT NULL

gives

Array
(
    [0] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => SCAN TABLE Queue (~500000 rows)
        )

    [1] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => EXECUTE CORRELATED SCALAR SUBQUERY 1
        )

    [2] => Array
        (
            [selectid] => 1
            [order] => 0
            [from] => 0
            [detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        )

)

I think I have to multiply the number of rows to get an idea of cost. I am right?

Then,

  • Using join: 1000000*1
  • Using subquery: 500000*1*1

Then, is the subquery is faster than join?

Is there a good tutorial to learn how to understand EXPLAIN/EXPLAIN QUERY PLAN results?

And why does it say that SCAN TABLE Queue is ~1000000 and ~500000, when that table has 76 rows?

Oriol
  • 274,082
  • 63
  • 437
  • 513
  • 1
    Here is a link(**https://www.sqlite.org/eqp.html**) to official document for **Explain Query Plan**.It explains all you need to know to use the Explain Query Plan – Durai Amuthan.H Oct 04 '13 at 09:14

2 Answers2

8

The line EXECUTE CORRELATED SCALAR SUBQUERY 1 is just there because you're using a different query syntax.

The actual execution is the same in both cases: SQLite goes through all records of the Queue table, and tries to look up the corresponding record in the LastQueue table.

That the estimated record counts are different is of no concern to you, because you know that the actual number of records is the same.

Joins might be better than subqueries in other databases, but in SQLite, which uses only nested loop joins, the only difference is that a join allows SQLite to choose the outer and the inner table in the join.

In any case, you should write the query in the most simple and maintainable way, and optimize it only if you have measured that you get a noticeable and necessary improvement.


Please note that instead of:

WHERE (SELECT ...) IS NOT NULL

it would be more idiomatic to write:

WHERE EXISTS (SELECT ...)
CL.
  • 173,858
  • 17
  • 217
  • 259
  • *in SQLite, which uses only nested loop joins, the only difference is that a join allows SQLite to choose the outer and the inner table in the join.* Another difference is probably `JOIN` enables the use of existing indices, while the result of a subquery must always be fully scanned. – nalzok Aug 07 '20 at 06:51
  • @nalzok As shown in the EXPLAIN QUERY PLAN output in the question, SQLite *does* use an index for the subquery lookup. And a scalar subquery returns exactly one result. And if you use something like `... WHERE x IN (...non-correlated subquery...)`, then the subquery result will be stored as an index. – CL. Aug 07 '20 at 08:19
  • Oh, you are right. As of SQLite version 3.31.1, apparently there will be an `AUTOMATIC COVERING INDEX` if the result of a non-correlated list query has more than 7 rows. When it contains less or equal to 7 rows, the subquery will be re-executed for each row, which means not only an index won't be created, also no transient tables will be materialized. See also on [DBA.SE](https://dba.stackexchange.com/questions/273294/correlated-subquery-with-common-table-expressions-in-sqlite). – nalzok Aug 07 '20 at 08:52
1

the explain is giving inacurrate counts for your tables. Try using analyze to collect stats on tables, then try your explain again.

I think you will find both queries will return at almost the exact same speed. Maybe the subquery is slower due to an extra step. The important thing to notice here is "SCAN TABLE" which means its looking at all the rows on disk.

http://www.sqlite.org/lang_analyze.html

Also is it possible there are no indexes on your table? Because it should be using them but it appears it is not. Make sure you use primary key on your create table statement.

http://www.sqlite.org/lang_createtable.html

beiller
  • 3,105
  • 1
  • 11
  • 19
  • After analyze, join is 76*1 and subquery is 38*1*1 rows. And yes, I have a primary key (`CREATE TABLE 'Queue' ('Id' INTEGER PRIMARY KEY NOT NULL)`). But I am right multiplying the numbers of rows, or should I do other operations to calculate the cost? – Oriol Jul 15 '13 at 20:41
  • Why do you think that SQLite could use another index? For a join, it *must* look at all `Queue` records. – CL. Jul 16 '13 at 06:44
  • @CL. not another index, just 1 index (ie PK), and on LastQueue table because it is an inner join. – beiller Jul 16 '13 at 17:39
  • @beiller For what operation do you think an index on the join's outer table would be helpful? – CL. Jul 16 '13 at 19:33