Why does PostgreSQL use sequential scan instead of index scan?

Question

I'm learning PostgreSQL and I'm trying to understand the details of how to choose how to scan this database.

I have postgtresql 14.2 and run the following code:

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

And I see:

                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.11 rows=2 width=36) (actual time=152.718..152.728 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 2756021
 Planning Time: 0.218 ms
 Execution Time: 152.756 ms
(6 rows)

According to this Query Plan, in case of seq scan PG needs to read 2 756 021 rows until PG reaches the first desired row. As I understand, In case of index scan, PG needs to find only 1 row from index (ID = 2756021) and recieve the next index value (it should be light operation because index is sorted list) and load rows related with these indexes twice. But PG Planner choices Seq Scan.

I turned off seq scan and obtained Execution time is 0.116 ms instead of 152.756 ms in the first case:

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.62 rows=2 width=36) (actual time=0.060..0.072 rows=2 loops=1)
   ->  Index Scan using idx_id on t_test  (cost=0.43..124449.76 rows=1333333 width=36) (actual time=0.055..0.059 rows=2 loops=1)
         Index Cond: (id > 2756021)
 Planning Time: 0.342 ms
 Execution Time: 0.116 ms
(5 rows)

But to be honest, the following queries wit seq scan have better results that index scan (but I don't understand why):

SET enable_seqscan = TRUE ;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.12 rows=2 width=9) (actual time=0.042..0.054 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1242848 width=9) (actual time=0.038..0.041 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 261
 Planning Time: 0.076 ms
 Execution Time: 0.084 ms
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 1758121 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.06 rows=2 width=9) (actual time=0.018..0.032 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=2233890 width=9) (actual time=0.014..0.018 rows=2 loops=1)
         Filter: (id > 1758121)
 Planning Time: 0.092 ms
 Execution Time: 0.071 ms
(5 rows)

Why does PG use Seq Scan instead of Index Scan?

You probably didn'r run `ANALYZE` before you ran the first query, and you didn't give autovacuum time to do it for you. Otherwise it wouldn't have chosen the slow sequential scan. — Laurenz Albe, May 22 '23 at 09:07
Answer to the seconds part: If you set the predicate `id > N` to a low value compared to `min(id)` from the table you will see a very low value in `Rows Removed by Filter: X` (e.g. 261) or no such line at all. Than **is** `Seq Scan` realy better that `Index Scan` as you get the result from the 1st table block. Index requires 2 or 3 block access. Possible cause `delete` and `insert` with new sequences but keeping the filter limit from the initial case. — Marmite Bomber, May 22 '23 at 14:04
Can reproduce with a stable statistics @LaurenzAlbe, seems that in this query the `LIMIT 2` is not considered in the cardinality estimation. — Marmite Bomber, May 22 '23 at 14:07

score 1 · Accepted Answer · answered May 22 '23 at 13:53

The apparent problem with the first explain plan is in the second line

 ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)

PostgreSQL optimizer estimates that the predicate id > 2756021 will return 1.3M rows (rows=1333333), so the LIMIT 2 is not considered in the estimation and Seq Scan is choosen.

If you help PostgreSQL with an upper limit, the plan is fine using an index.

Example

EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 and id <= 2756021+2 LIMIT 2;

Limit  (cost=0.43..8.47 rows=2 width=9) (actual time=0.041..0.043 rows=2 loops=1)
  ->  Index Scan using idx_id on t_test  (cost=0.43..8.47 rows=2 width=9) (actual time=0.040..0.041 rows=2 loops=1)
        Index Cond: ((id > 2756021) AND (id <= 2756023))
Planning Time: 0.130 ms
Execution Time: 0.066 ms

Rabban Keyak · Answer 2 · 2023-05-22T12:40:41.147

0

I believe planner doesn't using index scan because you are requesting significant amount of records from the table (id > 2756201 means "about 1/3 of your table"). Planner decides that index scan (and access to 1/3 of rows by index after that) will be slower than reading all the table. To confirm, try to use greater values for id in where clause: id > 3256201, id > 3756201 etc...

edited May 22 '23 at 12:40

answered May 22 '23 at 09:58

Rabban Keyak

220
2
6

Why does PostgreSQL use sequential scan instead of index scan?

2 Answers2