0

My query:

DROP TABLE IF EXISTS tmp;
CREATE TEMP TABLE tmp AS SELECT *, ST_BUFFER(the_geom::GEOGRAPHY, 3000)::GEOMETRY AS buffer FROM af_modis_master LIMIT 20000;
CREATE INDEX idx_tmp_the_geom ON tmp USING gist(buffer); 
EXPLAIN SELECT (DUMP(ST_UNION(buffer))).path[1], (DUMP(ST_UNION(buffer))).geom FROM tmp;

Output from EXPLAIN:

Aggregate  (cost=1705.52..1705.54 rows=1 width=32)
  ->  Seq Scan on tmp  (cost=0.00..1625.01 rows=16101 width=32)

Seq Scan means it is not using the index, right? Why not?

(This question was first posted here: https://gis.stackexchange.com/questions/51877/postgis-query-not-using-gist-index-when-doing-a-st-dumpst-union . Apologies for reposting but the community here is much more active, so perhaps wil provide an answer quicker.)

UPDATE: Even adding a where clause that filters based on the buffer causes a Seq Scan:

ANALYZE tmp;
EXPLAIN SELECT (DUMP(ST_UNION(buffer))).path[1], (DUMP(ST_UNION(buffer))).geom FROM tmp WHERE ST_XMIN(buffer) = 0.0;
Community
  • 1
  • 1
Ries
  • 2,844
  • 4
  • 32
  • 45
  • Have you updated the statistics before running the query? – Szymon Lipiński Feb 14 '13 at 13:32
  • No, but when I do "ANALYZE tmp;" right after creating the index I still get Seq Scan from EXPLAIN. – Ries Feb 14 '13 at 13:56
  • I looks like you want to run st_union on all of the rows in the table. Most probably Postgres thinks that using index won't make it faster. The index will be used when you will be choosing some geometries first, in where clause. – Szymon Lipiński Feb 14 '13 at 14:12
  • Ok, makes sense. Interesting though that doing the analyze step seems to improve the execution time from ~15 sec to ~11 sec. – Ries Feb 14 '13 at 14:18
  • I tried testing your hypothesis by adding a where clause: ... FROM tmp WHERE ST_XMIN(buffer) = 0.0; I still get a Seq Scan from explain. – Ries Feb 15 '13 at 08:04
  • don't cross post: http://gis.stackexchange.com/questions/51877/postgis-query-not-using-gist-index-when-doing-a-st-dumpst-union – Mike T Feb 16 '13 at 23:05

1 Answers1

0

A query like you have will never use an index ever. To do so would substitute significant random disk I/O (possibly even in addition to normal disk I/O) for the scan of the table.

In essence you are not selecting on criteria so an index will be slower than just pulling the data from disk in physical order and processing it.

Now if you pull only a single row with a where condition your index might help with, then you may find it may use the index, or not, depending on how big the table is. Very small tables will never use indexes because the extra random disk I/O is never a win. Remember no query plan beats a sequential scan through a single page....

Chris Travers
  • 25,424
  • 6
  • 65
  • 182