using postgres-9.4 on ubuntu-15.10
Performing join on two tables on indexed column and get Seq Scan
table a
: ~7 million rows, contains char()&varchar()&float
, have clustered b-tree index on date char(11)
. 1~2500 rows on each date.
table b
: ~6k rows,contains 2 cols date char(11) and feature float[]
also have clustered b-tree index on date char(11)
. There is only 1 record on each date.
I have the following query, cols
in select
are too many and too ugly so I make it short:
SELECT
6 regexp_split_to_array(a.char_cols),
ARRAY[ a.all_float] || b.feature
FROM
a JOIN b ON
a.date = b.date;
It takes ~50000ms and use EXPLAIN ANALYZE VERBOSE
I have following result:
QUERY PLAN
-----------------------------------------------------------------------------
Hash Join (cost=501.67..790703.48 rows=7154354 width=856) (actual time=3.216..49223.885 rows=7154589 loops=1)
Output: A_LOT_MESS
Hash Cond: (a.date = b.date)
-> Seq Scan on public.a (cost=0.00..521913.54 rows=7154354 width=440) (actual time=0.001..967.771 rows=7154589 loops=1)
Output: EVERY_COLS_OF_A
-> Hash (cost=424.63..424.63 rows=6163 width=428) (actual time=3.157..3.157 rows=6163 loops=1)
Output: b.feature, b.date
Buckets: 1024 Batches: 1 Memory Usage: 2769kB
-> Seq Scan on public.b (cost=0.00..424.63 rows=6163 width=428) (actual time=0.003..1.156 rows=6163 loops=1)
Output: b.feature, b.date
Planning time: 1.041 ms
Execution time: 49396.419 ms
(12 rows)
I tried advices from PostgreSQL query runs faster with index scan, but engine chooses hash join ,
but neither set random_page_cost = 2
nor set work_mem = '2048MB'
can have any effect: plan and times remains the same.
I see some article says that bitmap index scan could do better, but I don't know how to create such index: It seems that postgres is the one who decide if it should be generated when querying something.
Additional information:
about resource usage:
query takes only one logical core to work, and I have enough memory(32GiB, when database size <6GiB)