I have a table with > 4.5 million rows and my SELECT
query is far too slow for my needs.
The table is created with:
CREATE TABLE all_legs (
carrier TEXT,
dep_hub TEXT,
arr_hub TEXT,
dep_dt TIMESTAMP WITH TIME ZONE,
arr_dt TIMESTAMP WITH TIME ZONE,
price_ct INTEGER,
... 5 more cols ...,
PRIMARY KEY (carrier, dep_hub, arr_hub, dep_dt, arr_dt, ...3 other cols...)
)
When I want to SELECT
all rows for a certain date, the query is too slow; it takes between 12sec and 20 seconds. My aim is that it takes max 1 sec. I expect the query to return between 0.1% and 1% of the rows contained in the table.
The query is quite simple:
SELECT * FROM all_legs WHERE dep_dt::date = '2017-08-15' ORDER BY price_ct ASC
EXPLAIN ANALYZE
returns:
Sort (cost=197154.69..197212.14 rows=22982 width=696) (actual time=14857.300..14890.565 rows=31074 loops=1)
Sort Key: price_ct
Sort Method: external merge Disk: 5256kB
-> Seq Scan on all_legs (cost=0.00..188419.85 rows=22982 width=696) (actual time=196.738..14581.143 rows=31074 loops=1)
Filter: ((dep_dt)::date = '2017-08-15'::date)
Rows Removed by Filter: 4565249
Planning time: 0.572 ms
Execution time: 14908.274 ms
Note: I learned yesterday about this command, so I am still not able to fully understand all that is returned.
I have tried using index-only scans
, as suggested here, by running the command: CREATE index idx_all_legs on all_legs(dep_dt);
but I did not notice any difference in running time. I also tried creating the index for all columns, as I want all columns return.
Another thought was sorting all rows by dep_dt
, so then the search of all rows fulfilling the condition should be much faster as they would not be scattered. Unfortunately, I don't know how to implement this.
Is there a way to make it as fast as I am aiming to?
Solution
As suggested in the Laurenz' answer, by adding an index CREATE INDEX IF NOT EXISTS idx_dep_dt_price ON all_legs(dep_dt, price_ct);
and adapting the condition in the SELECT
to WHERE dep_dt >= '2017-08-15 00:00:00' AND dep_dt < '2017-08-16 00:00:00'
has reduced the running time to 1/4. Even if it is a very good improvement, that means running times between 2 and 6 seconds.
Any additional idea to reduce the running time even further would be appreciated.