Select distinct. Most suitable technique to reduce wait time

Question

I have lots of rows in my table in Postgres DB.

I make insertions in this table every 20 minutes, clear old entries every day and have only 2 selection options from it.

So I want to optimise time, that I wait for my selections.

First selection is kind of:

Select * from table where item=<item_id>

Second is kind of:

Select distinct(datetime) from table

So, in order to optimize 1 selection, I may make indexies for item field. As I understood this technique works perfectly for queries where smth equals smth.

But I don't know how to optimise my 2 selection query. I think smth like partitioning should help me, but there are several types of partitioning and I am a bit confused.

So what is the best option to optimize my queries?

Also, I am using python and Django models. If there is a good library that may do all the dirty job. That would be great. Most suitable now, that I found: http://architect.readthedocs.io/

Edit1 Thanks to Evan Carrol.

Tried to use index for second query. Command:

explain analyze select distinct time_updated from wow_auction_gordunni

Gives:

HashAggregate  (cost=335091.65..335092.51 rows=86 width=8) (actual time=4246.582..4246.607 rows=91 loops=1)
  Group Key: time_updated
  ->  Seq Scan on wow_auction_gordunni  (cost=0.00..313574.92 rows=8606692 width=8) (actual time=0.047..2257.979 rows=8616562 loops=1)
Planning time: 0.080 ms
Execution time: 4246.675 ms

Then creating index and vacuum:

Create INDEX ON wow_auction_gordunni (time_updated);
VACUUM ANALYZE wow_auction_gordunni;
explain analyze select distinct time_updated from wow_auction_gordunni;

Gives following:

Unique  (cost=0.43..249907.42 rows=92 width=8) (actual time=0.057..3537.626 rows=92 loops=1)
  ->  Index Only Scan using wow_auction_gordunni_time_updated_idx on wow_auction_gordunni  (cost=0.43..228163.42 rows=8697599 width=8) (actual time=0.055..2488.408 rows=8696562 loops=1)
        Heap Fetches: 85796
Planning time: 0.726 ms
Execution time: 3537.800 ms

So it seems that index helps a bit (postgres started to use index), but not dramatically.

Evan Carroll · Accepted Answer · 2017-03-03T17:21:43.817

It will use an index scan so long as it makes sense. Sample data,

CREATE TABLE foo
AS
  SELECT
    x%3 AS x,
    repeat( md5(x::text)::text, 200 ) AS t1
  FROM generate_series(1,1e6) AS t(x);

CREATE INDEX ON foo (x);
VACUUM ANALYZE foo;

Query,

EXPLAIN ANALYZE SELECT DISTINCT x FROM Foo;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.42..28480.42 rows=200 width=32) (actual time=0.034..257.734 rows=3 loops=1)
   ->  Index Only Scan using foo_x_idx on foo  (cost=0.42..25980.42 rows=1000000 width=32) (actual time=0.031..122.668 rows=1000000 loops=1)
         Heap Fetches: 0
 Planning time: 0.090 ms
 Execution time: 257.764 ms
(5 rows)

So to optimize your second selection query, create an index on datetime. Check out the EXPLAIN ANALYZE. See if it gets use an an index. If that doesn't help or the index isn't being used you can try to set enable_seqscan = off and then rerunning the query. Now you know what the savings would be, if any. You can paste both plans here and we can take a look at it.

This will only work if the table has been vacuumed recently and nothing much has changed, IOW if PostgreSQL *knows* that almost all tuples in the index are visible to everyone. — Laurenz Albe, Mar 04 '17 at 04:09
Thanks for commands. Don't know about `vacuum analyze` and what means data after this `explain analyze` command. Will dig in them and try this day. — Snobby, Mar 04 '17 at 09:28
Tried to make an index. Updated question. It helps, but not very much. — Snobby, Mar 05 '17 at 09:59

score 0 · Answer 2 · answered Mar 03 '17 at 15:42

0

There is no way to optimize the second query, it has to scan the whole table to find all possible values of datetime.

The best you can do is to see that the table is not bloated by clearing the table with TRUNCATE rather than DELETE, and to put enough RAM into the machine so that the whole table is in RAM.

answered Mar 03 '17 at 15:42

Laurenz Albe

209,280
17
206
263

It doesn't have to scan the whole table. It may very well do that, but it doesn't have too. – Evan Carroll Mar 03 '17 at 17:05
True. But then it has to scan the whole index. But you're right, if the index is small, that will make it somewhat faster. – Laurenz Albe Mar 03 '17 at 21:20

Select distinct. Most suitable technique to reduce wait time

2 Answers2