0

Using PostgreSQL 9.6

I have a table with some values I want to filter on and order by time:

  • a timestamp (may be range selected in the UI)
  • status string (only a few known values for now, also selectable in the UI)
  • context (scope of the data in the UI)

I wonder if I should have:

  1. A btree index on (context, status) + separate index on time
  2. OR A btree index on (context, status, time)
  3. OR A btree index on each ?
  4. OR A btree index on (time, status, context), for small time ranges ?

I suspect number 1 is the best option, context + status will allow to filter out values and then it will scan the time index. I created number 1 concurrently on my data and saw some improvements but how do you decide between each approach, are there some guidelines ?

One of the queries looks more or less like:

select * from event
where severity = 'WARNING' and 
fk_context = 1359544
order by timestamp LIMIT 30; // Other one has timestamp > ...

Another one is looking for a timerange. I looks like postgres uses multiple indexes, one with (fk_context, severity, timestamp) and then uses (severity, time) index, but it also depends on the limit.

Christophe Roussy
  • 16,299
  • 4
  • 85
  • 85
  • 2
    Up to 32 columns can be specified. This is a hard limit. Your puprose should define your needs. Check queries you want to run against and chose the right index. I don't think there possibly could be generic guide for your 1,2,3,4 questons – Vao Tsun Mar 15 '18 at 10:10
  • Yes I presume it highly depends on the needs of the user interface. I will probably end up with more than one index for the various cases if performance is still an issue. – Christophe Roussy Mar 15 '18 at 10:12
  • It would help if you provided an example of the `where` clause. – Gordon Linoff Mar 15 '18 at 10:24
  • Sorry my question is not very specific ... I wanted to know how to approach this in general. It seems you must tailor indexes for each possible case and postgres will figure out what is best to use. – Christophe Roussy Mar 15 '18 at 10:38

1 Answers1

2

Your question is unclear. If you have three potential conditions:

where timestamp between @a and @b order by time
where status = @s order by time
where context = @c order by time

Then you want three indexes: (timestamp, time), (status, time), and (context, time).

If the conditions are:

where timestamp between @a and @b and
      status = @s and
      context = @c
order by time 

Then you want one index, (status, context, timestamp, time).

And there are other possibilities consistent with your description.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • ok I see so I probably have to build a lot of indexes for the different queries. – Christophe Roussy Mar 15 '18 at 10:32
  • 1
    I don't think the last example is correct; the `timestamp` column should not be the first. It might be good to mention that a multi-column index can be used with a condition that uses the leading index columns. – Laurenz Albe Mar 15 '18 at 11:47
  • @LaurenzAlbe . . . Good catch. I made up the examples, so I really should have had the indexes aligned with the `where` clauses. – Gordon Linoff Mar 15 '18 at 12:21
  • Actually, I think there is another problem with the second example: The `time` column is useless, because it cannot be used for sorting. If the index scans through the `timestamp` entries for a given `status` and `context`, it will get results that are not necessarily orderd by `time`, so a sort cannot be avoided. – Laurenz Albe Mar 15 '18 at 14:29