3

I have a really simple query I'm running on a large table (500k rows) to page results.

Originally I was using this query, which is really fast:

    select * from deck
    order by
        deck.sas_rating desc
    limit 10

Its `explain analyze` shows a 0.2 ms execution time. Cool.

But the `sas_rating` column has duplicate integer values, and I realized while paging through the results (using `offset` for later pages) that I was getting duplicate results. No problem: add the primary key as a secondary `order by` column. But the performance is terrible.

    select * from deck
    order by
        deck.sas_rating desc,
        deck.id asc
    limit 10

That takes 685 ms, with an `explain analyze` of:

Limit  (cost=164593.15..164593.17 rows=10 width=1496) (actual time=685.138..685.139 rows=10 loops=1)
  ->  Sort  (cost=164593.15..165866.51 rows=509343 width=1496) (actual time=685.137..685.137 rows=10 loops=1)
        Sort Key: sas_rating DESC, id
        Sort Method: top-N heapsort  Memory: 59kB
        ->  Seq Scan on deck  (cost=0.00..153586.43 rows=509343 width=1496) (actual time=0.009..593.444 rows=509355 loops=1)
Planning time: 0.143 ms
Execution time: 685.171 ms

It's even worse on my weaker production server. My search went from 125ms total to 35 seconds!

I tried adding a multi-column index, but that didn't improve performance. Is there any way to prevent duplicate results when using limit + offset without destroying the performance of the query?

CorayThan
  • If you just want to get rid of duplicates on this column, it seems there is a `distinct on` in postgres: https://dba.stackexchange.com/a/24328 (near the end). –  Jan 19 '19 at 06:50
  • @dyukha There aren't duplicate rows. It's that when I select with `limit 10`, and then do another select with `limit 10 offset 10` I can retrieve some of the same results because `sas_rating` contains non-unique values. See [this SO question](https://stackoverflow.com/q/13580826/1313268) for the problem, but no solution that works for me. – CorayThan Jan 19 '19 at 06:52
  • I see. What if you try to order by `ROW_NUMBER() OVER(ORDER BY id)`? Maybe something like here: https://zaiste.net/row_number_in_postgresql/ , but with `order by` instead of `where`. Sorry, I'm not sure it even works and have nowhere to test it. –  Jan 19 '19 at 07:01
  • Another option for pagination is described here: https://use-the-index-luke.com/no-offset –  Jan 19 '19 at 09:24
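To illustrate the no-offset idea from the last comment: instead of `offset`, keep the last row's `(sas_rating, id)` from the previous page and seek past it. Because the sort mixes `desc` and `asc`, a plain row-value comparison won't work here; a sketch, with `:last_sas_rating` and `:last_id` as placeholder parameters:

```sql
-- Fetch the next page after the last row of the previous page.
-- :last_sas_rating and :last_id are placeholders for the values
-- of the final row returned by the previous page's query.
select * from deck
where deck.sas_rating < :last_sas_rating
   or (deck.sas_rating = :last_sas_rating and deck.id > :last_id)
order by
    deck.sas_rating desc,
    deck.id asc
limit 10
```

This avoids both the duplicate-row problem and the cost of scanning and discarding `offset` rows, at the price of not being able to jump to an arbitrary page number.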

1 Answer

5

I believe you probably added your multi-column index wrong: for example, you created it on `(sas_rating, id)` instead of `(sas_rating desc, id)`, and only the latter can satisfy your `order by`.

The single-column index on `sas_rating` was usable for `sas_rating desc` sorting because the database can scan it backwards. That does not carry over to a multi-column index with mixed sort directions: the ascending/descending pattern of the index columns must match the sort (or be its exact mirror image, so that a backwards scan matches).
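A sketch of a matching index, using the table and column names from the question (the index name is made up):

```sql
-- Column directions mirror the query's
-- "order by sas_rating desc, id asc" exactly.
create index deck_sas_rating_desc_id_idx
    on deck (sas_rating desc, id asc);
```

An index on `(sas_rating asc, id desc)` would also work, since Postgres can scan it backwards to produce the same ordering; what matters is that the direction pattern is the sort's exact match or its exact mirror.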

Tometzky
  • You're right. I just needed my multi-column index to use the proper `desc` order for `sas_rating`. I didn't even realize indexes had `desc` vs `asc`! – CorayThan Jan 19 '19 at 09:33
  • @Tometzky . . . Wow! Great observation. I think Postgres should be smart enough to use `(sas_rating, id)` for this purpose. But that would require implementing skip scans. – Gordon Linoff Jan 19 '19 at 11:20