I have a Postgresql table with something like 200k tuples, so not that much. What I try to do is filter out some rows and then order them using full-text matching:
SELECT * FROM descriptions as d
WHERE d.category_id = ?
AND d.description != ''
AND regexp_replace(d.description, '(...)', '') !~* '...'
AND regexp_replace(d.description, '...', '') !~* '...'
AND d.id != ?
ORDER BY ts_rank_cd(to_tsvector('english', name), plainto_tsquery('english', 'my search words')) DESC LIMIT 5 OFFSET 0';
There is a GIN index on description field.
Now this query works well only when there is less then 4000 or so records in the category. When its more like 5k or 6k then the query gets extremely slow.
I was trying different variations of this query. What I noticed is when I remove either WHERE clause or ORDER BY clause then I get big speed up. (Of course then I get irrelevant results)
What can I do to speedup this combination? Any way of optimizing or should I look for a solution outside Postgresql?
Additional question:
I'm experimenting further and for example this is the simplest query that I think runs too slow. Can I tell from explain analyze when it uses gist index and when doesn't?
SELECT d.*, d.description <-> 'banana' as dist FROM descriptions as d ORDER BY dist DESC LIMIT 5
"Limit (cost=16046.88..16046.89 rows=5 width=2425) (actual time=998.811..998.813 rows=5 loops=1)"
" -> Sort (cost=16046.88..16561.90 rows=206010 width=2425) (actual time=998.810..998.810 rows=5 loops=1)"
" Sort Key: (((description)::text <-> 'banana'::text))"
" Sort Method: top-N heapsort Memory: 27kB"
" -> Seq Scan on products d (cost=0.00..12625.12 rows=206010 width=2425) (actual time=0.033..901.260 rows=206010 loops=1)"
"Total runtime: 998.866 ms"`
Answered (kgrittn): DESC keyword is not correct for KNN-GiST and it's actually not wanted here. Removing it fixes the problem and gives right results.