  • Database: PostgreSQL 12.11 (Ubuntu 12.11-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
  • RAM: 8 GB
  • Processor: i7-4510U (dual core, 2 GHz)

I would like to optimize the query below:

select a.gender from "employees" as a
where lower(gender) LIKE 'f%'
group by gender
limit 20

Total number of records in the table: 2,088,290 rows

Index

CREATE INDEX ix_employees_gender_lower ON public.employees USING btree (lower((gender)::text) varchar_pattern_ops)

query execution plan

https://explain.dalibo.com/plan/h7e


Please use the Google Drive link to download and restore the SQL dump of the table (with data) used in the query above.

I tried adding an index, but to no avail. I am also not able to understand the EXPLAIN ANALYZE output, so any pointers on that would be appreciated as well.
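
For reference, a minimal sketch of how a plan like the one linked above can be generated (the exact EXPLAIN options shown here are illustrative, not necessarily what was run):

-- assumed invocation; BUFFERS adds I/O detail, and FORMAT JSON is one of the formats explain.dalibo.com accepts
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
select a.gender from "employees" as a
where lower(gender) LIKE 'f%'
group by gender
limit 20;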

Rizwan Patel
  • 538
  • 2
  • 9
  • 27

3 Answers

1

It sounds like you need an index skip scan. PostgreSQL currently doesn't implement those automatically, but you can emulate one with a recursive CTE. People are working on adding this to the planner so it will be chosen automatically, but even if they succeed it would probably not work with your case-folding LIKE condition. I couldn't see how to integrate the case-folding LIKE condition into the recursive CTE, but if you return all distinct genders preserving case, you can then filter that small list quickly without needing an index.

WITH RECURSIVE t AS (
   -- anchor: the smallest gender value in the table
   SELECT min(gender) AS gender FROM employees
   UNION ALL
   -- recursive step: jump straight to the next distinct gender value
   SELECT (SELECT min(gender) FROM employees WHERE gender > t.gender)
   FROM t WHERE t.gender IS NOT NULL
   )
-- t now holds one row per distinct gender, so this filter only touches a handful of rows
SELECT gender FROM t WHERE gender IS NOT NULL and lower(gender) like 'f%';

This took less than 2 ms for me, but it does require that you add a plain index on gender, which you don't seem to have already.
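
For completeness, that plain index would be something along these lines (the index name is just illustrative):

-- plain b-tree index on the raw column, used by the min()/">" jumps in the recursive CTE
CREATE INDEX ix_employees_gender ON public.employees USING btree (gender);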

jjanes
  • 37,812
  • 5
  • 27
  • 34
  • This has addressed the problem statement, and I am quite thankful for the index skip-scan explanation, which is quite an eye opener. However, I will need more time to integrate this into the query structure in a modular form with multiple WHERE clauses and run automated tests to see how it works across environments. Thanks again @jjanes! :) – Rizwan Patel Jul 25 '22 at 16:08
0

So apparently, refactoring the base query to the one below

select gender from (
    select a.gender from "employees" as a
    where lower(a.gender::text) LIKE 'f%'
    limit 40) b
group by b.gender

brought the execution time down from 5 seconds to 16 ms.

Rizwan Patel
  • 538
  • 2
  • 9
  • 27
  • 1
    Well of course it would. It does quite a different thing, which is much easier to do. You could get the answer even faster yet by just doing `select 'f'`. – jjanes Jul 22 '22 at 22:18
  • @jjanes, care to provide an alternative solution for the above problem statement then? – Rizwan Patel Jul 22 '22 at 22:59
0

Even the bad plan is far better for me than it appears to be for you (4s with completely cold cache, 0.4s upon repeat execution), and my hardware is far from excellent.

If the time is going to random page reads, you could greatly reduce that by creating an index suited for index-only scans and making sure the table is well vacuumed.

CREATE INDEX ix_employees_gender_lower2 ON public.employees USING btree (lower((gender)::text) varchar_pattern_ops, gender)

That reduces the timing to 0.3s, regardless of cache warmth.
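
Index-only scans also depend on the visibility map being reasonably current, so if autovacuum is lagging, an ordinary manual vacuum along these lines helps (plain maintenance, nothing exotic):

-- VACUUM updates the visibility map so more heap fetches can be skipped; ANALYZE refreshes planner statistics
VACUUM ANALYZE public.employees;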

But I don't see the point of running this query even once, much less often enough to care if it takes 22s.

jjanes
  • 37,812
  • 5
  • 27
  • 34
  • Hi @jjanes, well my functional problem statement is to autopopulate, so what kind of query would you recommend? Ref: https://www.youtube.com/watch?v=9i1B5L29qBI&t=40s – Rizwan Patel Jul 23 '22 at 18:09
  • @RizwanPatel Try the thing I just suggested, or figure out why your hardware is so bad, or use a materialized view. Or just forget about autopopulate, and let them type it in themselves if they wish to make a nonstandard selection. – jjanes Jul 24 '22 at 16:45
  • I tried the updated index; it shaves the time from 5 seconds to 1.5 on my machine, which is great, thanks for that. But it is still a bummer that Postgres cannot scale a simple GROUP BY with a LIKE condition over 2 million rows, and autopopulate is a functional necessity for any reporting platform. Anyway, I will keep this question open hoping there is a more efficient way to address this problem statement. Cheers! @jjanes – Rizwan Patel Jul 24 '22 at 16:58