I come from Mysql and nosql database, and I am new to Postgress db. I am using Postgres 11.6 on Aurora AWS.
I am trying to create tables made of two columns, a key and a jsonb value.
Each value looks like this:
{"game": "game6",
"username": "Djobi",
(bunch of fields)
"permissions": ["permission3", "permission1", "permission5"]}
I am trying to add different indexes to see what are the capacities of the DB. One of them, is to find user by their name. The other is to find users having specific permission.
account_index | account_index_username | CREATE INDEX account_index_username ON public.account_index USING btree (((value -> 'username'::text)))
account_index | account_index_permissions | CREATE INDEX account_index_permissions ON public.account_index USING gin (((value -> 'permissions'::text)))
account_index | account_global_gin | CREATE INDEX accountgin ON public.account_index USING gin (value)
I have two tables, with the exact same data. Each tables have about 30 millions rows. One with the index, one without. I am runing the following queries to test the performances:
SELECT 1
FROM account_noindex
WHERE value @> '{"permissions": ["permission1"]}' limit 10;
I am trying here to find user having permission1. (side note, not sure yet how to ask how to have specifically permission1 versus any permission including permission1) When runing my request on my index tables, I get a response time of 5000ms. When runing my request on my non index tables, I get a response time of 50 ms .
So the non index table is 100 time faster than the index one, which I have to say confuse me. If I try to run a explain of the two queries I get the following results:
Index table:
Limit (cost=430.49..468.31 rows=10 width=32)
-> Bitmap Heap Scan on account_index (cost=430.49..144146.44 rows=37999 width=32)
Recheck Cond: (value @> '{"permissions": ["permission1"]}'::jsonb)
-> Bitmap Index Scan on accountgin (cost=0.00..420.99 rows=37999 width=0)
Index Cond: (value @> '{"permissions": ["permission1"]}'::jsonb)
(5 rows)
Non index table
Limit (cost=0.00..1935.23 rows=10 width=4)
-> Seq Scan on account_noindex (cost=0.00..7360637.42 rows=38035 width=4)
Filter: (value @> '{"permissions": ["permission1"]}'::jsonb)
If I try to go deeper in offset (above 100k) the difference is less obvious, but the non index table is still faster.
[edit] Here the full Analyze buffers for the indexed table:
EXPLAIN (ANALYZE, BUFFERS) SELECT value->'permissions' FROM account_index WHERE value @> '{"permissions": ["permission1"]}' limit 12;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=430.49..475.88 rows=12 width=32) (actual time=1926.289..1926.365 rows=12 loops=1)
Buffers: shared hit=24398
-> Bitmap Heap Scan on account_index (cost=430.49..144146.44 rows=37999 width=32) (actual time=1926.287..1926.361 rows=12 loops=1)
Recheck Cond: (value @> '{"permissions": ["permission1"]}'::jsonb)
Rows Removed by Index Recheck: 30
Heap Blocks: lossy=8
Buffers: shared hit=24398
-> Bitmap Index Scan on accountgin (cost=0.00..420.99 rows=37999 width=0) (actual time=1916.655..1916.656 rows=8386144 loops=1)
Index Cond: (value @> '{"permissions": ["permission1"]}'::jsonb)
Buffers: shared hit=24390
Planning Time: 0.073 ms
Execution Time: 1927.143 ms
So what am I missing here? Did I create my GIN index incorrectly? Or am I doing my request wrong?
Note that I have the same issue with the index on my username. When making a request to find user with a specific username without any LIMIT, I get a full table scan with or without the GIN index. When I add the BTREE index then there is no problem and no limit is needed.