Row Level Security Query Plan is 45x slower than identical non-RLS query

Question

I'm having some trouble getting the query planner to write good plans for row level security (RLS) enabled tables. It seems all it takes is a join from a row level security enabled table to a non-row level security enabled table to force a bad plan even if there are appropriate indexes on both tables that the planner should be able to use.

Is there a way to help the planner figure this out? Or are certain statistics unavailable when RLS is involved?

I tried enabling RLS (adding a wide open policy with USING (TRUE)) for the table that doesn't need RLS and that has the same effect as not including a policy on that table.

DROP SCHEMA IF EXISTS foo CASCADE;
CREATE SCHEMA foo;

CREATE TABLE foo.bar AS
SELECT generate_series(1,10000000) AS id, md5(random()::text) AS descr, random() * 5 + 1 AS licflag;

CREATE TABLE foo.baz AS
SELECT generate_series(1,10000000) AS id, md5(random()::text) AS descr, random() * 5 + 1 AS licflag;

CREATE UNIQUE INDEX ON foo.bar (id);
CREATE INDEX ON foo.bar (licflag);

CREATE UNIQUE INDEX ON foo.baz (id);
CREATE INDEX ON foo.baz (licflag);

ANALYZE foo.bar;
ANALYZE foo.baz;

ALTER TABLE foo.bar ENABLE ROW LEVEL SECURITY;
--ALTER TABLE foo.baz ENABLE ROW LEVEL SECURITY;

DROP ROLE IF EXISTS restricted;
CREATE ROLE restricted NOINHERIT;

REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA foo FROM restricted;
GRANT restricted to current_user;
GRANT USAGE ON SCHEMA foo TO restricted;
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO restricted;

CREATE POLICY restrict_foo ON foo.bar 
FOR SELECT TO restricted
USING (licflag < 3);

/*
CREATE POLICY restrict_foo ON foo.baz 
FOR SELECT TO restricted
USING (TRUE);
*/

EXPLAIN ANALYZE
SELECT * 
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id 
WHERE f2.id BETWEEN 500 AND 12000
AND f1.licflag < 3;

SET ROLE restricted;

EXPLAIN ANALYZE
SELECT * 
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id 
WHERE f2.id BETWEEN 500 AND 12000;

Results in

                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.87..87677.34 rows=4668 width=90) (actual time=0.091..45.337 rows=4622 loops=1)
   ->  Index Scan using baz_id_idx on baz f2  (cost=0.43..471.90 rows=11573 width=45) (actual time=0.042..4.496 rows=11501 loops=1)
         Index Cond: ((id >= 500) AND (id <= 12000))
   ->  Index Scan using bar_id_idx on bar f1  (cost=0.43..7.53 rows=1 width=45) (actual time=0.003..0.003 rows=0 loops=11501)
         Index Cond: (id = f2.id)
         Filter: (licflag < '3'::double precision)
         Rows Removed by Filter: 1
 Planning time: 1.300 ms
 Execution time: 45.826 ms
(9 rows)

SET
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=569.62..273628.35 rows=4227 width=90) (actual time=8.317..2074.996 rows=4558 loops=1)
   Hash Cond: (f1.id = f2.id)
   ->  Seq Scan on bar f1  (cost=0.00..218457.95 rows=3967891 width=45) (actual time=0.016..1616.577 rows=3998388 loops=1)
         Filter: (licflag < '3'::double precision)
         Rows Removed by Filter: 6001612
   ->  Hash  (cost=436.47..436.47 rows=10652 width=45) (actual time=8.033..8.033 rows=11501 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1027kB
         ->  Index Scan using baz_id_idx on baz f2  (cost=0.43..436.47 rows=10652 width=45) (actual time=0.026..4.871 rows=11501 loops=1)
               Index Cond: ((id >= 500) AND (id <= 12000))
 Planning time: 0.305 ms
 Execution time: 2075.371 ms
(11 rows)

I'm on

psql (9.5.3, server 9.5.4)

UPDATE 1:

I ran the query with the RLS predicate in the WHERE clauses like

EXPLAIN ANALYZE
SELECT *
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id
WHERE f1.id BETWEEN 500 AND 12000
AND f1.licflag < 3;

and it resulted in a better query plan. But when I removed this extra predicate, the planner KEPT the better plan. This leads me to think there is something wrong with the stats.. anyone know how to manually trigger a stats update without resetting the stats for the entire db? Going through the Postgres docs myself now...

Update 2:

Tried setting the global and table level statistics limits to no avail. Was able to get the proper query plan with a subselect instead of a join with a similar query so may be able to use that technique as a workaround.

believe this is similar to https://www.postgresql.org/message-id/CADBa3wZpuYXwdry2g68NxYmAXOPvx0DLvdAU4kdo2GoPtxKu+Q@mail.gmail.com — Justin, Sep 22 '16 at 18:44
"*Does anyone know how to manually trigger a stats update*" - analyze: https://www.postgresql.org/docs/current/static/sql-analyze.html — , Sep 22 '16 at 20:01
Playing with the statistic targets didn't seem to have any effect, unfortunately — Justin, Sep 26 '16 at 19:38
You should upgrade to the latest Postgres version (which, as of time of writing this, is 10.3). Postgres 10 includes some significant improvements regarding row-level security performance, for example: https://github.com/postgres/postgres/commit/215b43cdc8d6b4a1700886a39df1ee735cb0274d — mkurz, May 07 '18 at 09:02

Row Level Security Query Plan is 45x slower than identical non-RLS query

0 Answers0