7

The premise

In documentation, Row Level Security seems great. Based on what I've read I can now stop creating views like this:

SELECT data.*
FROM data
JOIN user_data
ON data.id = user_data.data_id
AND user_data.role = CURRENT_ROLE

The great part is, Postgres has a great analysis for that view starting with an index scan then a hash join on the user_data table, exactly what we want to happen because it's crazy fast. Compare that with a my RLS implementation:

CREATE POLICY data_owner
ON data
FOR ALL
TO user
USING (
  (
    SELECT TRUE AS BOOL FROM (
      SELECT data_id FROM user_data WHERE user_role = CURRENT_USER
    ) AS user_data WHERE user_data.data_id = data.id
  ) = true
)
WITH CHECK (TRUE);

This bummer of a policy executes the condition for each row in the data table, instead of optimizing by scoping the query to the rows which our CURRENT_USER has access to, like our view does. To be clear, that means select * from data hits every row in the data table.

The question

How do I write a policy with an inner select which doesn't test said select on every row in the target table. Said another way: how do I get RLS to run my policy on the target table before running the actual query on the result?

p.s. I've left this question someone vague and fiddle-less, mostly because sqlfiddle hasn't hit 9.5 yet. Let me know if I need to add more color or some gists to get my question across.

Community
  • 1
  • 1
cazzer
  • 1,726
  • 2
  • 18
  • 29
  • 2
    I stopped using SQLFiddle because it essentially never works for me. For Postgres I use: http://rextester.com/l/postgresql_online_compiler –  Mar 03 '17 at 07:29

1 Answers1

6

PostgreSQL may be able to generate a better plan if you phrase the policy like this:

...
USING (EXISTS
          (SELECT data_id
           FROM user_data
           WHERE user_data.data_id = data.id
             AND role = current_user
          )
      )

You should have a (PRIMARY KEY?) index ON user_data (role, data_id) to speed up nested loop joins.

But I think that it would be a better design to include the permission information in the data table itself, perhaps using the name[] type:

CREATE TABLE data(
   id integer PRIMARY KEY,
   val text,
   acl name[] NOT NULL
);

INSERT INTO data VALUES (1, 'one',   ARRAY[name 'laurenz', name 'advpg']);
INSERT INTO data VALUES (2, 'two',   ARRAY[name 'advpg']);
INSERT INTO data VALUES (3, 'three', ARRAY[name 'laurenz']);

Then you can use a policy like this:

CREATE POLICY data_owner ON data FOR ALL TO PUBLIC
   USING (acl @> ARRAY[current_user::name])
   WITH CHECK (TRUE);
ALTER TABLE data ENABLE ROW LEVEL SECURITY;
ALTER TABLE data FORCE ROW LEVEL SECURITY;

When I SELECT, I get only the rows for which I have permission:

SELECT id, val FROM data;
 id |  val
----+-------
  1 | one
  3 | three
(2 rows)

You can define a GIN index to support that condition:

CREATE INDEX ON data USING gin (acl _name_ops);
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Is it appropriate to use `name[]` here? The Postgres docs indicate that designers should not use `name[]`, as it is only for system catalogues. From the docs: "There are two other fixed-length character types in PostgreSQL, shown in Table 8-5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user." Link here: https://www.postgresql.org/docs/9.0/static/datatype-character.html – Ben Jul 13 '17 at 04:02
  • `name` should be fine, as only PostgreSQL identifiers are stored. But it is just as well to use `text`. – Laurenz Albe Jul 13 '17 at 04:07