0

In Postgresql, I have a table with a unique constraint on two foreign keys like below:

CREATE TABLE project_ownerships(
  id BIGSERIAL  PRIMARY KEY,
  project_id BIGINT REFERENCES projects ON DELETE CASCADE,
  user_id BIGINT REFERENCES users ON DELETE CASCADE,
  role SMALLINT,
  CONSTRAINT project_user_unique UNIQUE (project_id, user_id)
);

Having set a unique constraint on the two foreign keys project_id and user_id, does psql automatically create an index for each of them too? Or am I supposed to manually create an index for them still? Like below:

CREATE TABLE project_ownerships(
  id BIGSERIAL  PRIMARY KEY,
  project_id BIGINT REFERENCES projects ON DELETE CASCADE,
  user_id BIGINT REFERENCES users ON DELETE CASCADE,
  role SMALLINT,
  CONSTRAINT project_user_unique UNIQUE (project_id, user_id)
);

CREATE INDEX po_project_id_idx ON project_ownerships (project_id);
CREATE INDEX po_user_id_idx ON project_ownerships (user_id);

I have read the text here, just wanted to make sure I understood it correctly in terms of actual implementation details. Specifically, with a composite index created-CONSTRAINT project_user_unique UNIQUE (project_id, user_id)- will Postgresql be able to execute an index scan(if deemed necessary) when I'm doing a join on project_id, user_id or both? Do I still need to separately create an index for each of the foreign keys?

kyw
  • 6,685
  • 8
  • 47
  • 59
  • If you "just wanted to make sure I understood it correctly", what is your understanding & where are you stuck & why? – philipxy Feb 17 '20 at 21:15
  • @philipxy My understanding was creating a unique constraint of multiple columns will create a 'composite' index of both of them. But I wasn't sure if the composite index would actually create a index for each of the column such that doing a `join` on any of them would utilize their index when deemed appropriate, hence this question.. – kyw Feb 18 '20 at 00:46
  • Please clarify via edits, not comment. If your "question" doesn't give your expectations, decisions & justification, then they can't be addressed & this is just asking for yet another generic presentation of the documentation & a bespoke tutorial, which is too broad & probably nevetheless a duplicate. Also if you don't know how to check for what constraints & indexes you have, that's also in the manual & a duplicate question. See [ask], other [help] links & the voting arrow mouseover texts. – philipxy Feb 18 '20 at 01:32

2 Answers2

3

A unique constraint on (project_id, user_id) creates an index, equivalent to:

create unique index unq_project_ownerships_project_user
    on project_ownerships(project_id, user_id);

With this index, an index on project_ownerships(project_id) would be redundant. Wherever this index could be used, the unique index could be used.

However an index on project_ownerships(user_id) could still be useful, depending on your queries. It could be used in situations where the unique index would not be used.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • However, keep in mind every index adds cost to your inserts, updates, and deletes. Add them when necessary only. – The Impaler Feb 16 '20 at 04:08
  • @TheImpaler I will be using `joins` on those two fk, so i guess it will outweigh the costs..? Gordon, so I just need to manually create an index just for the second(and subsequent if any) column in a unique constraint? – kyw Feb 16 '20 at 04:18
  • `There's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.`[link](https://www.postgresql.org/docs/current/indexes-unique.html) Sounds like index for `user_id` would be automatically created too if it's one of the columns in a multicolumn unique constraint? – kyw Feb 16 '20 at 08:31
  • @kyw . . . I think that is referring to the entire composite index, not to the individual columns. – Gordon Linoff Feb 16 '20 at 16:44
2

It is a matter of taste, but personally I don't think you need the surrogate key (id) here. (is it ever used?) Also: role is a (non reserved) keyword. Avoid using it as identifier.

For the foreign keys, an index is absolutely necessary, otherwise a CASCADEing delete or update would (internally) lead to a sequential scan for every deleted/updated user- or project- tuple.

For a junction (bridge) table like this, it is sufficient to create an index (or UNIQUE constraint) with the key elements in the reversed order. This also serves as a supporting index for the FK. [the first element(s) of a composite index can be used as if an index with only these fields existed]

The extra key field in the index can enable index-only scans (for instance: when the the_role field is not needed)


CREATE TABLE project_ownerships
  ( project_id BIGINT REFERENCES projects (id) ON DELETE CASCADE
  , user_id BIGINT REFERENCES users(id) ON DELETE CASCADE
  , the_role INTEGER
  , PRIMARY KEY  (project_id, user_id)
  , CONSTRAINT reversed_pk UNIQUE (user_id, project_id)
  );

A small test-setup (I need to disable sort and hashjoin, because for small tables like this these actually lead to cheaper plans ;-)


SET search_path=tmp;
SELECT version();

CREATE TABLE projects
        ( id bigserial not NULL PRIMARY KEY
        , the_name text UNIQUE
        );

CREATE TABLE users
        ( id bigserial not NULL PRIMARY KEY
        , the_name text UNIQUE
        );

CREATE TABLE project_ownerships
  ( project_id BIGINT REFERENCES projects (id) ON DELETE CASCADE
  , user_id BIGINT REFERENCES users(id) ON DELETE CASCADE
  , the_role INTEGER
  , PRIMARY KEY  (project_id, user_id)
  , CONSTRAINT reversed_pk UNIQUE (user_id, project_id)
  );

INSERT INTO projects( the_name)
SELECT 'project-' || gs::text
FROM generate_series(1,1000) gs
        ;

INSERT INTO users( the_name)
SELECT 'name_' || gs::text
FROM generate_series(1,1000) gs
        ;

INSERT INTO project_ownerships (project_id,user_id,the_role)
SELECT p.id, u.id , (random()* 100)::integer
FROM projects p
JOIN users u ON random() < .10
        ;

VACUUM ANALYZE projects,users,project_ownerships;


SET enable_hashjoin = 0;
SET enable_sort = 0;
-- SET enable_seqscan = 0;

EXPLAIN ANALYZE
SELECT p.the_name AS project_name
        , po.the_role AS the_role
FROM projects p
JOIN project_ownerships po ON po.project_id = p.id
        AND EXISTS (
        SELECT *
        FROM users u
        WHERE u.id = po.user_id
        AND u.the_name >= 'name_10'
        AND u.the_name < 'name_20'
        );



EXPLAIN ANALYZE
SELECT u.the_name AS user_name
        , po.the_role AS the_role
FROM users u
JOIN project_ownerships po ON po.user_id = u.id
        AND EXISTS (
        SELECT *
        FROM projects p
        WHERE p.id = po.project_id
        AND p.the_name >= 'project-10'
        AND p.the_name < 'project-20'
        );

Resulting query plans:


SET
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
SET
SET
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.97..4693.68 rows=11924 width=15) (actual time=0.333..153.660 rows=11157 loops=1)
   ->  Nested Loop  (cost=0.69..1204.55 rows=11924 width=12) (actual time=0.268..53.192 rows=11157 loops=1)
         ->  Index Scan using users_the_name_key on users u  (cost=0.28..7.02 rows=119 width=8) (actual time=0.126..0.317 rows=112 loops=1)
               Index Cond: ((the_name >= 'name_10'::text) AND (the_name < 'name_20'::text))
         ->  Index Scan using reversed_pk on project_ownerships po  (cost=0.42..9.06 rows=100 width=20) (actual time=0.015..0.308 rows=100 loops=112)
               Index Cond: (user_id = u.id)
   ->  Index Scan using projects_pkey on projects p  (cost=0.28..0.29 rows=1 width=19) (actual time=0.005..0.005 rows=1 loops=11157)
         Index Cond: (id = po.project_id)
 Planning Time: 6.218 ms
 Execution Time: 162.319 ms
(10 rows)

                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.97..4057.79 rows=11022 width=12) (actual time=0.084..93.584 rows=11236 loops=1)
   ->  Nested Loop  (cost=0.69..832.59 rows=11022 width=12) (actual time=0.063..25.260 rows=11236 loops=1)
         ->  Index Scan using projects_the_name_key on projects p  (cost=0.28..6.84 rows=110 width=8) (actual time=0.037..0.163 rows=112 loops=1)
               Index Cond: ((the_name >= 'project-10'::text) AND (the_name < 'project-20'::text))
         ->  Index Scan using project_ownerships_pkey on project_ownerships po  (cost=0.42..6.51 rows=100 width=20) (actual time=0.010..0.111 rows=100 loops=112)
               Index Cond: (project_id = p.id)
   ->  Index Scan using users_pkey on users u  (cost=0.28..0.29 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=11236)
         Index Cond: (id = po.user_id)
 Planning Time: 0.971 ms
 Execution Time: 99.671 ms
(10 rows)
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • Thanks for the note on `id` and `role`! I wonder though if I do both the `PRIMARY KEY` and `UNIQUE constraint`, will it create some duplicate indexes? – kyw Feb 16 '20 at 12:31
  • Yes, duplicate indexes. That won't harm. Instead it will support index-only scans from both single-key *directions*. It will have more value than the surrogate and its index. – wildplasser Feb 16 '20 at 12:47
  • Thank you for expanding your answer. If I may follow up with another question: why the need of reversing the order at all? And I don't get this part `the first element(s) of a composite index can be used as if an index with only these fields existed`.. – kyw Feb 17 '20 at 01:38