10

Assuming a schema like the following:

CREATE TABLE node (
  id       SERIAL PRIMARY KEY,
  name     VARCHAR,
  parentid INT REFERENCES node(id)
);

Further, let's assume the following data is present:

INSERT INTO node (name,parentid) VALUES
('A',NULL),
('B',1),
('C',1);

Is there a way to prevent cycles from being created? Example:

UPDATE node SET parentid = 2 WHERE id = 1;

This would create a cycle of 1->2->1->...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
sschober
  • 2,003
  • 3
  • 24
  • 38

5 Answers5

20

Your trigger simplified and optimized, should be considerably faster:

CREATE OR REPLACE FUNCTION detect_cycle()
  RETURNS TRIGGER
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF EXISTS (
      WITH RECURSIVE search_graph(parentid, path, cycle) AS ( -- relevant columns
          -- check ahead, makes 1 step less
         SELECT g.parentid, ARRAY[g.id, g.parentid], (g.id = g.parentid)
         FROM   node g
         WHERE  g.id = NEW.id  -- only test starting from new row
         
         UNION ALL
         SELECT g.parentid, sg.path || g.parentid, g.parentid = ANY(sg.path)
         FROM   search_graph sg
         JOIN   node g ON g.id = sg.parentid
         WHERE  NOT sg.cycle
         )
      SELECT FROM search_graph
      WHERE  cycle
      LIMIT  1  -- stop evaluation at first find
      )
   THEN
      RAISE EXCEPTION 'Loop detected!';
   ELSE
     RETURN NEW;
   END IF;
END
$func$;

You don't need dynamic SQL, you don't need to count, you don't need all the columns and you don't need to test the whole table for every single row.

CREATE TRIGGER detect_cycle_after_update
AFTER INSERT OR UPDATE ON node
FOR EACH ROW EXECUTE PROCEDURE detect_cycle();

An INSERT like this has to be prohibited, too:

INSERT INTO node (id, name,parentid) VALUES (8,'D',9), (9,'E',8);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    This is great! How would one go about making it "generic"? That is, passing in the fields that represent id and parent_id? Then the same function could be used for any self-referential table. – Bruce Pierson Jan 13 '16 at 18:54
  • I am wondering HOW a single INSERT can create a cycle? Because there is no possibility that your uncreated node has a child. Leaves can not create cycles. @Erwin Brandstetter – Ismail Yavuz Jun 23 '20 at 07:24
  • 1
    @İsmailYavuz: True, since referential integrity is enforced with an FK constraint, a single-row insert can never create a cycle. But multi-row inserts can. (Like the example at the end of my post.) FK constraints are checked after every *command* by default. And the suggested trigger catches that. – Erwin Brandstetter Jun 24 '20 at 00:20
  • @ErwinBrandstetter I was trying to use this approach (with extra check) to avoid cycles in my self-referential foreign key. The check is, there is way to disable the parent linkage using a field, let's say, `linkage_overridden`. If that is `true`, then foreign constraint won't be used, and even though `parent_id == id` check is true, there won't be a cycle. How would that kind of check work here? I was trying to update `g.parentid = ANY(sg.path)` condition with: `g.parentid = ANY(sg.path) and g.linkage_overridden = false`, but it's not working. – Rohit Jain Jan 05 '21 at 13:09
  • @RohitJain: Please present your case with all relevant information as new question. You can always link to this one for context. – Erwin Brandstetter Jan 05 '21 at 22:02
3

To answer my own question, I came up with a trigger that prevents this:

CREATE OR REPLACE FUNCTION detect_cycle() RETURNS TRIGGER AS
$func$
DECLARE
  loops INTEGER;
BEGIN
   EXECUTE 'WITH RECURSIVE search_graph(id, parentid, name, depth, path, cycle) AS (
        SELECT g.id, g.parentid, g.name, 1,
          ARRAY[g.id],
          false
        FROM node g
      UNION ALL
        SELECT g.id, g.parentid, g.name, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM node g, search_graph sg
        WHERE g.id = sg.parentid AND NOT cycle
)
SELECT count(*) FROM search_graph where cycle = TRUE' INTO loops;
IF loops > 0 THEN
  RAISE EXCEPTION 'Loop detected!';
ELSE
  RETURN NEW;
END IF;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER detect_cycle_after_update
AFTER UPDATE ON node
FOR EACH ROW EXECUTE PROCEDURE detect_cycle();

So, if you try to create a loop, like in the question:

UPDATE node SET parentid = 2 WHERE id = 1;

You get an EXCEPTION:

ERROR:  Loop detected!
sschober
  • 2,003
  • 3
  • 24
  • 38
0
CREATE OR REPLACE FUNCTION detect_cycle()
  RETURNS TRIGGER AS
$func$
DECLARE
  cycle int[];
BEGIN
EXECUTE format('WITH RECURSIVE search_graph(%4$I, path, cycle) AS (
  SELECT g.%4$I, ARRAY[g.%3$I, g.%4$I], (g.%3$I = g.%4$I)
    FROM %1$I.%2$I g
   WHERE g.%3$I = $1.%3$I
  UNION ALL
  SELECT g.%4$I, sg.path || g.%4$I, g.%4$I = ANY(sg.path)
    FROM search_graph  sg
    JOIN %1$I.%2$I g ON g.%3$I = sg.%4$I
   WHERE NOT sg.cycle)
SELECT path
  FROM search_graph
 WHERE cycle
 LIMIT 1', TG_TABLE_SCHEMA, TG_TABLE_NAME, quote_ident(TG_ARGV[0]), quote_ident(TG_ARGV[1]))
INTO cycle
USING NEW;
IF cycle IS NULL
THEN
  RETURN NEW;
ELSE
   RAISE EXCEPTION 'Loop in %.% detected: %', TG_TABLE_SCHEMA, TG_TABLE_NAME, array_to_string(cycle, ' -> ');
END IF;

END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER detect_cycle_after_update
 AFTER INSERT OR UPDATE ON node
   FOR EACH ROW EXECUTE PROCEDURE detect_cycle('id', 'parent_id');
dry
  • 1
0

While the current accepted answer by @Erwin Brandstetter is ok when you process one update/insert at a time, it still can fail when considering concurrent execution.

Assume the table content defined by

INSERT INTO node VALUES
(1, 'A', NULL),
(2, 'B', 1),
(3, 'C', NULL),
(4, 'D', 3);

and then in one transaction, execute

-- transaction A
UPDATE node SET parentid = 2 where id = 3;

and in another

-- transaction B
UPDATE node SET parentid = 4 where id = 1;

Both UPDATE commands will succeed, and you can afterwards commit both transactions.

-- transaction A
COMMIT;
-- transaction B
COMMIT;

You will then have a cycle 1->4->3->2->1 in the table. To make it work, you will either have to use isolation level SERIALIZABLE or use explicit locking in the trigger.

Leisetreter
  • 133
  • 1
  • 9
0

slightly different from Erwin's

CREATE OR REPLACE FUNCTION detect_cycle ()
    RETURNS TRIGGER
    LANGUAGE plpgsql
    AS $func$
BEGIN
    IF EXISTS ( WITH RECURSIVE search_graph (
            id,
            name,
            parentid,
            is_cycle,
            path
) AS (
            SELECT *, FALSE,ARRAY[ROW (n.id,n.parentid)]                    
            FROM
                node n
            WHERE
                n.id = NEW.id
            UNION ALL
            SELECT
                n.*,
                ROW (n.id,n.parentid) = ANY (path),
                path || ROW (n.id,n.parentid)
                    
            FROM
                node n,
                search_graph sg
            WHERE
                n.id = sg.parentid
                AND NOT is_cycle
)
            SELECT * 
            FROM
                search_graph
            WHERE
                is_cycle
            LIMIT 1) THEN
    RAISE EXCEPTION 'Loop detected!';
ELSE
    RETURN new;
END IF;
END
$func$;
jian
  • 4,119
  • 1
  • 17
  • 32