2

In one session, I trying to lock multiple rows in "users" table, and get "status" column for user WHERE id = 2.

do $$
declare user_status int;
begin
    WITH t(id, status) AS(
        SELECT id, status FROM users WHERE id in( 2,4,7,6) order by id FOR UPDATE
    )
    SELECT status FROM t WHERE id = 2 INTO user_status;

    -- just run transaction for a while
    FOR i in 1..2000000000 loop
    end loop;

end;
$$ language plpgsql

while this transaction is running, I run in another session this query:

UPDATE users SET some_col = some_col WHERE id = 6;

I expected that row WHERE id = 6 should locked by first transaction, but this is not true, because UPDATE in second session runs immediately (not waiting for ending transaction in first session).

What I'm misunderstood ?

P.S.

If in first transaction, instead of CTE use:

perform id, status FROM users WHERE id in( 2,4,7,6) order by id FOR UPDATE;

Then lock works as expected.

Oto Shavadze
  • 40,603
  • 55
  • 152
  • 236

1 Answers1

1

I think this is due to the way that PL/pgSQL's SELECT INTO works under the hood.

The apparent underlying cause is mentioned in the docs:

if a locking clause is used in a cursor's query, only rows actually fetched or stepped past by the cursor will be locked.

...and this effect flows through to any CTE in the query, where rows will only be fetched (and therefore locked) when the outer query needs them.

This is pretty easy to demonstrate. First, some setup:

CREATE TABLE t (x INT);
INSERT INTO t VALUES (1),(2),(3);

This query will lock all rows, as expected:

WITH cte AS (SELECT x FROM t WHERE x IN (1,2,3) FOR UPDATE)
SELECT x FROM cte WHERE x = 1

However, this will produce the behaviour you observed, locking only the first record:

DECLARE c CURSOR FOR
  WITH cte AS (SELECT x FROM t WHERE x IN (1,2,3) FOR UPDATE)
  SELECT x FROM cte WHERE x = 1;
FETCH NEXT FROM c;

My guess is that a SELECT INTO in PL/pgSQL operates in much the same way; as mentioned in the docs, if the query returns multiple rows, then the first row is assigned to the target variable, and the rest are ignored (so there is never a need for it to fetch more than one record).

This statement, however, will lock all rows in t:

DO $$
DECLARE i INT;
BEGIN
  WITH cte AS (SELECT x FROM t WHERE x IN (1,2,3) FOR UPDATE)
  SELECT x FROM cte WHERE x = 1 INTO STRICT i;
END
$$ LANGUAGE plpgsql;

Note the use of SELECT INTO STRICT. The STRICT keyword causes an error to be thrown if multiple rows are returned by the query. But in order to discover that a second row was returned, PL/pgSQL must attempt to fetch it, which causes the CTE query to fetch the remaining records (locking them in the process).

All of this is a non-issue most of the time in a PL/pgSQL function, as you'd typically fetch all records from a FOR UPDATE query, and pass them along to whoever is actually updating them. On the other hand, if you want a query's side-effects but aren't interested in its output, then (as you observed) you can use PERFORM, which will run a query to completion, but discard the results.

In the rare case where you need to (i.e. lock) all rows, while only returning a subset of them, you'll probably need to write your own loop.

Nick Barnes
  • 19,816
  • 3
  • 51
  • 63
  • By the way, if you want to leave a transaction hanging, no need to loop a billion times; just run an explicit `BEGIN` beforehand, and don't `COMMIT` until you're ready. – Nick Barnes Jan 30 '18 at 13:59