1

I want to create a stored function whose body will loop over the results of a WITH statement. Is that possible? So far I'm getting syntax errors on FOR which follows WITH. Minimal example:

DO $$
DECLARE loop_col1 RECORD;
BEGIN 
    WITH temp_table AS (
        SELECT col1, col2 FROM files
    )
    FOR loop_col1 IN
        SELECT DISTINCT col1 FROM temp_table
    LOOP
        -- do something using all columns from temp_table
    END LOOP;
END $$;

The reason I want to loop over col1 is that it groups values of col2 and for each group of values I want to lay new relations in other table (e.g. the 'do something' part)

Other solutions that allow me to have a 'temporary table' of data are also fine.

The full query for context is below. See how the loop body references the WITH's table multiple times.

CREATE OR REPLACE FUNCTION enqueue_postprocessing_jobs()
RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    loop_group_uufid RECORD;
    loop_uurid RECORD;
BEGIN
    -- 1. Find information of files that might be usable for new postprocessing
    -- job. Query results in all files that are related to postprocessing jobs
    WITH candidate_postprocessing_files
    SELECT uufid, uurid, source, group_uufid
    FROM files
    WHERE source='CRACKS' OR source='BACKGROUND'
    AND uufid NOT IN (
        -- Find all files which have do have a job associated with them
        SELECT DISTINCT uufid FROM get_files_and_jobs()
        WHERE type='POSTPROCESSING'
    )

    -- 2. Group candidate files by group_uufid (a) and loop over them to see if
    -- all necessary files are present (b), insert new job if so (c) and
    -- connect job to files (d)
    FOR loop_group_uufid IN
        SELECT DISTINCT group_uufid
        FROM candidate_postprocessing_files
    LOOP
        -- a. Make reusable temp table with only the files of this group
        WITH group_postprocessing_files AS (
            SELECT * FROM candidate_postprocessing_files
            WHERE group_uufid=loop_group_uufid
        ),
        -- b. Select ids of those files and runs where all necessary files are
        -- present. Query results in a table is 0 rows long if not all files
        -- are present for this job, otherwise, it should be 2 rows long.
        -- One row for cracks, one row for background.
        -- In other words, this whole table function as a single job candidate.
        job_candidate AS (
            SELECT uufid, uurid FROM group_postprocessing_files
            WHERE 'CRACKS' IN (SELECT source FROM group_postprocessing_files)
            AND 'BACKGROUND' IN (SELECT source FROM group_postprocessing_files)
        )

        -- c&D. Insert those new jobs and connect files
        -- Should only loop once
        FOR loop_uurid IN
            SELECT DISTINCT uurid FROM job_candidate
        LOOP
            -- c. Insert jobs
            WITH job AS (
                -- TODO Support not_before and priority
                SELECT enqueue_job AS uujid
                FROM enqueue_job(uurid, '' , 'POSTPROCESSING')
            )
            -- d. Connect new jobs to files
            INSERT INTO files_to_jobs
            SELECT job_candidate.uufid, job.uujid
            FROM job;
        END LOOP;
    END LOOP;
END $$;

So my questions are:

  1. How do I use a FOR loop in combination with a WITH selection?
  2. If that is not possible, how do I otherwise get temporary table data that I can loop over?
Arthur
  • 193
  • 1
  • 9

1 Answers1

4

You can use a FOR loop in combination with a WITH clause but the WITH clause must belong to a SELECT statement (WITH clause is not an independent statement):

DO
$$
DECLARE loop_col1 RECORD;
BEGIN 
    FOR loop_col1 IN
        WITH temp_table AS (
         SELECT col1, col2 FROM files
        )
        SELECT DISTINCT col1 FROM temp_table
    LOOP
        -- do something
        raise notice 'loop_uurid %', loop_col1;
    END LOOP;
END $$;

Otherwise you can create a temporary table:

DO $$
DECLARE loop_col1 RECORD;
BEGIN 

    CREATE TEMPORARY TABLE temp_table ON COMMIT DROP AS
        SELECT col1, col2 FROM files;

    FOR loop_col1 IN
        SELECT DISTINCT col1 FROM temp_table
    LOOP
        -- do something
        raise notice 'loop_uurid %', loop_col1;
    END LOOP;

END $$;
pifor
  • 7,419
  • 2
  • 8
  • 16
  • Thanks. I guess the reason I want to have the for loop following the with statement is that I want to reuse the temporary table it creates throughout the whole body of the for loop, but also create the iteration var from that table (edited minimal example). Is that possible or do I need to use a temporary table then? – Arthur Jun 03 '20 at 10:49
  • I also have 2 questions about the temporary tables: (1) Do they conflict if a stored function with this code as a body is called twice, *without* a commit in between? (2) Since I'm doing insertions at the end, should I lock at the start of the code block or does Postgresql automatically lock when I call this function? – Arthur Jun 03 '20 at 10:54
  • Q1: I think you don't need a temporary table if you want to loop over the rows returned by the SELECT query (what you call "temporary table" but what is actuallly named a cursor: https://www.postgresql.org/docs/12/plpgsql-cursors.html#PLPGSQL-CURSOR-FOR-LOOP). – pifor Jun 03 '20 at 12:10
  • Q2: (1) yes in the same session: because the temporary table already exists. (2) in general you need to rows locks in application code if you run UPDATE statements using pessimistic locking approach. So no need to lock rows or tables in this case. – pifor Jun 03 '20 at 12:16
  • Thanks for the help. I needed to know whether or not I could use FOR after WITH. I tried using temporary tables, but because I created one within the loop as well it became very difficult with name collisions. Eventually I managed to rewrite the query completely without needing a loop or temporary tables. Thanks for the help! – Arthur Jun 04 '20 at 18:11