PostgreSQL query with dynamic number of columns

Question

I am trying to find a method to return a record set with a dynamic number of columns. I can write one query that will produce the list of column names I need as such:

SELECT DISTINCT name FROM tests WHERE group = 'basic';

This will return a short list like 'poke', 'prod', 'hit', 'drop', etc. Then I want a table produced showing a series of tests where each of those tests were run. Every morning we look at what the developers have been doing and poke and prod at it so each test will be run for each day. This query I can write statically:

SELECT (SELECT success FROM test_results AS i
        WHERE i.name = 'poke'
        AND i.date = o.date) AS 'poke',
       (SELECT success FROM test_results AS i
        WHERE i.name = 'prod'
        AND i.date = o.date) AS 'prod',
...
FROM test_results AS o GROUP BY date
HAVING date > now() - '1 week'::interval;

However, this is hard-coded to the tests we are running on each day. If we now need to start kicking the device each day, we need to update the query. If we decide the drop test is no longer needed, after a week, the drop test column should drop off the report as it no longer occurs in the results. Returning NULL for missing tests when only certain dates have a results entry is perfectly acceptable.

Is there a method to create a dynamic list of columns from the results by just using regular SQL in a query?

I was attempting to build up the data I need in parts by using a WITH query, but I can't find a way to build up the final row correctly from dynamic information.

Edit: Here's some sample data from the last two days:

CREATE TABLE test_results (
    name TEXT NOT NULL,
    date DATE default now() NOT NULL,
    success BOOLEAN NOT NULL
);

INSERT INTO test_results (name, date, success) VALUES ('hit',  '2017-06-20', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('poke', '2017-06-20', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('prod', '2017-06-20', TRUE);

INSERT INTO test_results (name, date, success) VALUES ('poke', '2017-06-21', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('prod', '2017-06-21', TRUE);

INSERT INTO test_results (name, date, success) VALUES ('poke', '2017-06-22', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('prod', '2017-06-22', FALSE);

INSERT INTO test_results (name, date, success) VALUES ('poke', '2017-06-23', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('prod', '2017-06-23', TRUE);
INSERT INTO test_results (name, date, success) VALUES ('drop', '2017-06-23', TRUE);

If I run my query against the data range of 2017-06-21 to 2017-06-23, I'd like to get results like the following including a matrix of any tests that were run in that time:

date        | poke   | prod   | drop
------------+--------+--------+-----
2017-06-21  | TRUE   | TRUE   | NULL
2017-06-22  | TRUE   | FALSE  | NULL
2017-06-23  | TRUE   | TRUE   | TRUE

The names poke, prod, and drop were all names found in the name field of a row during that time period. NULL is returned for the detailed query for any tests that don't have a record for that date.

Edit your question and provide sample data and desired results. — Gordon Linoff, Jun 23 '17 at 20:55
You need to build your statement dynamically within a procedure and then execute that. I've done things like that in the past but I'd need to look it up on my drive. If nobody gives you the right answer I'll look on it tomorrow. — Kamil Gosciminski, Jun 23 '17 at 21:43

score 2 · Answer 1 · answered Jun 23 '17 at 23:16

2

Enable the 'tablefunc' extension, and then use the 'crosstab' function; see the PG docs: https://www.postgresql.org/docs/current/static/tablefunc.html. The argument to the crosstab function should be the text of a query that produces three columns: the date, the test name, and the test success, in that order.

answered Jun 23 '17 at 23:16

rd_nielsen

2,407
2
11
18

The main disadvantage of the `crosstab`: it is still requires to hardcode the fields list in the main query. – Abelisto Jun 24 '17 at 07:01
Here's a blog post showing how to use dynamic SQL to generate the field list, so that hardcoding it is not necessary: https://splinterofthesingularity.blogspot.com/2021/03/crosstabbing-data-in-postgres-with.html – rd_nielsen Mar 21 '21 at 14:53

Kristo Mägi · Accepted Answer · 2017-06-24T07:55:46.290

There are different methods used, some already mentioned here like crosstab. Also, you can build an own function that builds the query dynamically and returns as TABLE and few more methods.

But all require you to predefine an exact number of outputs and their data types.

If I understand your case that is something you would not want as you mentioned:

If we now need to start kicking the device each day, we need to update the query.

Which is pretty much the same downside using crosstab and other ways.

So there is a way using Cursors. It is probably not the best way to go and if you can use crosstab then that is probably better.
But at least it is an option I'll add with comments in code.

Solution:

-- Function for opening cursor
CREATE OR REPLACE
FUNCTION    test_stats(
                c REFCURSOR,    -- cursor name
                sdate date,     -- start date of period wanted (included)
                edate date,     -- end date of period wanted (included)
                gtype text      -- you had in your 'tests' table some group type which I included just in case
            )
RETURNS     REFCURSOR
LANGUAGE    PLPGSQL
AS
$main$
BEGIN
    OPEN    c
    FOR
    -- Following dynamic query building can be
    -- used also if want to go with function that RETURNS TABLE
    EXECUTE format(
            '   SELECT  r.date,
                        %s
                FROM    test_results r
                WHERE   r.date BETWEEN %L AND %L
                GROUP BY 1
            ',
                -- Here we build for each 'name' own statement and 
                -- aggregate together with comma separator to feed
                -- into main query.
                -- P.S. We need to double check result unfortunately
                --      against test_results table once to get pre-filter
                --      for names in specified date range.
                --      With this we eliminate tests that for sure will
                --      not be presented in the range. In given test data
                --      this means eliminating 'hit'.
            (
                SELECT  string_agg(
                            DISTINCT format(
                                '(  SELECT  success
                                    FROM    test_results i
                                    WHERE   i.name = %1$L
                                    AND     i.date = r.date ) AS "%1$s"',
                                t.name
                            ),
                            ','
                        )
                FROM    tests t,
                LATERAL (   SELECT  array_agg( DISTINCT r.name )
                            FROM    test_results r
                            WHERE   r.date BETWEEN sdate AND edate
                        ) a( lst )
                WHERE   t.group = gtype     -- the group type is used here
                AND     t.name = ANY ( a.lst::text[] )
            ),
            sdate,      -- start date for between statement
            edate       -- end date for between statement
        );
    RETURN c;
END;
$main$;

-- Usage example:
BEGIN;
SELECT test_stats( 'teststats1', '2017-06-21'::date, '2017-06-23'::date, 'basic' );
FETCH ALL IN teststats1;
COMMIT;

-- Result (from your given test data set):
    date    | drop | poke | prod
------------+------+------+------
 2017-06-22 |      | t    | f
 2017-06-21 |      | t    | t
 2017-06-23 | t    | t    | t
(3 rows)

As I mentioned, it is not the perfect way, but it does the job :)

As the realization has now set in that SQL itself is a statically-type language just like C or Java, what I'm asking for isn't possible. I'm thinking in terms of dynamically-types languages like Python or JavaScript. While various functions can change their signature to match their context, ultimately, the types are decided upon while compiling (preparing?) the original SQL statement. — penguin359, Jun 27 '17 at 22:32

score -1 · Answer 3 · answered Jun 23 '17 at 21:02

-1

I would picture a query like this:

SELECT tr.name, tr.date, tr.success
FROM tests t JOIN
     test_results tr
     ON t.testid = tr.testid
WHERE t.group = 'basic' AND tr.date > now() - '1 week'::interval;

You are probably better off pivoting the data at the application level.

answered Jun 23 '17 at 21:02

Gordon Linoff

1,242,037
58
646
786

The very reason I am trying to do this in SQL is because I don't control the application layer. It just presents the results of a SQL statement as a table/matrix in the application. – penguin359 Jun 23 '17 at 21:21

PostgreSQL query with dynamic number of columns

3 Answers3

Linked