1

I have a table with data:

create Table ProductSales
(    
    Productname varchar(50),
    Year int,
    Sales int
);
 
Insert into ProductSales 
values ('A', 2017, 100),
       ('A', 2018, 150),
       ('A', 2019, 300),
       ('A', 2020, 500),
       ('A', 2021, 450),
       ('A', 2022, 675),
       ('B', 2017, null),
       ('B', 2019, 1120),
       ('B', 2020, 750),
       ('B', 2021, 1500),
       ('B', 2022, 1980);
Productname Year Sales
A 2017 100
A 2018 150
A 2019 300
A 2020 500
A 2021 450
A 2022 675
B 2017 null
B 2019 1120
B 2020 750
B 2021 1500
B 2022 1980

And I want to get the result like this:

Productname 2017 2018 2019 2020 2021 2022
A 100 150 300 500 450 675
B null null 1120 750 1500 1980

Is it possible to make it with postgresql? I tried to use crosstab() but failed.

Besides, it's possible to have separate table with need years, e.g.:

|Year|
|----|
|2017|
|2018|
|2019|
|2020|

Does it change something?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • `It's possible to have separate table with need years,` Can you please explain it; Your table misses values 2021, 2022; So, I am curious – Tushar Mar 14 '23 at 11:51
  • 1
    You can't have a dynamic number of columns in a SQL query. A fundamental restriction of the SQL language is, that the number, names and data types of all columns of a query must be know to the database _before_ it starts retrieving data. Using the `crosstab()` function does not work around that restriction. –  Mar 14 '23 at 11:55
  • Ok, then you can use postgres functions and dynamic sql as well – Leo Abalckin Mar 14 '23 at 13:32
  • 1
    " Your table misses values 2021, 2022; So, I am curious " - I mean that it's needed to select only data for years from this years table – Leo Abalckin Mar 14 '23 at 13:33

2 Answers2

2

You can't have a dynamic number of columns in a single SQL query.

If you use two, the first one can determine your target structure, the second one use it. You can have dynamic SQL construct and

  1. print the query for you to use
  2. prepare a statement for you to only call an execute on
  3. define a function for you to call
  4. execute it, writing output to a table that you can read from in 2nd step.

The list isn't exhaustive. None of this is pretty, all is error-prone. You can keep pushing how "dynamic" it gets until you run out of patience (or prescription meds): demo extending Tushar's

CREATE PROCEDURE first_step_procedure() LANGUAGE PLPGSQL AS $procedure$
BEGIN
  IF EXISTS (SELECT true 
             FROM pg_prepared_statements 
             WHERE name='second_step_statement') THEN
      DEALLOCATE second_step_statement;
  END IF;
  EXECUTE format(   $prep$
                    PREPARE second_step_statement AS
                    SELECT *
                    FROM crosstab(
                        'SELECT Productname, Year, Sales
                        FROM ProductSales
                        ORDER BY 1, 2'
                      , 'SELECT DISTINCT Year FROM ProductSales ORDER BY 1'
                    ) AS ct ("Productname" text, %1$s) 
                    $prep$
                  , (SELECT string_agg(format('%I int',Year),', ')
                     FROM (SELECT DISTINCT Year 
                           FROM ProductSales 
                           ORDER BY 1) AS a ) );
END $procedure$;

And then, each time you want to use it, you need to repeat your two steps:

call first_step_procedure();--determines the structure to redefine your statement
execute second_step_statement;

create table keep_output_in_a_table as execute second_step_statement;

I went with prepared statements because they are at least session-specific - other users can share the first_step_procedure() but their second_step_statement generated by it won't be interfering.


If you want to maintain a table(view) based on that ProductSales, you can push this further with a trigger that keeps redefining the dependent object: demo

create Table ProductSales (
    Productname varchar(50),
    Year int,
    Sales int );
CREATE FUNCTION ProductSalesYearly_maintainer_function() RETURNS TRIGGER
  language plpgsql as $procedure$
BEGIN
  EXECUTE format(   $prep$
                    drop table if exists ProductSalesYearly;
                    CREATE TABLE ProductSalesYearly AS
                    SELECT *
                    FROM crosstab(
                        'SELECT Productname, Year, Sales
                        FROM ProductSales
                        ORDER BY 1, 2'
                      , 'SELECT DISTINCT Year FROM ProductSales ORDER BY 1'
                    ) AS ct ("Productname" text, %1$s) 
                    $prep$
                  , (SELECT string_agg(format('%I int',Year),', ')
                     FROM (SELECT DISTINCT Year 
                           FROM ProductSales 
                           ORDER BY 1) AS a ) );
   RETURN NULL;
END $procedure$;

CREATE TRIGGER ProductSalesYearly_maintainer_trigger
    AFTER INSERT OR UPDATE OR DELETE ON ProductSales
    FOR EACH STATEMENT
    EXECUTE FUNCTION ProductSalesYearly_maintainer_function();

Keep in mind that static references to ProductSalesYearly that's being constantly redefined, keep breaking - even though the name remains the same, it's a new object. It also keeps firing on all traffic on ProductSales. You could mitigate that to an extent by constructing "migrations", altering the table adding/removing columns each time, instead of dropping and reconstructing.

Zegarek
  • 6,424
  • 1
  • 13
  • 24
  • At the end I need only some view to see the result. is it possible? – Leo Abalckin Mar 14 '23 at 13:48
  • @LeoAbalckin If you just want to keep the output, you can dump it to a table. Added that in an edit. – Zegarek Mar 14 '23 at 13:54
  • No, I want to see the changes in view automatically after changing data in ProductSales – Leo Abalckin Mar 14 '23 at 13:55
  • Sure, why not. You can complicate things further by setting up a trigger on insert, update and delete events in ProductSales, that will keep rebuilding your table, redefining some view, etc. Note that in each case, the redefined thing is named the same, but it's not the same object - any static reference will break each time. However, if all it does is feed a `select` query issued elsewhere, the select will keep pointing at the name, ignoring the fact the underlying objects keep getting swapped out. – Zegarek Mar 14 '23 at 14:03
  • @LeoAbalckin Added an example and a demo of the trigger behaviour. – Zegarek Mar 14 '23 at 14:19
1

By using your data in DBFIDDLE

The working query using crosstab can be written as :

SELECT *
FROM crosstab(
  'SELECT Productname, Year, Sales
   FROM ProductSales
   ORDER BY 1, 2',
   'SELECT DISTINCT Year FROM ProductSales ORDER BY 1'
) AS ct ("Productname" varchar(50), "2017" int, "2018" int, "2019" int, "2020" int, "2021" int, "2022" int);

This gives me expected output :

Productname 2017 2018 2019 2020 2021 2022
A 100 150 300 500 450 675
B null null 1120 750 1500 1980
Tushar
  • 3,527
  • 9
  • 27
  • 49
  • 2
    that isn't dynamic as the result wouldn't add 2023 automatically or 2016 for that matter – nbk Mar 14 '23 at 12:06
  • I agree with the above statement; However; I believe; as per OPs expected output; OP is calling `2017,2018,2019,2020,2021,2022` as dynamic columns. I will wait until OP replies. @a_horse_with_no_name also has clarified about dynamic column names to OP – Tushar Mar 14 '23 at 12:13
  • You can't use "2017" int, "2018" int, etc. because we have to take all those values from the ProductSales table – Leo Abalckin Mar 14 '23 at 13:35
  • But can we change in some way this part "2017" int, "2018" int, "2019" int, "2020" int, "2021" int, "2022" int into something like array_to_string( [Select year from years], ',', ' int')? – Leo Abalckin Mar 14 '23 at 13:37
  • @LeoAbalckin; Can you try updated query and let me know if you face any issues ? – Tushar Mar 14 '23 at 14:38
  • 1
    When I run it on https://dbfiddle.uk/7GPhPmB1 I get an error ERROR: syntax error at or near "' || (SELECT string_agg(format('" LINE 7: ) AS ct ("Productname" varchar(50), ' || The same problem ib DBeaver and jupyter Notebook – Leo Abalckin Mar 14 '23 at 15:00
  • 1
    Well, I will delete the second query and will keep the working one. If I am able to write a generic query; I will add it here and I will ping you :) BTW, @Zegarek has already completed the answer using a stored procedure :) – Tushar Mar 14 '23 at 19:41