GROUP BY or COUNT Like Field Values - UNPIVOT?

Question

I have a table with test fields, Example

id         | test1    | test2    | test3    | test4    | test5
+----------+----------+----------+----------+----------+----------+
12345      | P        | P        | F        | I        | P

So for each record I want to know how many Pass, Failed or Incomplete (P,F or I)

Is there a way to GROUP BY value?

Pseudo:

SELECT ('P' IN (fields)) AS pass
WHERE id = 12345

I have about 40 test fields that I need to somehow group together and I really don't want to write this super ugly, long query. Yes I know I should rewrite the table into two or three separate tables but this is another problem.

Expected Results:

passed     | failed   | incomplete
+----------+----------+----------+
3          | 1        | 1

Suggestions?

Note: I'm running PostgreSQL 7.4 and yes we are upgrading

you want the sum of testes passed, fauled and incomplete from all the rows correct? — aF., Nov 22 '11 at 13:54
yes but for a particular Id. An Id will only have one row or results — Phill Pafford, Nov 22 '11 at 13:57
So, the final result is the passed, failed and incomplete for one id only? — aF., Nov 22 '11 at 14:03
yes 1 Id only. So I need to test how many passed, failed, incomplete for one Id. Each row has about 40 tests and I wanted to know the best solution for this. — Phill Pafford, Nov 22 '11 at 14:06

Erwin Brandstetter · Accepted Answer · 2011-11-22T14:07:50.503

3

I may have come up with a solution:

SELECT id
      ,l - length(replace(t, 'P', '')) AS nr_p
      ,l - length(replace(t, 'F', '')) AS nr_f
      ,l - length(replace(t, 'I', '')) AS nr_i
FROM   (SELECT id, test::text AS t, length(test::text) AS l  FROM test) t

The trick works like this:

Transform the rowtype into its text representation.
Measure character-length.
Replace the character you want to count and measure the change in length.
Compute the length of the original row in the subselect for repeated use.

This requires that P, F, I are present nowhere else in the row. Use a sub-select to exclude any other columns that might interfere.

Tested in 8.4 - 9.1. Nobody uses PostgreSQL 7.4 anymore nowadays, you'll have to test yourself. I only use basic functions, but I am not sure if casting the rowtype to text is feasible in 7.4. If that doesn't work, you'll have to concatenate all test-columns once by hand:

SELECT id
      ,length(t) - length(replace(t, 'P', '')) AS nr_p
      ,length(t) - length(replace(t, 'F', '')) AS nr_f
      ,length(t) - length(replace(t, 'I', '')) AS nr_i
FROM   (SELECT id, test1||test2||test3||test4 AS t FROM test) t

This requires all columns to be NOT NULL.

edited Nov 22 '11 at 14:07

answered Nov 22 '11 at 14:01

Erwin Brandstetter

605,456
145
1,078
1,228

Could you explain how you're pulling all the test# columns into this query? – Phill Pafford Nov 22 '11 at 14:07
@PhillPafford: I did: `Transform the rowtype into its text representation.` Look at this very plain expression: `test::text`. `test`being the table name. – Erwin Brandstetter Nov 22 '11 at 14:12
@ErwinBrandstetter: *‘`test` being the **column** name’* ? – Andriy M Nov 22 '11 at 14:27
@ErwinBrandstetter excellent idea :) the total length - the length without the specific P, F or I character it'll give the result – aF. Nov 22 '11 at 14:29
@AndriyM when he does the `test1||test2||test3||test4 AS t` he is building the text of all tests. For your example you need to replace the columns to insert all that you need. – aF. Nov 22 '11 at 14:31
@AndriyM: `test` being the **table** name. You can select a whole row of any relation in the `FROM` list by referencing it by name. The result is a composite type (the rowtype) with all columns of the relation. This way you can cast it all to `text` in one fell swoop. *Careful*: a column name takes precedence over a table name, you cannot reference the table itself if it holds a column of the same name. Use a table alias as remedy for this limitation. – Erwin Brandstetter Nov 22 '11 at 14:45
interesting approach +1, still testing this out for performance and 7.4 looks to be working as expected – Phill Pafford Nov 22 '11 at 14:49
@aF. & Erwin: Thanks! I didn't know about that feature of PostgreSQL (like I still don't about many others probably). – Andriy M Nov 22 '11 at 14:50

score 1 · Answer 2 · 2011-11-22T13:53:55.993

Essentially, you need to unpivot your data by test:

id         | test     | result   
+----------+----------+----------+
12345      | test1    | P        
12345      | test2    | P        
12345      | test3    | F        
12345      | test4    | I        
12345      | test5    | P

...

- so that you can then group it by test result.

Unfortunately, PostgreSQL doesn't have pivot/unpivot functionality built in, so the simplest way to do this would be something like:

select id, 'test1' test, test1 result from mytable union all
select id, 'test2' test, test2 result from mytable union all
select id, 'test3' test, test3 result from mytable union all
select id, 'test4' test, test4 result from mytable union all
select id, 'test5' test, test5 result from mytable union all

...

There are other ways of approaching this, but with 40 columns of data this is going to get really ugly.

EDIT: an alternative approach -

select r.result, sum(char_length(replace(replace(test1||test2||test3||test4||test5,excl1,''),excl2,'')))
from   mytable m, 
       (select 'P' result, 'F' excl1, 'I' excl2 union all
        select 'F' result, 'P' excl1, 'I' excl2 union all
        select 'I' result, 'F' excl1, 'P' excl2) r
group by r.result

I updated my question with the expected output. Thanks for the suggestion but that's going to be a massive query. Any other thoughts? — Phill Pafford, Nov 22 '11 at 13:16
@Phill Pafford: See the edited answer for an alternative approach. It's still ugly, but a bit more concise. — , Nov 22 '11 at 13:55

score 0 · Answer 3 · answered Nov 22 '11 at 14:43

You could use an auxiliary on-the-fly table to turn columns into rows, then you would be able to apply aggregate functions, something like this:

SELECT
  SUM(fields = 'P') AS passed,
  SUM(fields = 'F') AS failed,
  SUM(fields = 'I') AS incomplete
FROM (
  SELECT
    t.id,
    CASE x.idx
      WHEN 1 THEN t.test1
      WHEN 2 THEN t.test2
      WHEN 3 THEN t.test3
      WHEN 4 THEN t.test4
      WHEN 5 THEN t.test5
    END AS fields
  FROM atable t
    CROSS JOIN (
      SELECT 1 AS idx
      UNION ALL SELECT 2
      UNION ALL SELECT 3
      UNION ALL SELECT 4
      UNION ALL SELECT 5
    ) x
  WHERE t.id = 12345
) s

score 0 · Answer 4 · answered Nov 22 '11 at 15:36

Edit: just saw the comment about 7.4, I don't think this will work with that ancient version (unnest() came a lot later). If anyone thinks this is not worth keeping, I'll delete it.

Taking Erwin's idea to use the "row representation" as a base for the solution a bit further and automatically "normalize" the table on-the-fly:

select id,
       sum(case when flag = 'F' then 1 else null end) as failed,
       sum(case when flag = 'P' then 1 else null end) as passed,
       sum(case when flag = 'I' then 1 else null end) as incomplete
from (
  select id, 
         unnest(string_to_array(trim(trailing ')' from substr(all_column_values,strpos(all_column_values, ',') + 1)), ',')) flag
  from (
    SELECT id,
           not_normalized::text AS all_column_values
    FROM not_normalized
  ) t1
) t2
group by id

The heart of the solution is Erwin's trick to make a single value out of the complete row using the cast not_normalized::text. The string functions are applied to strip of the leading id value and the brackets around it.

The result of that is transformed into an array and that array is transformed into a result set using the unnest() function.

To understand that part, simply run the inner selects step by step.

Then the result is grouped and the corresponding values are counted.

GROUP BY or COUNT Like Field Values - UNPIVOT?

4 Answers4

Linked