1

I am trying to create a funnel chart, but my data is in a wide format right now. It has a couple groups that I want to compare (e.g., A and B in the example below) and they are on different scales, so I want to use proportions as well as the raw values.

I have a starting table that looks like this:

| group | One | Two | Three |
|-------|-----|-----|-------|
|  A    | 100 | 75  | 50    |
|  B    | 10  | 7   | 6     |
|-------|-----|-----|-------|

I need to get the table to look like this:

| group | stage | count | proportion of stage One |
|-------|-------|-------|-------------------------|
|  A    | One   | 100   | 1                       |
|  A    | Two   | 75    | 0.75                    |
|  A    | Three | 50    | 0.5                     |
|  B    | One   | 10    | 1                       |
|  B    | Two   | 7     | 0.7                     |
|  B    | Three | 6     | 0.6                     |
|-------|-------|-------|-------------------------|

The proportion is calculated as each row's value divided by the maximum value for that group. Stage One is always gonna be 100%, then Stage 2 is the count for that row divided by the max of count for that group value.

The best I could do is connect to the database in python and use Pandas to melt the table, but I would really like to keep everything in a SQL script.

I've been fumbling around and making zero progress four too long. Any help is much appreciated.

NLR
  • 1,714
  • 2
  • 11
  • 21

2 Answers2

2

You can do this with a UNION query, selecting first the values of One, then Two and Three with the appropriate division to get the proportion:

SELECT "group", 'One' AS stage, One, 1 AS proportion
FROM data
UNION ALL
SELECT "group", 'Two', Two, ROUND(1.0*Two/One, 2)
FROM data
UNION ALL
SELECT "group", 'Three', Three, ROUND(1.0*Three/One, 2)
FROM data
ORDER BY "group"

Output:

group   stage   one     proportion
A       One     100     1
A       Two     75      0.75
A       Three   50      0.50
B       One     10      1
B       Two     7       0.70
B       Three   6       0.60

Demo on dbfiddle

Nick
  • 138,499
  • 22
  • 57
  • 95
1

I would recommend a lateral join:

SELECT t."group", v.stage, v.count, v.count * 1.0 / t.one
FROM t CROSS JOIN LATERAL
     (VALUES ('One', one),
             ('Two', two),
             ('Three', three)
     ) v(stage, count);

A lateral join should be a little faster than a union all on a small amount of data. As the data gets bigger, only scanning the table once is a bigger win. However, the biggest win is when the "table" is really a more complex query. Then the lateral join can be significantly better in performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786