-1

I have a column in mytable(Movies), Actors. I am trying to get percentage of times each actor shows up.

CREATE TABLE movies AS SELECT * FROM ( VALUES
  ('Robert DeSouza'),
  ('Tony Wagner'),
  ('Sean Cortese'),
  ('Robert DeSouza'),
  ('Robert DeSouza'),
  ('Tony Wagner'),
  ('Sean Cortese'),
  ('Charles Bastian'),
  ('Robert DeSouza')
) AS t(actors);

The result I am requesting:

select Actors, (some formula * 100) as "The Ratio" from Movies

Actors                       The Ratio
Robert DeSouza                 44%
Tony Wagner                    22%
Sean Cortese                   22%
Charles Bastian                11%
                               100%
Air
  • 8,274
  • 2
  • 53
  • 88
Jason Smith
  • 127
  • 1
  • 3
  • 14

4 Answers4

2

You can do this using window functions. For the numeric calculation:

select m.actor,
       count(*) * 1.0 / sum(count(*)) over () as ratio
from movies m
group by m.actor;

You can convert the ratio to whatever format you want -- multiply by 100 to get a percentage, use string concatenation to add a percent. To me, something called a ratio should be between 0 and 1 (in this case).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2
SELECT actors, floor(count(*) *100 / sum(count(*)) OVER ())
FROM movies
GROUP BY actors
ORDER BY count(*) DESC;

This gets you most of the way there..

     actors      | floor 
-----------------+-------
 Robert DeSouza  |    44
 Tony Wagner     |    22
 Sean Cortese    |    22
 Charles Bastian |    11

Not sure how you're getting 100 in your example. You want the floor of the percentages, and you want it to magically say 100? If 44+22+22+11 = 100 today then it's just one of those days. But we can do that too.

SELECT actors AS "Actors", r::text || '%' AS "The Ratio"
FROM (
  SELECT
    actors AS "Actors",
    floor(count(*) *100 / sum(count(*)) OVER ()) AS r,
    false AS is_total
  FROM movies
  GROUP BY actors
  UNION ALL
    SELECT *
    FROM ( VALUES
      (null, 100, true)
    ) AS t(actors, floor, is_total)
  ORDER BY 3, 2 DESC
) AS t(actors,r);

Output,

     Actors      | The Ratio 
-----------------+-----------
 Robert DeSouza  | 44%
 Tony Wagner     | 22%
 Sean Cortese    | 22%
 Charles Bastian | 11%
                 | 100%

If you don't want to floor you can round()

Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
1

There is no numeric type that includes a percent sign (the % character) so your problem can't be solved solely by an expression that calculates the numeric value. In addition to calculating that value, you need to format it as text using the to_char() function.

This function takes a numeric value and converts it to a text value using a formatting literal that you provide as the second argument. In this case it looks like what you want to do is to round to the nearest percent and show the percent sign. You probably want to use '990%' as your formatting literal. Adding this to your example table and the window function that Gordon suggested yields:

[local] air@postgres=> CREATE TABLE movies AS SELECT * FROM ( VALUES
...   ('Robert DeSouza'),
...   ('Tony Wagner'),
...   ('Sean Cortese'),
...   ('Robert DeSouza'),
...   ('Robert DeSouza'),
...   ('Tony Wagner'),
...   ('Sean Cortese'),
...   ('Charles Bastian'),
...   ('Robert DeSouza')
... ) AS t(actors);

SELECT 9
Time: 715.613 ms
[local] air@postgres=> select actors, to_char(100 * count(*) / sum(count(*)) over (), '990%') as "The Ratio" from movies group by actors;
┌─────────────────┬───────────┐
│     actors      │ The Ratio │
├─────────────────┼───────────┤
│ Charles Bastian │   11%     │
│ Tony Wagner     │   22%     │
│ Sean Cortese    │   22%     │
│ Robert DeSouza  │   44%     │
└─────────────────┴───────────┘

(4 rows)

Time: 31.501 ms

You want to make sure you account for the need to display a full range of possible values, including 100% and 0%; since to_char() will round to fit your desired precision, it is possible for an actor to show zero as their ratio, despite existing in the table:

[local] air@postgres=> delete from movies where actors <> 'Tony Wagner';
DELETE 7
Time: 36.697 ms
[local] ahuth@postgres=> insert into movies (actors) select 'Not Tony Wagner' from generate_series(1,500);
INSERT 0 500
Time: 149.022 ms
[local] ahuth@postgres=> select actors, to_char(100 * count(*) / sum(count(*)) over (), '990%') as "The Ratio" from movies group by actors;
┌─────────────────┬───────────┐
│     actors      │ The Ratio │
├─────────────────┼───────────┤
│ Tony Wagner     │    0%     │
│ Not Tony Wagner │  100%     │
└─────────────────┴───────────┘
(2 rows)

Time: 0.776 ms

If you want to expand this to show decimal places, just modify the format string. Use 0 in your formatting literal when you want to force leading or trailing zeros.

Community
  • 1
  • 1
Air
  • 8,274
  • 2
  • 53
  • 88
  • Thanks Air, I have edited the original post with your suggestion. I do not know why it still does not equal 100%. It just adds all the values up and produces a 9%. Any idea why this is happening. – Jason Smith Dec 21 '16 at 23:15
0

This works by joining the grouped results with the overall results using union.

EDIT: Dropped concatenation and a 0 after comments from @Air.

select actors actor, ratio from ( 
    select 0 sort
    , actors
    , round(count(*) * 100.0 / ( select count(*) 
    from movies ),0) ratio 
    from movies 
    group by actors 
    union 
    select 1 sort
    , 'Total'
    , round(count(*) / count(*) * 100,0) ratio 
    from movies 
) actors 
order by sort, actors ; 

-- results

      actor      | ratio 
-----------------+-------
 Charles Bastian | 11
 Robert DeSouza  | 44
 Sean Cortese    | 22
 Tony Wagner     | 22
 Total           | 100
(5 rows)
Keith John Hutchison
  • 4,955
  • 11
  • 46
  • 64
  • It works @Air ... and it answers the question. The order by was required in the psql client I had to get the total at the bottom ... the union resolves the rounding issue. Typos ... where ... – Keith John Hutchison Dec 21 '16 at 23:58
  • If `100.00` isn't a typo then I'd love to know your motivation for adding the hundredths digit. As far as it working, sure, lots of bad queries work just fine on 9 rows of example data – Air Dec 22 '16 at 00:12
  • When I tested the code ... I think in terms of rounding a percentage to 2 points ... however point taken ... I'll edit. The concatenation ... are you seriously saying that to_char is going to be any better? Are you suggesting that a union on two aggregate queries is somehow a bad/slow query? Are you having a bad day or are you like this all the time? – Keith John Hutchison Dec 22 '16 at 00:19