Summarise proportions of character values across columns in table

Question

In this kind of data frame:

df <- data.frame(
  w1 = c("A","A","B","C","A"),
  w2 = c("C","A","A","C","C"),
  w3 = c("C","A","B","C","B")
)

I need to calculate, for every column, the within-column proportions (as percentages) of the character values. Interestingly, the following code works on my large actual data set but throws an error with the toy data above:

library(dplyr)

df %>%
  summarise(across(everything(), ~prop.table(table(.))*100))

What I'm looking for is a data frame with the percentage of each value in each column, plus a column indicating the values:

  value w1 w2 w3
1     A 60 40 20
2     B 20  0 40
3     C 20 60 40

score 5 · Answer 1 · answered Aug 01 '23 at 08:30


You can try table + stack, like below:

> proportions(table(stack(df)), 2) * 100
      ind
values w1 w2 w3
     A 60 40 20
     B 20  0 40
     C 20 60 40
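To see why this avoids the length mismatch, here is the same idea split into steps (a sketch; the intermediate names `long`, `tab` and `pct` are illustrative, and `proportions()` needs R >= 4.0):

```r
# Toy data from the question
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

# stack() gives one row per cell: `values` (the letter) and `ind`
# (a factor naming the column it came from)
long <- stack(df)
head(long, 3)

# table() cross-tabulates values x ind over the union of all values,
# so a letter absent from a column (B in w2) still gets a 0 cell
tab <- table(long)

# margin 2 divides each cell by its column total; * 100 gives percentages
pct <- proportions(tab, 2) * 100
pct

# every column should now sum to 100
colSums(pct)
```

Because the letters are tabulated jointly rather than column by column, every column ends up with the same set of rows.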

answered Aug 01 '23 at 08:30 by ThomasIsCoding

How would this work as part of the pipe? And I misleadingly wrote I want a table; I've corrected this to 'data frame'. – Chris Ruehlemann Aug 01 '23 at 08:41
@ChrisRuehlemann you can do `as.data.frame(proportions(table(stack(df)), 2) * 100) |> pivot_wider(names_from = ind, values_from = Freq)` – Mark Aug 01 '23 at 08:43
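Expanding that comment into a single pipe that starts from `df` might look like this (a sketch, assuming R >= 4.1 for the native pipe and tidyr for `pivot_wider()`; base `transform()` is swapped in for the `* 100` step so it can sit inside the pipe, and `res` is an illustrative name):

```r
library(tidyr)

# Toy data from the question
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

res <- df |>
  stack() |>                         # long form: values, ind
  table() |>                         # counts: values x ind
  proportions(margin = 2) |>         # shares within each column
  as.data.frame() |>                 # one row per cell: values, ind, Freq
  transform(Freq = Freq * 100) |>    # shares -> percentages
  pivot_wider(names_from = ind, values_from = Freq)  # back to w1, w2, w3
res
```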

score 4 · Accepted Answer · edited Aug 01 '23 at 10:21


Here's a workaround using tidyverse packages:

library(dplyr)
library(tidyr)

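pivot_longer(df, everything()) |>                # long form: name (w1..w3), value
    count(value, name) |>                        # counts per value within each column
    mutate(n = n / sum(n) * 100, .by = name) |>  # within-column percentages
    pivot_wider(names_from = name, values_from = n, values_fill = 0)  # absent values -> 0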
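Split into named steps, it becomes easier to see where the 0 for B in w2 comes from (a sketch; `long`, `counts`, `pct` and `res` are illustrative names, and `.by` assumes dplyr >= 1.1.0):

```r
library(dplyr)
library(tidyr)

# Toy data from the question
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

# step 1: one row per cell, columns `name` (w1/w2/w3) and `value` (A/B/C)
long <- pivot_longer(df, everything())

# step 2: count each value within each column; B/w2 simply has no row
counts <- count(long, value, name)

# step 3: turn counts into within-column percentages
pct <- mutate(counts, n = n / sum(n) * 100, .by = name)

# step 4: widen; values_fill = 0 supplies the missing B/w2 cell
res <- pivot_wider(pct, names_from = name, values_from = n, values_fill = 0)
res
```

On dplyr versions older than 1.1.0, `group_by(name) |> mutate(...) |> ungroup()` can stand in for the `.by` argument.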

edited Aug 01 '23 at 10:21 by Sotos

answered Aug 01 '23 at 08:42 by Mark

score 4 · Answer 3 · edited Aug 01 '23 at 08:52

The error comes from the fact that one column (w2) contains only two distinct values, so table() gives a result of length 2 for it while the other columns give length 3:

Error in names(dots)[[i]] : subscript out of bounds
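A quick way to confirm this is to count how many entries table() returns per column, with and without shared factor levels (a sketch; `n_raw`, `n_fixed` and `lvls` are illustrative names):

```r
# Toy data from the question
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

# how many entries table() returns for each column as-is
n_raw <- sapply(df, function(x) length(table(x)))
n_raw

# the same count after giving every column the same factor levels
lvls <- sort(unique(unlist(df)))
n_fixed <- sapply(df, function(x) length(table(factor(x, levels = lvls))))
n_fixed
```

Only w2 differs in the first count; with shared levels all three columns give the same length.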

Convert all your columns to factors with the same levels first, so that table() also counts the levels that are absent from a column (as 0). Then use reframe() instead of summarise(): since dplyr 1.1.0, returning more than one row per group from summarise() is deprecated in favour of reframe():

library(dplyr)

# all distinct values across every column, so no level is missed
lvls <- sort(unique(unlist(df)))

df %>%
  mutate(across(everything(), \(x) factor(x, levels = lvls))) %>%
  reframe(across(everything(), \(x) c(prop.table(table(x))*100))) %>%
  mutate(value = lvls, .before = 1)

#   value w1 w2 w3
# 1     A 60 40 20
# 2     B 20  0 40
# 3     C 20 60 40
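The same shared-levels idea also works in base R without dplyr (a sketch; `sapply()` is used here instead of reframe(), and `lvls`, `pct` and `res` are illustrative names):

```r
# Toy data from the question
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

# shared levels across all columns, so every column tabulates A, B and C
lvls <- sort(unique(unlist(df)))

# for each column: count with fixed levels, convert to percentages
pct <- sapply(df, function(x) prop.table(table(factor(x, levels = lvls))) * 100)

# sapply() simplifies to a 3 x 3 matrix; add the values as their own column
res <- data.frame(value = lvls, pct, row.names = NULL)
res
```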

edited Aug 01 '23 at 08:52

answered Aug 01 '23 at 08:45 by Maël

Good point! or `c` – Maël Aug 01 '23 at 08:51
this is very pretty!! My favourite :) – Mark Aug 01 '23 at 08:52
I got that error too, while using reframe, but I wasn't able to make heads or tails of it - not super descriptive – Mark Aug 01 '23 at 08:55
