In this kind of data frame:

df <- data.frame(
     w1 = c("A","A","B","C","A"),
     w2 = c("C","A","A","C","C"),
     w3 = c("C","A","B","C","B")
   ) 

I need to calculate, for every column, the within-column proportions of the character values. Interestingly, the following code works with my large real data set but throws an error with the toy data above:

df %>%
  summarise(across(everything(), ~prop.table(table(.))*100))

What I'm looking for is a data frame with the exact proportions (as percentages) of all values in each column, plus a column indicating the values:

  value w1 w2 w3
1     A 60 40 20
2     B 20  0 40
3     C 20 60 40
Mark
Chris Ruehlemann

3 Answers

You can try table + stack, like below:

> proportions(table(stack(df)), 2) * 100
      ind
values w1 w2 w3
     A 60 40 20
     B 20  0 40
     C 20 60 40
ThomasIsCoding
  • How would this work as part of the pipe? And I misleadingly wrote I want a table; I've corrected this to 'data frame'. – Chris Ruehlemann Aug 01 '23 at 08:41
  • @ChrisRuehlemann you can do `as.data.frame(proportions(table(stack(df)), 2) * 100) |> pivot_wider(names_from = ind, values_from = Freq)` – Mark Aug 01 '23 at 08:43
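Following up on the pipe question: a minimal base-R sketch that turns the same table into a wide data frame without tidyr. It assumes R >= 4.0 (for proportions()); the names tab and res are only illustrative.

```r
df <- data.frame(
  w1 = c("A","A","B","C","A"),
  w2 = c("C","A","A","C","C"),
  w3 = c("C","A","B","C","B")
)

# Within-column percentages as a 2-D table: rows = values, columns = w1..w3
tab <- proportions(table(stack(df)), 2) * 100

# as.data.frame.matrix() keeps the wide layout (as.data.frame() would go long)
res <- as.data.frame.matrix(tab)

# Move the row names (A, B, C) into a regular "value" column
res <- data.frame(value = rownames(res), res)
rownames(res) <- NULL
res
```

The same steps can be chained with |> if preferred; the only extra work compared with the table is moving the row names into a column.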

Here's a workaround using tidyverse packages:

library(dplyr)
library(tidyr)

pivot_longer(df, everything()) |> 
    count(value, name) |>
    mutate(n = n / sum(n) * 100, .by = name) |>
    pivot_wider(names_from = name, values_from = n, values_fill = 0)
Sotos
Mark

The error occurs because one column (w2) contains only two distinct values, so table() returns a result of length 2 for it while the other columns give length 3:

Error in names(dots)[[i]] : subscript out of bounds
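The length mismatch can be checked directly. A small sketch (the name n_levels is only illustrative) counting how many distinct values table() finds in each column of the toy data:

```r
df <- data.frame(
  w1 = c("A","A","B","C","A"),
  w2 = c("C","A","A","C","C"),
  w3 = c("C","A","B","C","B")
)

# Length of the frequency table for each column
n_levels <- sapply(df, function(x) length(table(x)))
n_levels
#> w1 w2 w3
#>  3  2  3
```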

Convert all your columns to factors first, so that they all share the same levels (table() then also counts levels that do not occur, reporting 0 for them), and then use reframe() instead of summarise(): since dplyr 1.1.0, returning more than one row per group from summarise() is deprecated in favor of reframe():

df %>%
  mutate(across(contains("w"), \(x) factor(x, levels = unique(df$w1)))) %>% 
  reframe(across(everything(), \(x) c(prop.table(table(x))*100))) %>% 
  mutate(value = unique(df$w1), .before = 1)

#   value w1 w2 w3
# 1     A 60 40 20
# 2     B 20  0 40
# 3     C 20 60 40
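Note that levels = unique(df$w1) works here only because w1 happens to contain every value (A, B and C). A minimal variant, assuming dplyr >= 1.1.0 for reframe() and R >= 4.1 for the \(x) shorthand, collects the levels from all columns instead; lvls and res are illustrative names:

```r
library(dplyr)

df <- data.frame(
  w1 = c("A","A","B","C","A"),
  w2 = c("C","A","A","C","C"),
  w3 = c("C","A","B","C","B")
)

# Levels taken from every column, so a value missing from w1 is still counted
lvls <- sort(unique(unlist(df, use.names = FALSE)))

res <- df %>%
  # Same level set for every column, so table() always returns length(lvls) counts
  mutate(across(everything(), \(x) factor(x, levels = lvls))) %>%
  reframe(across(everything(), \(x) c(prop.table(table(x)) * 100))) %>%
  # table() orders its output by factor level, so lvls lines up with the rows
  mutate(value = lvls, .before = 1)

res
```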
Maël