2

I have a data set as shown below:

data <- tribble(
  ~top_1, ~top_2, ~top_3,
  "A",     "B",    "C",
  "B",     "B",    "B",   
  "C",     "B",    "C",
  "A",     "B",    "B",
  "A",     "A",    "A",
  "B",     "B",    "A",
  "C",     "A",    "C",
  "A",     "A",    "A",
  "A",     "C",    "B",
  "B",     "B",    "C",
)

And now, I want to count distinct the rows for each column and have a new data set something like this:

new_data <- tribble(
  ~product, ~top_1, ~top_2, ~top_3,
     "A",    .50,    .30,     .30,
     "B",    .30,    .60,     .30,
     "C",    .20,    .10,     .40,
)

Could you please help me to be able to create this data?

datazang
  • 989
  • 1
  • 7
  • 20
  • 1
    Possible duplicate of [Column-Wise Percentage of Different Entires](https://stackoverflow.com/questions/9623763/in-r-how-can-i-compute-percentage-statistics-on-a-column-in-a-dataframe-tabl) – M-- Nov 06 '19 at 19:21

3 Answers3

4
lvl = unique(unlist(data))
sapply(data, function(x) prop.table(table(factor(x, lvl))))
#  top_1 top_2 top_3
#A   0.5   0.3   0.3
#B   0.3   0.6   0.3
#C   0.2   0.1   0.4
d.b
  • 32,245
  • 6
  • 36
  • 77
3

One base R option could be:

table(stack(data))/nrow(data)

values top_1 top_2 top_3
     A   0.5   0.3   0.3
     B   0.3   0.6   0.3
     C   0.2   0.1   0.4

And if you want it as a data.frame:

as.data.frame.matrix(table(stack(data))/nrow(data))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
2

Here is one option where we gather into 'long' format, get the count and reshape to 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
data %>%
   gather %>% 
   group_by_all %>% 
   count %>%
   group_by(key) %>%
   mutate(n = n/sum(n)) %>% 
   pivot_wider( names_from = key, values_from = n)
# A tibble: 3 x 4
# Groups:   value [3]
#  value top_1 top_2 top_3
#  <chr> <dbl> <dbl> <dbl>
#1 A       0.5   0.3   0.3
#2 B       0.3   0.6   0.3
#3 C       0.2   0.1   0.4
akrun
  • 874,273
  • 37
  • 540
  • 662