Count distinct among the rows and aggregate

Question

I have a data set as shown below:

library(tibble)

data <- tribble(
  ~top_1, ~top_2, ~top_3,
  "A",     "B",    "C",
  "B",     "B",    "B",   
  "C",     "B",    "C",
  "A",     "B",    "B",
  "A",     "A",    "A",
  "B",     "B",    "A",
  "C",     "A",    "C",
  "A",     "A",    "A",
  "A",     "C",    "B",
  "B",     "B",    "C",
)

Now I want to count each distinct value in every column, expressed as a share of the rows, and get a new data set like this:

new_data <- tribble(
  ~product, ~top_1, ~top_2, ~top_3,
     "A",    .50,    .30,     .30,
     "B",    .30,    .60,     .30,
     "C",    .20,    .10,     .40,
)

Could you please help me create this data?

Possible duplicate of [Column-Wise Percentage of Different Entries](https://stackoverflow.com/questions/9623763/in-r-how-can-i-compute-percentage-statistics-on-a-column-in-a-dataframe-tabl) – M--, Nov 06 '19 at 19:21

score 4 · Answer 1 · answered Nov 06 '19 at 18:54

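# One shared set of levels, so every column reports every product
lvl = unique(unlist(data))
# For each column: tabulate the values, then convert counts to proportions
sapply(data, function(x) prop.table(table(factor(x, lvl))))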
#  top_1 top_2 top_3
#A   0.5   0.3   0.3
#B   0.3   0.6   0.3
#C   0.2   0.1   0.4
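The shared levels in `factor(x, lvl)` are what make this work in general: `table()` only reports values that occur in a column, so if some column lacked one of the products, its table would be shorter and `sapply()` could no longer simplify the results into a matrix. A small sketch with made-up data (the `top_4` column, which never contains "C", is hypothetical), built with base `data.frame()` so it needs no packages:

```r
# Hypothetical example data: top_4 never contains "C"
df <- data.frame(
  top_1 = c("A", "B", "C", "A"),
  top_4 = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Without shared levels the per-column tables have different lengths,
# so sapply() cannot simplify them into a matrix and returns a list
unaligned <- sapply(df, function(x) prop.table(table(x)))

# With shared levels every column gets an entry for every product;
# sort() is optional and only puts the products in alphabetical order
lvl <- sort(unique(unlist(df)))
aligned <- sapply(df, function(x) prop.table(table(factor(x, levels = lvl))))
aligned
```

Here `unaligned` comes back as a list, while `aligned` is a 3 x 2 matrix in which `top_4` simply gets 0 for "C".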

d.b

Thanks a lot! Is there any way to have it as a data frame? – datazang Nov 06 '19 at 18:57
@zineda, just wrap it in `as.data.frame()` - `as.data.frame(sapply(data, function(x) prop.table(table(factor(x, lvl)))))` – d.b Nov 06 '19 at 18:58
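`as.data.frame()` keeps the products as row names. To match the layout in the question exactly, with a `product` column, the row names can be moved into a column. A minimal base R sketch; the data is rebuilt with `data.frame()` instead of `tribble()` so it runs without packages, and the names `props` and `res` are just illustrative:

```r
data <- data.frame(
  top_1 = c("A", "B", "C", "A", "A", "B", "C", "A", "A", "B"),
  top_2 = c("B", "B", "B", "B", "A", "B", "A", "A", "C", "B"),
  top_3 = c("C", "B", "C", "B", "A", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

lvl <- sort(unique(unlist(data)))
props <- sapply(data, function(x) prop.table(table(factor(x, levels = lvl))))

# Move the row names (the products) into their own column;
# row.names = NULL gives plain integer row names instead
res <- data.frame(product = rownames(props), props,
                  row.names = NULL, stringsAsFactors = FALSE)
res
```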

tmfmnk · Answer 2 · score 3 · 2019-11-06T19:00:31.473

One base R option could be:

table(stack(data))/nrow(data)

      ind
values top_1 top_2 top_3
     A   0.5   0.3   0.3
     B   0.3   0.6   0.3
     C   0.2   0.1   0.4

And if you want it as a data.frame:

as.data.frame.matrix(table(stack(data))/nrow(data))
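Dividing by `nrow(data)` works because every column contributes exactly `nrow(data)` values to the stacked table. An equivalent that normalises each column by its own total is `prop.table()` with `margin = 2`. It gives the same numbers here, and it would still make each column sum to 1 if some cells were `NA`, since `table()` drops `NA` by default. A sketch, with the data rebuilt using base `data.frame()` (`stringsAsFactors = FALSE` matters because `stack()` skips factor columns):

```r
data <- data.frame(
  top_1 = c("A", "B", "C", "A", "A", "B", "C", "A", "A", "B"),
  top_2 = c("B", "B", "B", "B", "A", "B", "A", "A", "C", "B"),
  top_3 = c("C", "B", "C", "B", "A", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Cross-tabulate value by source column, then normalise within each column
tab <- table(stack(data))
col_props <- prop.table(tab, margin = 2)

# Same numbers as dividing by nrow(data), because no values are missing
all.equal(as.vector(col_props), as.vector(tab / nrow(data)))

wide <- as.data.frame.matrix(col_props)
wide
```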

edited Nov 06 '19 at 19:00

answered Nov 06 '19 at 18:55

akrun · Accepted Answer · score 2 · 2019-11-06T19:08:36.450

Here is one option: gather into 'long' format, count each value per column, convert the counts to proportions within each column, and reshape to 'wide' format with pivot_wider.

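library(dplyr)
library(tidyr)
data %>%
   gather %>%                  # long format: columns 'key' (column name) and 'value'
   group_by_all %>%
   count %>%                   # n = occurrences of each key/value pair
   group_by(key) %>%
   mutate(n = n/sum(n)) %>%    # counts -> proportions within each column
   pivot_wider(names_from = key, values_from = n)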
# A tibble: 3 x 4
# Groups:   value [3]
#  value top_1 top_2 top_3
#  <chr> <dbl> <dbl> <dbl>
#1 A       0.5   0.3   0.3
#2 B       0.3   0.6   0.3
#3 C       0.2   0.1   0.4
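`gather()` still works but is superseded in tidyr by `pivot_longer()`. A sketch of the same pipeline with `pivot_longer()` (tidyr 1.0.0 or later) and `count()`, assuming a scalar `values_fill` (tidyr 1.1.0 or later). It adds `ungroup()` so the result is not left grouped by `value` as in the output above, and names the value column `product` as in the question's target; the intermediate name `column` is just an illustrative choice:

```r
library(tibble)
library(dplyr)
library(tidyr)

data <- tribble(
  ~top_1, ~top_2, ~top_3,
  "A", "B", "C",
  "B", "B", "B",
  "C", "B", "C",
  "A", "B", "B",
  "A", "A", "A",
  "B", "B", "A",
  "C", "A", "C",
  "A", "A", "A",
  "A", "C", "B",
  "B", "B", "C"
)

new_data <- data %>%
  # long format: one row per (column, product) observation
  pivot_longer(everything(), names_to = "column", values_to = "product") %>%
  # how often each product appears in each column
  count(column, product) %>%
  # turn counts into shares within each column
  group_by(column) %>%
  mutate(n = n / sum(n)) %>%
  ungroup() %>%
  # back to wide format; a product absent from a column would get 0
  pivot_wider(names_from = column, values_from = n, values_fill = 0)

new_data
```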

edited Nov 06 '19 at 19:08

answered Nov 06 '19 at 18:51

Thank you for your answer. If I have more than 10 rows, should I replace 10 with the number of rows? – datazang Nov 06 '19 at 18:58
Somehow, it did not come out as a percentage. The sum of each column should be 100%. – datazang Nov 06 '19 at 19:05
@zineda Sorry, that was a bug. I fixed it – akrun Nov 06 '19 at 19:09
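On the two follow-up concerns (more than 10 rows, and columns summing to 100%): dividing by each column's own total, as `sum(n)` or `prop.table()` do, needs no hard-coded row count, and `colSums()` is a quick check. A base R sketch using the `sapply()` approach from the first answer on a hypothetical 13-row version of the data (the question's rows plus three repeats), rebuilt with `data.frame()` so it needs no packages:

```r
data <- data.frame(
  top_1 = c("A", "B", "C", "A", "A", "B", "C", "A", "A", "B"),
  top_2 = c("B", "B", "B", "B", "A", "B", "A", "A", "C", "B"),
  top_3 = c("C", "B", "C", "B", "A", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical larger input: 13 rows instead of 10
bigger <- rbind(data, data[1:3, ])

lvl <- sort(unique(unlist(bigger)))
props <- sapply(bigger, function(x) prop.table(table(factor(x, levels = lvl))))

# Each column holds shares, so each must add up to 1 (i.e. 100%)
colSums(props)

# Percentages, rounded to one decimal place
pct <- round(100 * props, 1)
pct
```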