R - Group by a value and calculate the percentage of the whole group

Question

EDIT: My question was not clear enough. I apologize.

The problem was to define groups and assign the values of a data frame column to them. I solved it myself with a chain of ifelse calls and the comments here. Thanks for that. I then did it manually for each column separately.

data %>%
  mutate(group = ifelse(richness <= -0.6, "1",
                 ifelse(richness > -0.6 & richness <= -0.2, "2",
                 ifelse(richness > -0.2 & richness <= 0.2, "3",
                 ifelse(richness > 0.2 & richness <= 0.6, "4",
                 ifelse(richness > 0.6, "5", NA)))))) %>%
  group_by(group) %>%
  # divide by the total number of rows to get each group's share in percent
  summarise(percentage = n() * 100 / nrow(data))

As a side note you can use `dplyr`'s function `case_when()` to avoid nested `ifelse()` for clearer code ;) — Rekyt, Oct 09 '18 at 15:04
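A minimal sketch of what the `case_when()` version suggested in the comment above might look like. The `data` frame here is made up for illustration; only the `richness` column name and the break points come from the question. Conditions are checked in order, so each one only needs the upper bound.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data: one numeric column `richness`
data <- data.frame(richness = c(-0.9, -0.5, -0.3, 0.0, 0.1, 0.4, 0.7, 0.8))

result <- data %>%
  mutate(group = case_when(
    richness <= -0.6 ~ "1",
    richness <= -0.2 ~ "2",  # reached only if richness > -0.6
    richness <=  0.2 ~ "3",
    richness <=  0.6 ~ "4",
    richness >   0.6 ~ "5"   # a missing richness matches nothing and gives NA
  )) %>%
  count(group) %>%                        # one row per group with its size n
  mutate(percentage = n * 100 / sum(n))   # share of all rows in percent

print(result)
```

Using `sum(n)` as the denominator avoids having to store the total number of rows separately.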

score 2 · Answer 1 · answered Oct 08 '18 at 15:37

Using the carb variable from the mtcars data set as an example:

prop.table(table(mtcars$carb)) * 100

     1      2      3      4      6      8
21.875 31.250  9.375 31.250  3.125  3.125

If you want to define the groups yourself, you can use the cut function:

groups <- c(0,2,6,8) # interval values for the groups
prop.table(table(cut(mtcars$carb, breaks=groups))) * 100

 (0,2]  (2,6]  (6,8]
53.125 43.750  3.125
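Applied to the intervals from the question, the same `cut` approach could look roughly like this. The `richness` values are made up for illustration, and the open-ended first and last intervals are an assumption based on the `<= -0.6` and `> 0.6` conditions in the question's code.

```r
# Made-up values standing in for the asker's `richness` column
richness <- c(-0.9, -0.5, -0.3, 0.0, 0.1, 0.4, 0.7, 0.8)

# Break points from the question; -Inf and Inf catch everything beyond +/- 0.6
breaks <- c(-Inf, -0.6, -0.2, 0.2, 0.6, Inf)

# cut() builds right-closed intervals by default, e.g. (-0.6, -0.2],
# which matches the "> lower & <= upper" conditions in the question
groups <- cut(richness, breaks = breaks, labels = as.character(1:5))

# Share of values per group in percent
pct <- prop.table(table(groups)) * 100
print(pct)
```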

Karolis Koncevičius

OP says there are multiple columns involved, so maybe something like `addmargins(table(Map(cut, DF[cols], breaks = groups_list)))` – Frank Oct 08 '18 at 15:41

@Frank you are probably right. It's not 100% clear from the way the question is posed now what the end goal is. And the title makes it even less clear to me. – Karolis Koncevičius Oct 08 '18 at 15:53

score 0 · Answer 2 · answered Oct 08 '18 at 15:43

Workflow:

Add a dummy column;
Group by the dummy column;
Count the subgroups.

Here is some sample code:

library(dplyr)

# generate fake data
set.seed(123456)

df <- data.frame(Nums = sample(-100:100, 100, replace = TRUE) / 100)
size <- nrow(df)

result <- df %>%
  # add a dummy column that labels each value with its group
  mutate(dummy = ifelse(Nums < 0, "A", "B")) %>%
  # group by the dummy column
  group_by(dummy) %>%
  # count each group and calculate its percentage of all rows
  summarise(count = n(), percentage = count * 100 / size)

head(result)

# A tibble: 2 x 3
  dummy count percentage
  <chr> <int>      <dbl>
1 A        50         50
2 B        50         50

Wenlong Liu

That helps a lot, thank you Wenlong! One problem remains: I have about 30 columns and 6 groups. If I do that manually for every column and group it will take forever. Is there a way to define the groups once (they are the same for every column) and then go through all columns in a loop? Thank you!! – Canna Oct 08 '18 at 15:55
@Heiko, A reproducible example with sample data will be very helpful in this case. – Wenlong Liu Oct 08 '18 at 16:14
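
For the multi-column case raised in the comments, one possible sketch is to define the breaks once and apply `cut` to every column with `sapply`. The data frame and the column names `evenness` and `diversity` are hypothetical, and the breaks are taken from the question's code (five groups), so they would need adjusting for six groups.

```r
# Hypothetical data frame with three numeric columns (the real one has about 30)
set.seed(1)
df <- data.frame(
  richness  = runif(50, -1, 1),
  evenness  = runif(50, -1, 1),
  diversity = runif(50, -1, 1)
)

# Define the group boundaries once; they are reused for every column
breaks <- c(-Inf, -0.6, -0.2, 0.2, 0.6, Inf)

# For each column: bin the values, count per bin, convert to percent.
# Empty bins are kept as 0, so every column yields the same 5 rows
# and sapply() can simplify the result to a matrix.
pct_by_column <- sapply(df, function(x) {
  prop.table(table(cut(x, breaks = breaks))) * 100
})

print(round(pct_by_column, 1))
```

The result has one row per interval and one column per variable, so each column sums to 100.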
