
I'm trying to get R to report the share of rows in which a variable takes a specific value within each group. For example, consider a dataset that consists of groups 1, 2 and 3. I would like to know the percentage of rows in which Var1 takes the value 500 in each of groups 1, 2 and 3, and store it as a new variable. Is there a convenient way to do this? The result should look something like this:

df
Group  Var1   Var1_perc
1       0      50
1       400    50
1       500    50
1       500    50

...and so on for groups 2 and 3.


I would use the tidyverse for this.

To count how often a variable takes each value within a group:

library(tidyverse)
df %>% 
 group_by(Group,Var1) %>% 
 summarise(count = n()) 
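If only the counts are needed, dplyr's `count()` is shorthand for `group_by()` followed by `summarise(n = n())`. A minimal sketch, assuming `df` contains just the group-1 rows shown in the question:

```r
library(dplyr)

# Example data: the group-1 rows from the question
df <- data.frame(Group = c(1, 1, 1, 1),
                 Var1  = c(0, 400, 500, 500))

# One row per Group/Var1 combination; the frequency is stored in column n
counts <- df %>% count(Group, Var1)
print(counts)
```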

To calculate the percentage in a group:

df %>% 
  left_join(df %>% 
              group_by(Group) %>% 
              summarise(n = n()), by = "Group") %>%
  group_by(Group, Var1) %>%
  # n (the group size) is constant within each group, so take its first value
  summarise(percentage = 100 * n() / first(n))

The `left_join()` step only computes how many rows each group has, so that the count of each value can be divided by the group size. I couldn't think of a cleaner way right now.
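A shorter alternative that avoids the join: `Var1 == 500` is a logical vector, so its `mean()` within a group is the share of rows equal to 500. Using `mutate()` instead of `summarise()` keeps every row, which gives the layout asked for in the question. A minimal sketch, with the target value 500 hard-coded as in the question and `df` assumed to hold the group-1 rows shown there:

```r
library(dplyr)

df <- data.frame(Group = c(1, 1, 1, 1),
                 Var1  = c(0, 400, 500, 500))

result <- df %>%
  group_by(Group) %>%
  # mean() of a logical vector is the proportion of TRUE values in the group
  mutate(Var1_perc = 100 * mean(Var1 == 500)) %>%
  ungroup()
print(result)
```

For the example rows, two of the four values equal 500, so every row gets `Var1_perc` of 50.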

Ivn Ant
  • I keep getting this: Error: `n()` must only be used inside dplyr verbs. – philipp.kn_98 Sep 22 '20 at 18:29
  • try `dplyr::summarise()` – Ivn Ant Sep 22 '20 at 18:32
  • Just FYI, it looks like the tidyverse functions you've used all come from dplyr, which means you can reduce overhead by loading only that package instead of the whole tidyverse. – camille Sep 22 '20 at 18:41
  • Yes, you are right; loading `tidyverse` is probably just a habit. I suspect the error @philipp.kn_98 gets comes from a package conflict, e.g. loading `plyr` on top of `dplyr` or `tidyverse`, so that `plyr::summarise()` masks the dplyr version. – Ivn Ant Sep 22 '20 at 18:49