Percentage of factor levels by group in R

Question

I am trying to calculate the percentage of different levels of a factor within a group.

I have nested data and would like to see, for each country, the percentage of schools that are private schools (a factor with 2 levels).

However, I cannot figure out how to do that.

```r
# my data:
CNT <- c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C",
         "D", "D", "D", "D", "D", "D")
SCHOOL <- c(1:5, 1:3, 1:6, 1:6)
FACTOR <- as.factor(c(1,2,1,2,1,1,1,2,1,2,2,2,1,1,1,1,1,1,1,1))
mydata <- data.frame(CNT, SCHOOL, FACTOR)
head(mydata)
```

I want a column with the percentage of one level of the factor (let's say level 1) within each country.

See also this earlier post for a more general answer on calculating relative frequencies in a grouped data frame with `dplyr`: [Relative frequencies / proportions with dplyr](https://stackoverflow.com/questions/24576515/relative-frequencies-proportions-with-dplyr) – alex_jwb90, Jul 17 '20 at 11:56
Another way : `mydata %>% count(CNT, FACTOR) %>% group_by(CNT) %>% mutate(n = n/sum(n))` — Ronak Shah, Jul 17 '20 at 12:04
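A self-contained sketch of the `count()` approach from the comment above, rebuilding the example data from the question and scaling each country's proportions to percentages. The column name `pct` and the object name `level_pct` are illustrative choices, not from the original thread. Combinations that never occur (country D has no level-2 schools) simply do not appear as rows.

```r
library(dplyr)

# Example data from the question
mydata <- data.frame(
  CNT = c(rep("A", 5), rep("B", 3), rep("C", 6), rep("D", 6)),
  SCHOOL = c(1:5, 1:3, 1:6, 1:6),
  FACTOR = as.factor(c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1))
)

# Count rows per country/level, then divide by each country's total
level_pct <- mydata %>%
  count(CNT, FACTOR) %>%
  group_by(CNT) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

print(level_pct)
```

Each country's `pct` values add up to 100, which makes this form handy when you want every level, not just level 1.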

score 2 · Answer 1 · answered Jul 17 '20 at 11:45


Group your data by CNT, then summarise each group by dividing the number of rows where FACTOR == 1 by the total number of observations in that group (n()).

```r
library(dplyr)

mydata %>%
  group_by(CNT) %>%
  # share of rows with FACTOR == 1 out of all rows in each country
  summarise(
    priv_perc = sum(FACTOR == 1, na.rm = TRUE) / n()
  )
```
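Because the question asks for a column, a variant sketch (not part of the answer above) uses `mutate()` instead of `summarise()`, so every school row keeps its country's value. When FACTOR has no missing values, `mean(FACTOR == 1)` equals `sum(FACTOR == 1) / n()`, and multiplying by 100 turns the proportion into a percentage. The names `with_pct` and `pct_level1` are hypothetical.

```r
library(dplyr)

# Example data from the question
mydata <- data.frame(
  CNT = c(rep("A", 5), rep("B", 3), rep("C", 6), rep("D", 6)),
  SCHOOL = c(1:5, 1:3, 1:6, 1:6),
  FACTOR = as.factor(c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1))
)

# mutate() keeps all 20 rows; each row gets its country's percentage of level "1"
with_pct <- mydata %>%
  group_by(CNT) %>%
  mutate(pct_level1 = 100 * mean(FACTOR == 1)) %>%
  ungroup()

head(with_pct)
```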

answered Jul 17 '20 at 11:45 by alex_jwb90


This works on my example, but on my real dataset it gives me a nasty error, "`n()` must only be used inside dplyr verbs.", plus a backtrace that I do not understand – H.Stevens Jul 17 '20 at 11:52
It means you can't use `n()` outside a `dplyr` function such as `mutate` or `summarise`. Since you tagged your question as `dplyr`, I was expecting you'd be fine with working on your data dplyr-style :) – alex_jwb90 Jul 17 '20 at 11:55
I usually am, but I do not understand everything; I just learn as I go, and I never had that problem before. Thank you for the explanation! – H.Stevens Jul 17 '20 at 12:08
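One frequent cause of this particular error (an assumption here, since the thread does not confirm it) is another package, such as plyr, being attached after dplyr so that its `summarise()` masks dplyr's. Calling the dplyr functions with an explicit `dplyr::` prefix sidesteps any masking; a minimal sketch on the question's data:

```r
library(dplyr)

# Example data from the question (SCHOOL is not needed here)
mydata <- data.frame(
  CNT = c(rep("A", 5), rep("B", 3), rep("C", 6), rep("D", 6)),
  FACTOR = as.factor(c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1))
)

# dplyr::summarise() is named explicitly, so a summarise() from another
# attached package cannot be picked up by mistake
priv <- mydata %>%
  dplyr::group_by(CNT) %>%
  dplyr::summarise(priv_perc = sum(FACTOR == 1, na.rm = TRUE) / dplyr::n())

print(priv)
```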

score 1 · Accepted Answer · answered Jul 17 '20 at 11:51


Another solution (with base-R):

```r
prop.table(table(mydata$CNT, mydata$FACTOR), margin = 1)
```

```
            1         2
  A 0.6000000 0.4000000
  B 0.6666667 0.3333333
  C 0.5000000 0.5000000
  D 1.0000000 0.0000000
```
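`prop.table()` with `margin = 1` returns proportions in which each country's row sums to 1. To get just the level-1 percentages the question asks for, one option is to scale by 100 and pick the column named `"1"`; a short sketch (the names `props` and `pct_level1` are illustrative):

```r
# Example data from the question (SCHOOL is not needed here)
mydata <- data.frame(
  CNT = c(rep("A", 5), rep("B", 3), rep("C", 6), rep("D", 6)),
  FACTOR = as.factor(c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1))
)

# Row-wise proportions: rows are countries, columns are factor levels
props <- prop.table(table(mydata$CNT, mydata$FACTOR), margin = 1)

# Keep only level "1" and convert it to a percentage rounded to one decimal
pct_level1 <- round(100 * props[, "1"], 1)
print(pct_level1)
```

The result is named by country, so it can be matched back to `mydata$CNT` if a per-row column is needed.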

answered Jul 17 '20 at 11:51 by AndreasM

