Function to count number of distinct levels of a factor variable in each group?

Asked Feb 10 '20 at 14:35

Active Feb 10 '20 at 14:39


I have a dataset of observations from several venues, many of which contribute more than one observation, e.g.

ID <- paste("s", seq(1, 150, 1), sep = "")
# sample with replacement so that venues repeat across observations
venue <- paste("L", sample(1:40, size = 150, replace = TRUE), sep = "")
group <- c(rep("A", 100), rep("B", 50))
outcome_variable <- c(rnorm(100, 50, 10), rnorm(50, 40, 12))

reprex <- data.frame(ID,venue,group, outcome_variable)

I know I can get the number of observations per group and summary statistics for the continuous variable e.g.

library(dplyr)

reprex %>%
  group_by(group) %>%
  summarise(obs = n(),
            mean_outcome = mean(outcome_variable))

but is there a straightforward way of getting the number of different venues (within which the observations are nested) for each group within this pipe?

This feels like it should be really straightforward but I've been searching previous questions for a while and I can't seem to find anything!

edited Feb 10 '20 at 14:39

Sotos

asked Feb 10 '20 at 14:35

Mel

I guess you need `reprex %>% group_by(group) %>% summarise(obs=n(), mean_outcome =mean(outcome_variable), diff_venues = n_distinct(venue))` – arg0naut91 Feb 10 '20 at 14:37

How much more straightforward do you want it to be? You are literally grouping by your group and counting... unless I am misunderstanding your question – Sotos Feb 10 '20 at 14:38
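
For reference, the suggestion in the first comment can be run end to end roughly as below. This is only a sketch: it assumes dplyr is installed, and the `set.seed(1)` call and the `result` name are additions for reproducibility and are not part of the original question. The column name `diff_venues` is taken from the comment.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the simulated data is reproducible

# Rebuild the example data from the question
ID <- paste("s", seq(1, 150, 1), sep = "")
venue <- paste("L", sample(1:40, size = 150, replace = TRUE), sep = "")
group <- c(rep("A", 100), rep("B", 50))
outcome_variable <- c(rnorm(100, 50, 10), rnorm(50, 40, 12))
reprex <- data.frame(ID, venue, group, outcome_variable)

# One row per group: observation count, mean outcome, and number of unique venues
result <- reprex %>%
  group_by(group) %>%
  summarise(
    obs = n(),
    mean_outcome = mean(outcome_variable),
    diff_venues = n_distinct(venue)  # distinct venue values within the group
  )

print(result)
```

Within each group, `n_distinct(venue)` should give the same count as `length(unique(venue))`, so either form could be used inside `summarise()`.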

0 Answers