I have a dataset of observations from several venues, many of which contribute more than one observation, e.g.

set.seed(1)  # added so the random example is reproducible
ID <- paste0("s", 1:150)
venue <- paste0("L", sample(1:40, size = 150, replace = TRUE))
group <- c(rep("A", 100), rep("B", 50))
outcome_variable <- c(rnorm(100, 50, 10), rnorm(50, 40, 12))

reprex <- data.frame(ID, venue, group, outcome_variable)

I know I can get the number of observations per group and summary statistics for the continuous variable e.g.

library(dplyr)

reprex %>%
  group_by(group) %>%
  summarise(obs = n(),
            mean_outcome = mean(outcome_variable))

but is there a straightforward way to also get the number of distinct venues (within which the observations are nested) for each group, inside this same pipe?

This feels like it should be really straightforward but I've been searching previous questions for a while and I can't seem to find anything!

Sotos
Mel
I guess you need `reprex %>% group_by(group) %>% summarise(obs = n(), mean_outcome = mean(outcome_variable), diff_venues = n_distinct(venue))` – arg0naut91 Feb 10 '20 at 14:37
How much more straightforward do you want it to be? You are literally grouping by your group and counting... unless I am misunderstanding your question – Sotos Feb 10 '20 at 14:38
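
To make the `n_distinct()` suggestion from the comments concrete, here is a minimal sketch on a small hand-made data frame. The `toy` data and the `venue_summary` name are illustrative assumptions, not part of the original question. It assumes dplyr is installed and uses the same column names as `reprex`.

```r
library(dplyr)

# Hypothetical toy data (not the poster's): group A spans 3 venues, group B spans 1
toy <- data.frame(
  venue = c("L1", "L1", "L2", "L3", "L4", "L4"),
  group = c("A", "A", "A", "A", "B", "B"),
  outcome_variable = c(50, 52, 47, 55, 40, 38)
)

venue_summary <- toy %>%
  group_by(group) %>%
  summarise(
    obs = n(),                              # rows per group
    mean_outcome = mean(outcome_variable),  # group mean of the outcome
    diff_venues = n_distinct(venue)         # number of unique venues per group
  )

print(venue_summary)
```

On this toy data, group A should show 4 observations across 3 venues and group B 2 observations in 1 venue. `n_distinct(venue)` counts the same thing as `length(unique(venue))`, so either can be used inside `summarise()`.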
