With the following sample data frame, I would like to draw a stratified random sample (e.g., 40%) of the unique values of "ID" from each level of the factor "Cohort":
data<-structure(list(Cohort = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), ID = structure(1:20, .Label = c("a1 ",
"a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "b10", "b11",
"b12", "b13", "b14", "b15", "b16", "b17", "b18", "b19", "b20"
), class = "factor")), .Names = c("Cohort", "ID"), class = "data.frame", row.names = c(NA,
-20L))
I only know how to draw a fixed number of random rows from each group, using the following:
library(dplyr)
data %>%
  group_by(Cohort) %>%
  # size must not exceed the smallest group (Cohort 1 has only 9 rows)
  sample_n(size = 4)
But my actual data are longitudinal, so the same ID appears in multiple rows within each cohort, and the cohorts differ in size. I therefore need to select a proportion of the unique IDs in each cohort rather than a fixed number of rows. Any assistance would be appreciated.
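
To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes dplyr 1.0.0 or later (for `slice_sample()`; `sample_frac()` could be substituted on older versions) and uses a hypothetical longitudinal data frame, `long_data`, with a made-up `Wave` column and three rows per ID. The idea is to reduce the data to one row per Cohort/ID pair, sample 40% of those pairs within each cohort, and then keep every row belonging to a sampled ID.

```r
library(dplyr)

# Hypothetical longitudinal data: 9 IDs in cohort 1, 11 in cohort 2,
# each ID observed in 3 waves (the Wave column is an assumption).
long_data <- data.frame(
  Cohort = rep(c(1L, 2L), times = c(9L, 11L) * 3L),
  ID     = rep(c(paste0("a", 1:9), paste0("b", 10:20)), each = 3L),
  Wave   = rep(1:3, times = 20L)
)

set.seed(42)  # only so the random draw is reproducible

# Step 1: one row per Cohort/ID pair, then sample 40% of the IDs
# within each cohort (slice_sample() rounds prop * group size down).
sampled_ids <- long_data %>%
  distinct(Cohort, ID) %>%
  group_by(Cohort) %>%
  slice_sample(prop = 0.4) %>%
  ungroup()

# Step 2: keep all rows (every wave) for the sampled IDs.
sampled_data <- long_data %>%
  semi_join(sampled_ids, by = c("Cohort", "ID"))
```

Because the proportion is applied to the distinct IDs rather than to the rows, cohorts of different sizes and IDs with different numbers of repeated measurements are handled the same way.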