I have a data.table with 20 samples, each labelled with a sample ID. I want to reduce the sample size randomly while keeping a fixed distribution of IDs: say, randomly draw 7 values from 'A' and 5 values from 'B', so that my data.table has 12 rows instead of 20, and then compute the mean of a column I generated. I then want to repeat this 100 times via bootstrapping and see whether the means vary, so I want to compute statistics such as sd, mean, etc. over the replicates.
The background is that I have a small sample set and a bigger sample set. I want to reduce the bigger sample set to evaluate the accuracy of the smaller one. I am fairly new to R and appreciate any help. Thanks
library(data.table)

# Example data: 11 rows for each of the two sample IDs
data <- data.table(Sample = rep(c('A', 'B'), each = 11),
                   weight = rnorm(22),
                   height = rnorm(22))
# I want to randomly draw 7 values from A and 5 values from B, then get the mean of this new df, and do that whole step 100 times
# to then build the mean over all 100 replicates
library(dplyr)
library(tidyr)
library(purrr)
library(rsample)  # bootstraps()

set.seed(4561)
new_df <- data %>%
  group_by(Sample) %>%
  nest() %>%
  mutate(n = c(7, 5)) %>%
  mutate(samp = map2(data, n, sample_n)) %>%
  select(Sample, samp) %>%
  unnest(samp) %>%
  mutate(diff.height.weight = height - weight) %>%
  mutate(means = mean(diff.height.weight)) %>%
  bootstraps(means, times = 100)
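
For illustration, the procedure described above (draw 7 rows from 'A' and 5 from 'B', take the mean of height - weight, repeat 100 times, then summarise) could be sketched as follows. This is only a minimal sketch under assumptions, not a definitive solution: it uses data.table plus base R `replicate()` instead of `bootstraps()`, it draws without replacement (so strictly it is repeated subsampling rather than a classical bootstrap), and the names `n_per_id`, `draw_mean` and `rep_means` are hypothetical helpers that do not appear in the original code.

```r
library(data.table)

set.seed(4561)
data <- data.table(Sample = rep(c("A", "B"), each = 11),
                   weight = rnorm(22),
                   height = rnorm(22))

# Assumed lookup: how many rows to draw for each sample ID
n_per_id <- c(A = 7, B = 5)

# One replicate: within each ID, draw the requested number of rows
# without replacement, then return the mean of height - weight
# over all drawn rows (12 rows in total)
draw_mean <- function(dt, n_per_id) {
  sub <- dt[, .SD[sample(.N, n_per_id[[.BY$Sample]])], by = Sample]
  mean(sub$height - sub$weight)
}

# Repeat the draw 100 times and collect the 100 means
rep_means <- replicate(100, draw_mean(data, n_per_id))

# Summary statistics over the replicates
mean(rep_means)
sd(rep_means)
```

If sampling with replacement within each ID is what is wanted, `sample(.N, n, replace = TRUE)` would be the corresponding change inside `draw_mean`.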