I recently asked a very similar question here: "lapply instead of for loop for randomised hypothesis testing r".

But I now need a simpler output, and I'm struggling to tweak the previously suggested (and super helpful) code.

I have observed data, which I am now randomising as in that question:

set.seed(42)
ID <- sample(1:30, 100, replace = TRUE)
Trait <- sample(0:1, 100, replace = TRUE)
Year <- sample(1992:1999, 100, replace = TRUE)
df <- data.frame(ID, Trait, Year)

I want to group by year and extract the overall mean n_Trait, as well as 95% CIs between groups.

Maybe something like this:

library(dplyr)

df <- df %>%
  group_by(Year) %>%
  dplyr::summarise(
    n_Trait = sum(Trait == 1),
    n_total = length(Trait)) %>%
  ungroup()

I now want to repeat the above x times and extract the mean n_Trait and a 95% CI from the output of those iterations. This is very much like https://www.tidymodels.org/learn/statistics/bootstrap/, but I don't want to run the full ls model.

I hope that makes sense?

Carl
Jamie Dunning

1 Answer

You could put the construction of your data.frame df inside a function and then use replicate:

library(dplyr)

# One randomised draw, summarised per Year
my_fun <- function() {
  ID <- sample(1:30, 100, replace = TRUE)
  Trait <- sample(0:1, 100, replace = TRUE)
  Year <- sample(1992:1999, 100, replace = TRUE)

  res <- tibble(Trait = Trait, Year = Year) %>%
    group_by(Year) %>%
    dplyr::summarise(
      n_Trait = sum(Trait == 1),
      n_total = length(Trait)) %>%
    ungroup()

  return(res)
}

bind_rows(replicate(10, my_fun(), simplify = FALSE))

This way you replicate the experiment ten times and can do further analysis afterwards.
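For the "further analysis" step, here is a minimal sketch of one way to get the mean n_Trait and a 95% interval per Year across the iterations. It assumes a percentile interval (the 2.5% and 97.5% quantiles of the simulated n_Trait values) is what you are after, and it assumes dplyr >= 1.0 for the .groups argument. The names one_draw, n_iter, sims and ci are only illustrative, and the seed and the 1000 iterations are arbitrary choices.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible

# One randomised draw, summarised per Year (same logic as my_fun() above)
one_draw <- function() {
  tibble(
    Trait = sample(0:1, 100, replace = TRUE),
    Year  = sample(1992:1999, 100, replace = TRUE)
  ) %>%
    group_by(Year) %>%
    summarise(n_Trait = sum(Trait == 1), n_total = n(), .groups = "drop")
}

n_iter <- 1000  # hypothetical number of iterations

# Stack all iterations, tagging each row with the iteration it came from
sims <- bind_rows(replicate(n_iter, one_draw(), simplify = FALSE), .id = "iter")

# Mean n_Trait per Year with a 95% percentile interval across iterations
ci <- sims %>%
  group_by(Year) %>%
  summarise(
    mean_n_Trait = mean(n_Trait),
    lower = quantile(n_Trait, 0.025, names = FALSE),
    upper = quantile(n_Trait, 0.975, names = FALSE),
    .groups = "drop"
  )

ci
```

Note that a Year which happens to get no rows in a given iteration is simply missing from that iteration's summary rather than counted as 0. With 100 rows spread over 8 years this is very unlikely, but it could matter for sparser data.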

Cettt