R coding: bootstrap a dataset with repeated measures

Question

My dataset (call it data_xy) contains observations from a total of N ids. Each id has several rows of measurements.

I want to bootstrap the ids with replacement, so the bootstrapped ids will very likely contain duplicates.

b_idx <- sample.int(N, N, replace = TRUE)

it's likely that

b_idx=c(1,1,3,4,4,4....)

How do I then create the bootstrap sample from b_idx? If I do

data_xy[data_xy$id==b_idx,]

each id (with its repeated measures) will occur only once in my bootstrap dataset. What I really want is to repeat the observations for id=k as many times as that id occurs in b_idx. How can I achieve this?

score 1 · Answer 1 · answered Apr 03 '17 at 15:45

You don't actually need to use the ID directly; you can just sample row numbers, and then directly index the data.frame with those:

# How many rows in the data.frame?
n <- nrow(mtcars)

# Sample them
mtcars[sample(x = n, size = n, replace = TRUE), ]

If you pass in the same integer twice, you get that row twice. Here's an example of that principle in action:

mtcars[c(1, 1), ]
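The same principle carries over to resampling whole ids instead of single rows. What follows is only a minimal sketch, assuming a data frame shaped like the question's data_xy with an id column; the toy values, the x column, and the helper name rows_by_id are made up for illustration.

```r
# Toy stand-in for data_xy: 3 ids with 2, 3, and 1 rows (hypothetical data)
data_xy <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                      x  = c(0.5, 0.7, 1.1, 1.3, 1.2, 2.0))

ids <- unique(data_xy$id)

# Row numbers belonging to each id, as a list named by id
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# Resample ids with replacement; duplicates are expected
set.seed(1)
b_idx <- ids[sample.int(length(ids), replace = TRUE)]

# Look up the rows of each sampled id; an id drawn twice contributes its rows twice
row_idx <- unlist(rows_by_id[as.character(b_idx)], use.names = FALSE)
b_data <- data_xy[row_idx, ]
```

Indexing through `ids[sample.int(...)]` rather than `sample(ids, ...)` avoids the surprise that `sample()` treats a single number as `1:x`.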

If you don't know it already, be sure to check out the boot package, which automates a lot of bootstrapping scenarios for you.
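As a rough illustration of how boot could be pointed at ids rather than rows: `boot()` calls the statistic with the data and a vector of resampled indices, so passing the vector of unique ids as the data makes each draw a whole id. This is a sketch under assumptions, not a recommended analysis; the toy data, the x column, and the function name mean_x are hypothetical.

```r
library(boot)  # a recommended package that ships with standard R installations

# Toy stand-in for data_xy (hypothetical data)
data_xy <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                      x  = c(0.5, 0.7, 1.1, 1.3, 1.2, 2.0))

ids <- unique(data_xy$id)
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# Statistic: mean of x over all rows of the resampled ids.
# `ids[i]` are the ids drawn for this replicate, duplicates included.
mean_x <- function(ids, i) {
  row_idx <- unlist(rows_by_id[as.character(ids[i])], use.names = FALSE)
  mean(data_xy$x[row_idx])
}

set.seed(1)
res <- boot(data = ids, statistic = mean_x, R = 200)
```

`res$t0` holds the statistic on the original data and `res$t` the 200 bootstrap replicates.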

score 0 · Answer 2 · edited Apr 03 '17 at 15:32


I use the 'matches' function from the grr package for this.

library(grr)

# Row indices in data_xy for each sampled id, in the order sampled
Indices <- unlist(matches(b_idx, data_xy$id, list=TRUE))

b.data <- data_xy[Indices, ]
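One thing row indexing alone does not do is tell apart the copies of an id that was drawn more than once. If a later step groups by id, a fresh label per draw may be useful. The base-R sketch below is an assumption about what such a step could look like, not part of this answer; the toy data, the fixed b_idx, and the boot_id column name are all hypothetical.

```r
# Toy stand-in for data_xy and a fixed draw of ids (hypothetical values)
data_xy <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                      x  = c(0.5, 0.7, 1.1, 1.3, 1.2, 2.0))
b_idx <- c(1, 1, 3)

# One block of rows per draw, tagged with the draw number as boot_id
blocks <- lapply(seq_along(b_idx), function(j) {
  rows <- data_xy[data_xy$id == b_idx[j], , drop = FALSE]
  cbind(boot_id = j, rows)
})

# Stack the blocks; id 1 appears twice, once under boot_id 1 and once under 2
b_data <- do.call(rbind, blocks)
```

This assumes every sampled id has at least one row in data_xy, which holds when the ids are drawn from the data itself.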

edited Apr 03 '17 at 15:32 by Fabio says Reinstate Monica · answered Apr 03 '17 at 14:47 by B. Goulooze
