
My dataset (call it data_xy) looks like this:

id X Y
1  5 10
1  6 11
1  4 8
2  3 9
2  3 12
3  4 10
...

It contains observations from a total of N ids, and each id has several rows of measurements.

I want to bootstrap the ids, i.e. resample them with replacement, so the bootstrap sample of ids will very likely contain duplicates:

b_idx <- sample.int(N, N, replace = TRUE)

so b_idx may well look like

b_idx = c(1, 1, 3, 4, 4, 4, ...)
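If the ids are not exactly 1, ..., N (gaps, or non-numeric ids), a slightly safer variant samples the unique ids themselves. This is a minimal sketch on a small made-up copy of data_xy; the values and the seed are illustrative only:

```r
# Hypothetical toy version of data_xy, mirroring the layout above
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

ids <- unique(data_xy$id)   # the clusters to resample
N   <- length(ids)

set.seed(1)  # arbitrary seed, only for reproducibility
# Index into ids instead of calling sample(ids, ...) directly:
# sample() on a single number k would draw from 1:k instead.
b_idx <- ids[sample.int(N, N, replace = TRUE)]

# How many times each id was drawn (0 for ids that were not drawn)
draws <- table(factor(b_idx, levels = ids))
```

The counts in draws are exactly how many copies of each id's rows the bootstrap sample should contain.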

How do I then build the bootstrap sample from b_idx? If I do

data_xy[data_xy$id %in% b_idx, ]

each id (with its repeated measures) occurs only once in my bootstrap dataset. What I really want is to replicate the rows for id = k as many times as k occurs in b_idx. How can I achieve this?
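One base-R way to get this (a sketch, not the only option) is to precompute the row numbers belonging to each id and then look them up once per entry of b_idx, so an id drawn twice contributes its rows twice. The toy data and the fixed b_idx below are made up for illustration:

```r
# Hypothetical toy version of data_xy
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

# A fixed draw: id 1 drawn twice, id 2 not drawn, id 3 drawn once
b_idx <- c(1, 1, 3)

# Row numbers of data_xy grouped by id; list names are the ids as text
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# Look up every drawn id (duplicates included) and concatenate its rows
boot_rows <- unlist(rows_by_id[as.character(b_idx)], use.names = FALSE)
boot_data <- data_xy[boot_rows, ]
```

Building rows_by_id once avoids rescanning data_xy$id for every draw, which matters when the bootstrap is repeated many times.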

StupidWolf
user3075021

2 Answers


You don't actually need to use the ID directly; you can just sample row numbers and then index the data.frame with those:

# How many rows in the data.frame?
n <- nrow(mtcars)

# Sample them
mtcars[sample(x = n, size = n, replace = TRUE), ]

If you pass in the same integer twice, you get that row twice. Here's an example of that principle in action:

mtcars[c(1, 1), ]
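A small check of that principle, using the built-in mtcars data. Note that [.data.frame makes duplicated row names unique, so repeated rows come back with suffixed names; the seed is arbitrary:

```r
# Repeating an integer in the row index repeats that row.
two_copies <- mtcars[c(1, 1), ]

# Duplicated row names are made unique by [.data.frame, so the
# second copy is labelled "Mazda RX4.1".
print(rownames(two_copies))

# Row-level bootstrap as in this answer: n draws from row numbers 1..n.
set.seed(42)  # arbitrary seed, for reproducibility only
n <- nrow(mtcars)
boot_rows <- mtcars[sample(x = n, size = n, replace = TRUE), ]
```

This draws individual rows independently; when rows are repeated measures of the same id, the same indexing idea can be applied to all row numbers of each drawn id instead.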

If you don't know it already, be sure to check out the boot package, which automates a lot of bootstrapping scenarios for you.
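For the repeated-measures case, one way to use boot (a sketch, assuming the mean of Y as a stand-in statistic and made-up toy data) is to pass the vector of unique ids as data, so boot() resamples ids, and rebuild the row-level sample inside the statistic:

```r
library(boot)  # recommended package shipped with standard R installations

# Hypothetical toy data in the same layout as data_xy
data_xy <- data.frame(
  id = rep(1:5, times = c(3, 2, 1, 2, 3)),
  X  = c(5, 6, 4, 3, 3, 4, 2, 7, 5, 6, 8),
  Y  = c(10, 11, 8, 9, 12, 10, 7, 13, 11, 12, 15)
)

ids <- unique(data_xy$id)
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# boot() resamples positions of `data` (here: the ids) and passes them
# as `i`; the statistic expands the drawn ids back into their rows.
mean_y <- function(ids, i) {
  rows <- unlist(rows_by_id[as.character(ids[i])], use.names = FALSE)
  mean(data_xy$Y[rows])
}

set.seed(123)  # arbitrary seed
res <- boot(data = ids, statistic = mean_y, R = 200)
```

res$t0 is the statistic on the original data and res$t holds the 200 bootstrap replicates, so functions such as boot.ci() can be applied to res as usual.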

Matt Parker

I use the 'matches' function from the grr package for this.

library(grr)
Indices <- unlist(matches(b_idx, data_xy$id, list = TRUE))
b.data <- data_xy[Indices, ]
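If the copies of a duplicated id need to count as separate clusters afterwards (for example in a mixed model), it can help to add a new cluster label per draw. This base-R sketch builds the same kind of per-draw index list with which(), assuming that is what matches(..., list = TRUE) returns; the boot_id column name and the toy data are hypothetical:

```r
# Hypothetical toy data and a fixed draw of ids
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)
b_idx <- c(1, 1, 3)

# For each drawn id, the row positions in data_xy carrying that id
pos <- lapply(b_idx, function(k) which(data_xy$id == k))
Indices <- unlist(pos)
b.data <- data_xy[Indices, ]

# New cluster label: one value per draw, repeated over that draw's rows
b.data$boot_id <- rep(seq_along(pos), times = lengths(pos))
```

Here both copies of id 1 keep id = 1 but get boot_id 1 and 2, so a model grouped by boot_id treats them as two clusters.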