Working with a random sample within a for loop

Question

I wish to draw 1,000 random samples of size 50 from the dataset and show that E(xi^ui) = 0 holds for each simulated sample. My code is below, and I have been trying to debug for some time now, but I can't figure out what's wrong.

The dataset is called 'dataset' and it has two columns: 'y' and 'x'. I want to regress y on x, get the residuals, and show that they are not correlated with x.

lm.fit <- NA
resid.lm.fit <- NA
correlation <- NA

for (i in 1:1000){
  samp1 <- sample(dataset, size=50, replace=T)
  lm.fit[i] <- lm(y ~ x, data=samp1)
  resid.lm.fit[i]<-resid(lm.fit[i])
  correlation[i] <- cor.test(resid.lm.fit[i],samp1$x)
}

The errors I am getting are:

Error in resid.lm.fit[i] <- resid(lm.fit[i]) : 
  replacement has length zero
In addition: Warning message:
In lm.fit[i] <- lm(y ~ x, data = samp1) :
  number of items to replace is not a multiple of replacement length

score 0 · Accepted Answer · answered Oct 19 '14 at 20:32

0

The problem is that you are trying to store a bunch of non-atomic objects in vectors. If you make lm.fit, resid.lm.fit, and correlation lists instead of vectors you will be okay:

set.seed(123)
dataset <- data.frame(
  x=1:250,
  y=3*(1:250)+rnorm(250,0,40))
##
lm.fit <- list(NULL)
resid.lm.fit <- list(NULL)
correlation <- list(NULL)
for (i in 1:1000){
  samp1 <- dataset[sample(1:nrow(dataset), size=50, replace=T),]
  lm.fit[[i]] <- lm(y ~ x, data=samp1)
  resid.lm.fit[[i]] <- resid(lm.fit[[i]])
  correlation[[i]] <- cor.test(resid.lm.fit[[i]],samp1$x)
}

Then you can check the results of individual cor.tests like this:

> correlation[[1]]

    Pearson's product-moment correlation

data:  resid.lm.fit[[i]] and samp1$x
t = 0, df = 48, p-value = 1
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2783477  0.2783477
sample estimates:
         cor 
2.991262e-17

Also, to sample from a data.frame use df[ sample(1:nrow(df),...), ], not sample(df,...).

answered Oct 19 '14 at 20:32

nrussell

18,382
4
47
60

Thanks, that seems to have worked! I wonder now, though, how to work with the 1,000 results within the correlation list. I want to show that x and u are not correlated for each simulated sample, but how will I know that without going through each of the 1,000 tests individually? – hemingway2014 Oct 19 '14 at 22:47
You can extract the correlation estimates with `sapply(correlation,function(x){x$estimate})` – nrussell Oct 19 '14 at 23:57
Yes, but I would actually need to extract those where the p-value is < 0.05. The point is to show that all the p-values are greater than that (i.e. that the correlation test in each simulated sample failed to reject the null that the correlation is greater than 0). – hemingway2014 Oct 20 '14 at 00:08
See the "Value" section of the help file (`?cor.test`) - you can access these values in the same way that I used `sapply` to access `estimate`. – nrussell Oct 20 '14 at 00:33

score 0 · Answer 2 · answered Oct 19 '14 at 20:37

0

If dataset is a data frame, then sample(dataset, size=50, replace=T) will randomly pick out columns of your data frame 50 times with replacement. I assume you are trying to pick out rows. In that case dataset[sample(1:nrow(dataset), size=50, replace=T),] will fix that.

answered Oct 19 '14 at 20:37

petew

671
8
13

Sorry @nrussell, didn't see your answer until after I had posted mine. – petew Oct 19 '14 at 20:39

Working with a random sample within a for loop

2 Answers2