0

I wish to draw 1,000 random samples of size 50 from the dataset and show that E(xi^ui) = 0 holds for each simulated sample. My code is below, and I have been trying to debug for some time now, but I can't figure out what's wrong.

The dataset is called 'dataset' and it has two columns: 'y' and 'x'. I want to regress y on x, get the residuals, and show that they are not correlated with x.

lm.fit <- NA
resid.lm.fit <- NA
correlation <- NA

for (i in 1:1000){
  samp1 <- sample(dataset, size=50, replace=T)
  lm.fit[i] <- lm(y ~ x, data=samp1)
  resid.lm.fit[i]<-resid(lm.fit[i])
  correlation[i] <- cor.test(resid.lm.fit[i],samp1$x)
}

The errors I am getting are:

Error in resid.lm.fit[i] <- resid(lm.fit[i]) : 
  replacement has length zero
In addition: Warning message:
In lm.fit[i] <- lm(y ~ x, data = samp1) :
  number of items to replace is not a multiple of replacement length

2 Answers2

0

The problem is that you are trying to store a bunch of non-atomic objects in vectors. If you make lm.fit, resid.lm.fit, and correlation lists instead of vectors you will be okay:

set.seed(123)
dataset <- data.frame(
  x=1:250,
  y=3*(1:250)+rnorm(250,0,40))
##
lm.fit <- list(NULL)
resid.lm.fit <- list(NULL)
correlation <- list(NULL)
for (i in 1:1000){
  samp1 <- dataset[sample(1:nrow(dataset), size=50, replace=T),]
  lm.fit[[i]] <- lm(y ~ x, data=samp1)
  resid.lm.fit[[i]] <- resid(lm.fit[[i]])
  correlation[[i]] <- cor.test(resid.lm.fit[[i]],samp1$x)
}

Then you can check the results of individual cor.tests like this:

> correlation[[1]]

    Pearson's product-moment correlation

data:  resid.lm.fit[[i]] and samp1$x
t = 0, df = 48, p-value = 1
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2783477  0.2783477
sample estimates:
         cor 
2.991262e-17 

Also, to sample from a data.frame use df[ sample(1:nrow(df),...), ], not sample(df,...).

nrussell
  • 18,382
  • 4
  • 47
  • 60
  • Thanks, that seems to have worked! I wonder now, though, how to work with the 1,000 results within the correlation list. I want to show that x and u are not correlated for each simulated sample, but how will I know that without going through each of the 1,000 tests individually? – hemingway2014 Oct 19 '14 at 22:47
  • You can extract the correlation estimates with `sapply(correlation,function(x){x$estimate})` – nrussell Oct 19 '14 at 23:57
  • Yes, but I would actually need to extract those where the p-value is < 0.05. The point is to show that all the p-values are greater than that (i.e. that the correlation test in each simulated sample failed to reject the null that the correlation is greater than 0). – hemingway2014 Oct 20 '14 at 00:08
  • See the "Value" section of the help file (`?cor.test`) - you can access these values in the same way that I used `sapply` to access `estimate`. – nrussell Oct 20 '14 at 00:33
0

If dataset is a data frame, then sample(dataset, size=50, replace=T) will randomly pick out columns of your data frame 50 times with replacement. I assume you are trying to pick out rows. In that case dataset[sample(1:nrow(dataset), size=50, replace=T),] will fix that.

petew
  • 671
  • 8
  • 13