
This may sound like an incredibly naive question, but here's what I'm doing and why it has me stumped.

I have a population of 1000 data points from which I am trying to sub-sample 5%, 10%, 15%, ..., 100% using the following code in R:

```r
subData <- replicate(30, sample(Data, 55, replace = TRUE))
```
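
To run the procedure for every fraction rather than one fixed size, a loop like the following could work. This is a minimal sketch, not the asker's code: it assumes `Data` is a numeric vector of length 1000 (simulated here with `rnorm()` as a hypothetical stand-in), derives each sub-sample size from the percentage instead of hard-coding it, and keeps `replace = TRUE` and 30 replicates as in the question.

```r
set.seed(42)
# Hypothetical stand-in for the question's population of 1000 values
Data <- rnorm(1000, mean = 50, sd = 10)

percents <- seq(5, 100, by = 5)  # 5%, 10%, ..., 100%
nReps <- 30

results <- do.call(rbind, lapply(percents, function(p) {
  size <- length(Data) * p / 100  # 50, 100, ..., 1000 for a population of 1000
  # replicate() returns a size x nReps matrix: one column per sub-sample
  subData <- replicate(nReps, sample(Data, size, replace = TRUE))
  means <- colMeans(subData)  # one mean per replicate
  data.frame(percent = p, size = size,
             meanOfMeans = mean(means), sdOfMeans = sd(means))
}))

print(results)
```

The `sdOfMeans` column shrinks roughly like `1 / sqrt(size)` but stays above zero even in the 100% row, which is the behaviour the question asks about.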

I then compute the mean and standard deviation for further analysis. What confuses me is this: when I select 100% of the population and replicate it 30 times with `replace=TRUE`, why is the standard deviation of the means non-zero? Surely, if one selects all the data points each time and calculates the mean, it should be the same every time, and hence the standard deviation should be 0. Am I missing something, or am I doing something wrong in my R code?
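
The effect can be reproduced directly. The sketch below (assuming, as a hypothetical stand-in, a population simulated with `rnorm()`) draws the full population size 30 times, once without and once with replacement. Without replacement every draw is just a reordering of `Data`, so the means agree up to floating-point rounding. With replacement some values repeat and others are left out: on average a resample of size n contains only about 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 63.2% of the distinct population members, which the last lines estimate.

```r
set.seed(1)
# Hypothetical stand-in for the population of 1000 values
Data <- rnorm(1000)
n <- length(Data)

# Without replacement: each draw is a permutation of Data
noRep <- replicate(30, mean(sample(Data, n, replace = FALSE)))

# With replacement: some values repeat, others are omitted
withRep <- replicate(30, mean(sample(Data, n, replace = TRUE)))

sd(noRep)    # essentially 0 (only rounding differences)
sd(withRep)  # clearly non-zero, on the order of sd(Data) / sqrt(n)

# Average fraction of distinct population members per resample
uniqueFrac <- mean(replicate(30, length(unique(sample(n, n, replace = TRUE))))) / n
uniqueFrac   # close to 1 - (1 - 1/n)^n, about 0.632
```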

Any help would be greatly appreciated!

Asked by VGu; edited by talat.
  • What you say would be true if you set `replace=FALSE`. With `replace=TRUE`, single elements can be drawn several times (and others not at all), so each replicate can have a different mean. – nicola Jan 24 '15 at 08:19
  • Congratulations, you've discovered [bootstrapping](http://en.wikipedia.org/wiki/Bootstrapping_(statistics)). – Roland Jan 24 '15 at 08:43
  • Thanks @Roland and Nicola. I did accidentally discover bootstrapping! – VGu Jan 25 '15 at 09:36
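
As the comments point out, resampling with replacement at the full sample size is the (nonparametric) bootstrap, and the standard deviation of the resampled means is the bootstrap estimate of the standard error of the mean. A minimal base-R sketch, assuming a hypothetical simulated population and an arbitrary choice of 2000 resamples, compares it with the textbook formula `sd(Data) / sqrt(n)`:

```r
set.seed(123)
# Hypothetical skewed population of 1000 values
Data <- rexp(1000, rate = 0.1)
n <- length(Data)

B <- 2000  # number of bootstrap resamples (an arbitrary choice)
bootMeans <- replicate(B, mean(sample(Data, replace = TRUE)))

bootSE <- sd(bootMeans)       # bootstrap standard error of the mean
formulaSE <- sd(Data) / sqrt(n)  # classical formula

c(bootstrap = bootSE, formula = formulaSE)
```

The two values should agree closely. Strictly, the bootstrap targets the plug-in variance, which is smaller by a factor of sqrt((n - 1) / n), negligible for n = 1000. The `boot` package implements the same idea with more options through `boot(data, statistic, R)`.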
