I have a data frame (`df`) representing a 100 x 100 unit square. After creating a geographic population as z = x + y (where `x <- df$x` and `y <- df$y`), I need to draw a systematic sample of size n = 100 from it. How can I do this?

hrbrmstr
27titanik

1 Answer

To draw a random (iid uniform) sample of 100 points from the unit square, here is one method:

# set random seed for reproducibility
set.seed(123)
# create a 100X2 matrix of unit square observations
myUnitSquareSample <- cbind("x"=runif(100), "y"=runif(100))

To put this sample in a data.frame together with your z variable:

df <- data.frame("x"=myUnitSquareSample[,"x"],
                 "y"=myUnitSquareSample[,"y"],
                 "z"=rowSums(myUnitSquareSample))

If you already have a data.frame, `df`, of say 10,000 observations, you can use the `sample` function to pick 100 rows at random (without replacement), as suggested by @kunal-puri:

# set random seed for reproducibility
set.seed(11111)

# choose the set of 100 rows
mySample <- sample(1:nrow(df), size=100)
# extract sampled observations from df
mySampled.df <- df[mySample,]

It is a good idea to keep the selected set of rows in its own vector in case you need to use it further on in your script.

To extract an evenly spaced sample, try the following:

# 10 x 10 grid of evenly spaced points covering the unit square
evenlySpacedMat <- expand.grid(y=seq(0, 1, length.out=10),
                               x=seq(0, 1, length.out=10))

df <- data.frame("x"=evenlySpacedMat[,"x"],
                 "y"=evenlySpacedMat[,"y"],
                 "z"=rowSums(evenlySpacedMat))

This grid includes points on the border of the square. To avoid that, shift the `from` and `to` arguments of `seq` slightly inward (for example, from 0.05 to 0.95).
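One way to keep the points off the border, sketched here under the assumption that each point should sit at the centre of its own 0.1 x 0.1 cell (the names `k`, `cellCentres` and `interiorGrid` are purely illustrative):

```r
# number of grid points per side (10 x 10 = 100 points)
k <- 10
# centres of k equal-width cells on [0, 1]: 0.05, 0.15, ..., 0.95
cellCentres <- (seq_len(k) - 0.5) / k

# full grid of cell centres, with z = x + y as in the question
interiorGrid <- expand.grid(y = cellCentres, x = cellCentres)
interiorGrid$z <- interiorGrid$x + interiorGrid$y

# smallest and largest x coordinate, both strictly inside (0, 1)
range(interiorGrid$x)
```

The points are still equally spaced (0.1 apart in each direction), but none of them lies on the edge of the square.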

If you would like to select 100 observations from an existing data.frame that are more or less evenly spaced, you might try the following:

# select 100 obs roughly evenly dispersed:
obsSystematic <- as.integer(seq(from=1, to=nrow(df), length.out = 100))

mySystematicdf <- df[obsSystematic, ]
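If by "systematic" you mean the textbook design (a random start followed by every k-th row), a minimal sketch could look like the following. It assumes the number of rows is an exact multiple of the sample size. `popDf` is a stand-in for your own data.frame, and `sampleSize`, `stepSize`, `startRow` and `systematicRows` are illustrative names.

```r
set.seed(123)

# stand-in population: a 100 x 100 grid on the unit square with z = x + y
popDf <- expand.grid(y = seq(0, 1, length.out = 100),
                     x = seq(0, 1, length.out = 100))
popDf$z <- popDf$x + popDf$y

sampleSize <- 100
# sampling interval (assumes nrow(popDf) is a multiple of sampleSize)
stepSize <- nrow(popDf) %/% sampleSize
# random start within the first interval
startRow <- sample(seq_len(stepSize), size = 1)
# take every stepSize-th row, beginning at the random start
systematicRows <- seq(from = startRow, by = stepSize, length.out = sampleSize)

systematicDf <- popDf[systematicRows, ]
```

Be aware that the result depends on how the rows are ordered. With the `expand.grid` ordering used here, a step of 100 returns the same `y` value for every `x`, so the sampled points lie along a single line rather than being spread over the square.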
lmo
  • Sorry, maybe it is just me, but I can only see a random sample. I need a systematic one, which could consist of picking one observation in every 100, but in an orderly way. If I had to draw a systematic sample on the square, I would draw just 100 points, each at the same distance from the others. – 27titanik Apr 24 '16 at 13:38
  • Yes! The last one should be the right one, but it doesn't work. Can you re-check it? – 27titanik Apr 24 '16 at 14:26
  • It works with nrow(df) instead of nrowS(df). Unfortunately, I have to create the population Z before extracting the sample: Z = X + Y, and only afterwards can I draw the systematic sample. Z has only one column, of course. – 27titanik Apr 24 '16 at 15:09
  • The fact is that the sample of size 100 yields perfect results in terms of mean and variance compared to the population. This is probably not a problem with the sampling process; rather, the surface Z is really "simple", and that kind of sample entirely covers the variation of the variable Z. – 27titanik Apr 24 '16 at 18:26
  • @27titanik This makes sense. I guess this moves more into methodology at this point. Depending on your use of z, you could add some noise to it through some function of `rnorm`, `runif`, or friends. – lmo Apr 24 '16 at 20:17
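To make the last few comments concrete, here is a rough sketch that builds the population first and only then samples from it. It assumes the population is a 100 x 100 grid of cell centres, takes every 10th grid line in each direction as the systematic sample, and adds `rnorm` noise with an arbitrary standard deviation of 0.1. All object names are illustrative.

```r
set.seed(123)

# 100 x 100 population of cell centres on the unit square
side <- 100
coords <- (seq_len(side) - 0.5) / side
population <- expand.grid(y = coords, x = coords)

# the smooth surface from the question, plus a noisy variant (sd is an assumption)
population$z <- population$x + population$y
population$zNoisy <- population$z + rnorm(nrow(population), sd = 0.1)

# systematic sample: every 10th grid line in both directions (10 x 10 = 100 points)
keep <- coords[seq(from = 5, to = side, by = 10)]
inSample <- population$x %in% keep & population$y %in% keep
systematicSample <- population[inSample, ]

# compare population and sample summaries for both surfaces
rbind(population = sapply(population[c("z", "zNoisy")], mean),
      sample     = sapply(systematicSample[c("z", "zNoisy")], mean))
rbind(population = sapply(population[c("z", "zNoisy")], var),
      sample     = sapply(systematicSample[c("z", "zNoisy")], var))
```

For the smooth `z`, the two rows of each table should come out close to each other, which matches the observation above. The noisy column gives the sample some variation it cannot reproduce exactly, so the agreement should be less perfect there.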