Repeatedly subsample rows and perform function on subset

Question

I have a dataset like so:

set.seed(569)
dat <- data.frame(region = c(rep(1, 20), rep(2, 10)),
      loc = paste0("plot_", seq(1, 30, 1)),  # paste0 avoids the spaces paste() inserts by default
      sp1 = sample(0:3, 30, replace = TRUE), sp2 = sample(0:3, 30, replace = TRUE),
      sp3 = sample(0:3, 30, replace = TRUE), sp4 = sample(0:3, 30, replace = TRUE),
      sp5 = sample(0:3, 30, replace = TRUE), sp6 = sample(0:3, 30, replace = TRUE),
      sp7 = sample(0:3, 30, replace = TRUE), sp8 = sample(0:3, 30, replace = TRUE),
      sp9 = sample(0:3, 30, replace = TRUE), sp10 = sample(0:3, 30, replace = TRUE))

Each row represents plot data within a region. I want to calculate diversity for each subset so that I can learn how variance in the number of plots contributes to variance in regional alpha diversity. This requires a loop that I am unsure how to construct. First, the loop should subset by region, and then for each region I want to RANDOMLY subsample x rows (plots). Then I will perform a calculation on each subset and store the output.

Each iteration for a regional subset should sample x-i rows, with i increasing until only x-(x/2) rows are sampled. In other words, I want to keep shrinking the subsample until it contains half the rows within a region. The loop should therefore work through progressively smaller subsets of the data and perform a function on each.

For example, in region 1 there are 20 plots, or unique levels of loc. In my first subsample I would randomly choose 19 plots and perform the function. In the second subsample I would randomly choose 18 plots, and I would continue this process until I have subsampled 10 plots. For region 2, which has 10 plots, I would continue only down to 5 plots. Since some regions have an odd number of plots, there may need to be an if else statement to sample at least half, if not more.
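To make the sizes concrete, here is a minimal sketch of the sequence I have in mind. The helper name subsample_sizes is made up for illustration, and rounding up with ceiling() is just one way to guarantee "at least half" when a region has an odd number of plots.

```r
# Hypothetical helper: subsample sizes for a region with n plots,
# running from n - 1 down to ceiling(n / 2) (at least half the plots).
subsample_sizes <- function(n) {
  smallest <- ceiling(n / 2)
  if (n - 1 < smallest) return(integer(0))  # too few plots to subsample
  seq(n - 1, smallest)
}

sizes_region1 <- subsample_sizes(20)  # 19, 18, ..., 10
sizes_region2 <- subsample_sizes(10)  # 9, 8, ..., 5
sizes_odd     <- subsample_sizes(7)   # 6, 5, 4 (4 of 7 is at least half)
```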

This loop should be repeated 1000 times so that each subset (x-i) has 1000 values.

Below are the functions I would like to run on each subset. Let's say I start with region 1 and randomly sample plot_1-plot_10.

sub1 <- dat[1:10, 3:12]

1) First, calculate the sum of frequencies for each species within that subset:

library(dplyr)

# Sum each species column across the sampled plots (funs() is deprecated)
sub1 <- sub1 %>%
  summarise_all(sum)

2) Then calculate diversity for that subset:

library(vegetarian)  # assumed source of d(); the package is not named in the question
sub1 <- d(sub1, lev = "alpha", q = 2)

This particular example would yield an alpha diversity of 5.929448. Values need to be stored in a data frame with two columns (region, diversity) so that I can disentangle variance by region.
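Putting the pieces together, a rough sketch of the kind of loop I am after might look like this. It is only an illustration under assumptions: inv_simpson is a made-up stand-in for d(..., lev = "alpha", q = 2) applied to pooled counts, toy is a rebuilt copy of the example layout rather than dat itself, and the extra n_plots column is an addition so that each subsample size can be told apart. It also assumes every region has at least 2 plots.

```r
set.seed(569)

# Toy data with the same layout as dat: region, loc, then species counts.
toy <- data.frame(
  region = c(rep(1, 20), rep(2, 10)),
  loc    = paste0("plot_", 1:30)
)
for (j in 1:10) toy[[paste0("sp", j)]] <- sample(0:3, 30, replace = TRUE)

# ASSUMPTION: stand-in for the real diversity function. Inverse Simpson
# index (order q = 2) computed on counts pooled across the sampled plots.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

n_reps <- 1000
species_cols <- grep("^sp", names(toy), value = TRUE)

results <- do.call(rbind, lapply(split(toy, toy$region), function(reg) {
  n <- nrow(reg)
  counts <- as.matrix(reg[, species_cols])
  sizes <- seq(n - 1, ceiling(n / 2))  # n - 1 plots down to at least half
  do.call(rbind, lapply(sizes, function(k) {
    div <- replicate(n_reps, {
      rows <- sample(n, k)  # k plots drawn without replacement
      inv_simpson(colSums(counts[rows, , drop = FALSE]))
    })
    data.frame(region = reg$region[1], n_plots = k, diversity = div)
  }))
}))
rownames(results) <- NULL
```

Each combination of region and subsample size ends up with 1000 diversity values, so something like aggregate(diversity ~ region + n_plots, data = results, FUN = var) could then summarise the variance per subset size.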

What exactly is your question? You seem to have all the pieces figured out. - thc, Feb 14 '18 at 22:57
Since my data has multiple regions with varying numbers of plots, I wish to iterate through subsamples that are consecutively smaller and perform a function on those subsets. My question is how to create a loop that does this and then repeats 1000 times. Is that more clear? - Danielle, Feb 14 '18 at 23:04
