I have a dataset like so:
set.seed(569)
dat <- data.frame(
  region = c(rep(1, 20), rep(2, 10)),
  loc = paste0("plot_", 1:30),  # paste0() avoids the spaces paste() would insert
  sp1 = sample(0:3, 30, replace = TRUE),
  sp2 = sample(0:3, 30, replace = TRUE),
  sp3 = sample(0:3, 30, replace = TRUE),
  sp4 = sample(0:3, 30, replace = TRUE),
  sp5 = sample(0:3, 30, replace = TRUE),
  sp6 = sample(0:3, 30, replace = TRUE),
  sp7 = sample(0:3, 30, replace = TRUE),
  sp8 = sample(0:3, 30, replace = TRUE),
  sp9 = sample(0:3, 30, replace = TRUE),
  sp10 = sample(0:3, 30, replace = TRUE)
)
Each row represents plot data within a region. I want to calculate diversity for subsets of each region so that I can learn how variance in the number of plots contributes to variance in regional alpha diversity. This requires a loop that I am unsure how to construct. First, the loop should subset by region; then, for each region, it should RANDOMLY subsample x rows (plots). I will then perform a calculation on each subsample and store the output.
For a region with x plots, iteration i should sample x - i rows, and the iterations should continue until the subsample size has dropped to x - (x/2), that is, until only half of the region's rows are being sampled. The loop therefore has to work through progressively smaller subsets of the data and apply a function to each.
For example, in region 1 there are 20 plots, or unique levels of loc. In my first subsample I would randomly choose 19 plots and perform the function. In the second subsample I would randomly choose 18 plots, and I would continue this process until I have subsampled 10 plots. For region 2, which has 10 plots, I would go from 9 plots down to only 5. Since some regions have an odd number of plots, there may need to be an if/else statement so that at least half of the plots, if not more, are sampled.
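To make the subsample sizes concrete, here is a minimal sketch of what I have in mind. The helper name subsample_sizes is hypothetical, and rounding up with ceiling() is just one assumed way to handle regions with an odd number of plots so that at least half are always sampled.

```r
# Hypothetical helper: subsample sizes for a region with n plots,
# running from n - 1 down to ceiling(n / 2) so that regions with an
# odd number of plots still keep at least half of their plots.
subsample_sizes <- function(n) {
  smallest <- ceiling(n / 2)
  if (n - 1 < smallest) return(integer(0))  # a single-plot region has nothing to subsample
  seq(n - 1, smallest)
}

subsample_sizes(20)  # 19, 18, ..., 10
subsample_sizes(10)  # 9, 8, ..., 5
subsample_sizes(7)   # 6, 5, 4
```

Using ceiling() here replaces an explicit if/else on odd versus even counts; for even n it gives exactly n/2.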
This whole procedure should be repeated 1000 times so that each subsample size (x - i) has 1000 values.
Below are the functions I would like to run on each subset. Let's say I start with region 1 and randomly sample plot_1 through plot_10.
sub1<- dat[1:10,3:12]
1) First, calculate the sum of frequencies for each species within that subset:
library(dplyr)
# funs() is deprecated in dplyr; pass the function directly
sub1 <- sub1 %>%
  summarise_all(sum)
2) Then, calculate the diversity of that subset:
library(vegetarian)  # assumption: d() is the diversity function from the vegetarian package
sub1 <- d(sub1, lev = "alpha", q = 2)
This particular example would yield an alpha diversity of 5.929448. Values need to be stored in a data frame with two columns (region, diversity) so that I can disentangle variance by region.
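For reference, the overall structure I am after might look roughly like the sketch below. It is only a sketch under stated assumptions: inv_simpson and subsample_diversity are hypothetical names, inv_simpson is a base-R stand-in for d(..., lev = "alpha", q = 2) applied to pooled counts (order-2 diversity, 1 / sum(p^2)) and may not match d() exactly, and the extra n_plots column is my addition so that the subsample size is not lost. The toy data are made up to keep the example small.

```r
# Stand-in (assumption) for d(..., lev = "alpha", q = 2) on pooled counts:
# order-2 diversity (inverse Simpson) of a vector of summed counts.
# Returns NaN if every count in the subsample is zero.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Hypothetical driver: for each region and each subsample size, draw n_rep
# random subsets of plots, pool the species counts, and record the diversity.
subsample_diversity <- function(dat, sp_cols, n_rep = 1000) {
  out <- list()
  for (r in unique(dat$region)) {
    reg <- dat[dat$region == r, , drop = FALSE]
    n <- nrow(reg)
    if (n < 2) next                          # nothing to subsample
    for (k in seq(n - 1, ceiling(n / 2))) {  # n - 1 plots down to at least half
      div <- replicate(n_rep, {
        rows <- sample.int(n, k)             # k plots, without replacement
        inv_simpson(colSums(reg[rows, sp_cols, drop = FALSE]))
      })
      out[[length(out) + 1]] <- data.frame(region = r, n_plots = k, diversity = div)
    }
  }
  do.call(rbind, out)
}

# Toy data: 6 plots in region 1, 4 plots in region 2, 3 species.
set.seed(1)
toy <- data.frame(
  region = c(rep(1, 6), rep(2, 4)),
  loc = paste0("plot_", 1:10),
  sp1 = sample(0:3, 10, replace = TRUE),
  sp2 = sample(0:3, 10, replace = TRUE),
  sp3 = sample(0:3, 10, replace = TRUE)
)
res <- subsample_diversity(toy, sp_cols = 3:5, n_rep = 5)
head(res)
```

On the real data this would presumably be called as subsample_diversity(dat, sp_cols = 3:12, n_rep = 1000), with the inv_simpson() call swapped for the summarise-then-d() steps above.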