This post has been edited to describe the situation more accurately. I am using a form of jackknife sampling for my work. The jackknifed data will be used to calibrate a model, and the held-out data will be used for validation.
Rather than perform the analysis immediately, I want to save the jackknifed samples as data frames, along with the data that was removed from each sample...
It's hard to explain, so I will use an example to illustrate:
The aim in the example is to create the datasets 4 times. Each time there should be 2 datasets: one with 9 rows (the calibration set) and one with 3 rows (the validation set).
df <- data.frame(value1 = 1:(3*4),
                 value2 = seq(from = 1000, by = 50, length.out = 3*4),
                 tosplit = rep(1:4, each = 3))
df  # df represents the data frame in its entirety
dfs <- split(df, df$tosplit)  # dfs is a list of 4 data frames with 3 rows each
#####
> #Replicate 1
> r1_3parts <- do.call("rbind", dfs[1:3])
> r1_1parts <- do.call("rbind", dfs[4])
>
> r1_3parts
value1 value2 tosplit
1.1 1 1000 1
1.2 2 1050 1
1.3 3 1100 1
2.4 4 1150 2
2.5 5 1200 2
2.6 6 1250 2
3.7 7 1300 3
3.8 8 1350 3
3.9 9 1400 3
> r1_1parts
value1 value2 tosplit
4.10 10 1450 4
4.11 11 1500 4
4.12 12 1550 4
>
> #Replicate 2
> r2_3parts <- do.call("rbind", dfs[2:4])
> r2_1parts <- do.call("rbind", dfs[1])
>
> r2_3parts
value1 value2 tosplit
2.4 4 1150 2
2.5 5 1200 2
2.6 6 1250 2
3.7 7 1300 3
3.8 8 1350 3
3.9 9 1400 3
4.10 10 1450 4
4.11 11 1500 4
4.12 12 1550 4
> r2_1parts
value1 value2 tosplit
1.1 1 1000 1
1.2 2 1050 1
1.3 3 1100 1
>
> #Replicate 3
> r3_3parts <- do.call("rbind", dfs[c(3:4, 1)])
> r3_1parts <- do.call("rbind", dfs[2])
>
> r3_3parts
value1 value2 tosplit
3.7 7 1300 3
3.8 8 1350 3
3.9 9 1400 3
4.10 10 1450 4
4.11 11 1500 4
4.12 12 1550 4
1.1 1 1000 1
1.2 2 1050 1
1.3 3 1100 1
> r3_1parts
value1 value2 tosplit
2.4 4 1150 2
2.5 5 1200 2
2.6 6 1250 2
>
>
> #Replicate 4
> r4_3parts <- do.call("rbind", dfs[c(4, 1:2)])
> r4_1parts <- do.call("rbind", dfs[3])
>
> r4_3parts
value1 value2 tosplit
4.10 10 1450 4
4.11 11 1500 4
4.12 12 1550 4
1.1 1 1000 1
1.2 2 1050 1
1.3 3 1100 1
2.4 4 1150 2
2.5 5 1200 2
2.6 6 1250 2
> r4_1parts
value1 value2 tosplit
3.7 7 1300 3
3.8 8 1350 3
3.9 9 1400 3
>
I can't find this option in any package; the ones I have looked at only compute the statistics for you. What I want is to see the sample datasets themselves and to specify their relative sizes. Is this possible with an existing package, or if not, is there a suitable way to do this in a more automated fashion?
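To make "automated" concrete, here is a minimal base-R sketch of the pattern in the manual replicates above: each group in `tosplit` is held out once for validation and the remaining groups are kept for calibration. The names `replicates`, `calibration`, `validation` and `holdout_*` are my own placeholders, not from any package. Unlike the manual version, this keeps the rows in their original order instead of rotating the groups, and it numbers the replicates by the held-out group.

```r
# Example data from above: 12 rows in 4 groups of 3
df <- data.frame(
  value1  = 1:(3 * 4),
  value2  = seq(from = 1000, by = 50, length.out = 3 * 4),
  tosplit = rep(1:4, each = 3)
)

# One replicate per group: hold that group out for validation,
# keep all other groups for calibration
groups <- unique(df$tosplit)
replicates <- lapply(groups, function(g) {
  held_out <- df$tosplit == g
  list(
    calibration = df[!held_out, ],  # rows from the other groups
    validation  = df[held_out, ]    # rows from the held-out group
  )
})
names(replicates) <- paste0("holdout_", groups)

# Inspect the replicate that holds out group 4
replicates$holdout_4$calibration
replicates$holdout_4$validation
```

With this approach the relative size is controlled by the grouping column: with k equal-sized groups, each validation set holds 1/k of the rows, so changing how `tosplit` is assigned changes the calibration/validation ratio.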