Let's say I have a list of data frames with different numbers of rows:
AB_df = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
AB_pc = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")
I want to split each nested data frame into n random pieces of (nearly) equal size, so that for n = 4 a data frame with 201 rows gives 3 data frames with 50 rows and 1 with 51 rows. No row should appear more than once across the split data frames.
The structure should be:
List of 8
 $ AB_df: list of 4
  ..$ AB_df1: "data.frame": 50 obs. of 2 variables
  .. ..$ X1: int [1:50] 88 128....
  .. ..$ X2: int [1:50] 12 84 ....
  ..$ AB_df2: "data.frame": 50 obs. of 2 variables
  .. ..$ X1: int [1:50] numbers...
  .. ..$ X2: int [1:50] numbers....
  ..$ AB_df3: "data.frame": 50 obs. of 2 variables
  .. ..$ X1: int [1:50] numbers...
  .. ..$ X2: int [1:50] numbers....
  ..$ AB_df4: "data.frame": 51 obs. of 2 variables
  .. ..$ X1: int [1:51] numbers...
  .. ..$ X2: int [1:51] numbers....
 $ BC_df: list of 4
  ..$ BC_df1: "data.frame": 50 obs. of 2 variables
  .. ..$ X1: int [1:50] numbers...
  .. ..$ X2: int [1:50] numbers....
  ..$ BC_df2: "data.frame": 50 obs. of 2 variables
  .. ..$ X1: int [1:50] numbers...
  .. ..$ X2: int [1:50] numbers....
............................
I found several topics on how to split a data frame randomly, but none of them helped me with my problem.
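For reference, the requirement above could be sketched roughly as follows: build a balanced vector of group labels, shuffle it, and split on it. The helper name `split_random`, its argument `k`, and the small `toy_list` are made up for this illustration; only base R functions (`rep_len`, `sample`, `split`) are used.

```r
# Hypothetical helper (not from any package): split one data frame into
# k random, non-overlapping pieces whose sizes differ by at most one row.
split_random <- function(df, k) {
  n <- nrow(df)
  # Labels 1..k recycled to length n, then shuffled so that the
  # assignment of rows to pieces is random.
  groups <- sample(rep_len(seq_len(k), n))
  split(df, groups)
}

set.seed(1)  # only to make the illustration reproducible
toy_list <- list(
  AB_df = data.frame(replicate(2, sample(0:130, 201, replace = TRUE))),
  BC_df = data.frame(replicate(2, sample(0:130, 200, replace = TRUE)))
)

# Apply the helper to every data frame and name the pieces AB_df1, AB_df2, ...
pieces <- lapply(names(toy_list), function(nm) {
  out <- split_random(toy_list[[nm]], 4)
  names(out) <- paste0(nm, seq_along(out))
  out
})
names(pieces) <- names(toy_list)

# Row counts per piece, one vector per original data frame
lapply(pieces, function(p) sapply(p, nrow))
```

Because every row receives exactly one label, no row can end up in two pieces. Which piece gets the extra row when the row count is not divisible by `k` depends on how the labels are recycled, so it is not necessarily the last one.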
UPDATE: The following attempt only gives me 3 split data frames per list element instead of 4, for some reason:
set.seed(0L)
AB_df = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))
AB_pc = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))
df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")
new = lapply(df_list, function(df) {
  n <- nrow(df)
  split(df, cut(sample(n), seq(1, n, by = floor(n/4)), labels = FALSE, include.lowest = TRUE))
})
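A possible explanation for the three pieces, offered as a sketch rather than a definitive diagnosis: `seq(1, n, by = floor(n/4))` produces only four break points, and four break points delimit three intervals. Row positions above the last break point get `NA` from `cut()`, and `split()` drops those rows. The snippet below checks this for n = 1624 and tries one alternative, passing the number of intervals (`breaks = 4`) so that `cut()` builds the intervals itself. The variable names `brks`, `grp`, and `grp4` are chosen here for illustration.

```r
n <- 1624

# The breaks used in the attempt: 4 points, hence only 3 intervals.
brks <- seq(1, n, by = floor(n / 4))
brks

set.seed(0L)
grp <- cut(sample(n), brks, labels = FALSE, include.lowest = TRUE)
# Positions above the last break point are not covered and become NA.
table(grp, useNA = "ifany")

# One alternative: ask cut() for 4 equal-width intervals over 1..n.
grp4 <- cut(sample(n), breaks = 4, labels = FALSE)
table(grp4)
```

Under this reading, replacing the `seq(...)` argument with `breaks = 4` inside the `lapply()` call should give four pieces per data frame, with sizes that differ by at most about one row when n is not divisible by 4.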