I'm trying (badly) to use sklearn's LOO functionality and what I would like to do is append each training split set into a dataframe column with a label for the split index. So using the example from the sklearn page, but slightly modified:
import numpy as np
from sklearn.model_selection import LeaveOneOut
x = np.array([1,2])
y = np.array([3,4])
coords = np.column_stack((x,y))
z = np.array([8, 12])
loo = LeaveOneOut()
loo.get_n_splits(coords)
print(loo)
LeaveOneOut()
for train_index, test_index in loo.split(coords):
print("TRAIN:", train_index, "TEST:", test_index)
XY_train, XY_test = coords[train_index], coords[test_index]
z_train, z_test = z[train_index], z[test_index]
print(XY_train, XY_test, z_train, z_test)
Which returns:
TRAIN: [1] TEST: [0]
[[2 4]] [[1 3]] [12] [8]
TRAIN: [0] TEST: [1]
[[1 3]] [[2 4]] [8] [12]
In my case I'd like to write each split value to a dataframe like this:
X Y Ztrain Ztest split
0 1 2 8 12 0
1 3 4 8 12 0
2 1 2 12 8 1
3 3 4 12 8 1
And so on.
The motivation for doing this is I want to try a jackknifing interpolation of sparse point data. Ideally I want to run an interpolation/gridder on each of the LOO training sets, and then stack them. But I am struggling to access each train set to then use in something like griddata
Any help would be appreciated, for the problem here or the approach in general.