Write sklearn LOO splits to pandas dataframe with index as label column

Question

I'm trying (badly) to use sklearn's LeaveOneOut (LOO) splitter. What I would like to do is write each training split into a dataframe, with a column that labels the split index. Using the example from the sklearn docs, slightly modified:

import numpy as np
from sklearn.model_selection import LeaveOneOut

x = np.array([1,2])
y = np.array([3,4])
coords = np.column_stack((x,y))
z = np.array([8, 12])
loo = LeaveOneOut()
loo.get_n_splits(coords)

print(loo)  # prints: LeaveOneOut()

for train_index, test_index in loo.split(coords):
    print("TRAIN:", train_index, "TEST:", test_index)
    XY_train, XY_test = coords[train_index], coords[test_index]
    z_train, z_test = z[train_index], z[test_index]
    print(XY_train, XY_test, z_train, z_test)

Which returns:

TRAIN: [1] TEST: [0]
[[2 4]] [[1 3]] [12] [8]
TRAIN: [0] TEST: [1]
[[1 3]] [[2 4]] [8] [12]

In my case I'd like to write each split value to a dataframe like this:

     X    Y   Ztrain    Ztest    split
0    1    2   8         12       0
1    3    4   8         12       0
2    1    2   12        8        1
3    3    4   12        8        1

And so on.

The motivation for doing this is that I want to try a jackknifing interpolation of sparse point data. Ideally I want to run an interpolation/gridder on each of the LOO training sets, and then stack them. But I am struggling to access each training set so that I can use it in something like griddata.

Any help would be appreciated, for the problem here or the approach in general.

It looks like the split is correct, you just need to stack them to have the expected dataframe. — Quang Hoang, Dec 10 '21 at 15:44
Thanks @QuangHoang - sorry to be a pain but how would I go about doing that? — 8556732, Dec 10 '21 at 15:58

score 1 · Answer 1 · answered Dec 10 '21 at 23:22

I don't quite get the logic of your expected dataframe, but you can try something like this to build one from the splits:

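import pandas as pd

# Build one small dataframe per split, then concatenate them.
# Uses loo, coords and z as defined in the question.
frames = []
for train_index, test_index in loo.split(coords):
    # [0] takes the first row only, which is enough for this two-point example
    frames.append(pd.DataFrame({
        'XY_train': coords[train_index][0],
        'XY_test': coords[test_index][0],
        'Ztrain': z[train_index][0],
        'Ztest': z[test_index][0],
    }))
df = pd.concat(frames)
df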

   XY_train  XY_test  Ztrain  Ztest
0         2        1      12      8
1         4        3      12      8
0         1        2       8     12
1         3        4       8     12
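
If what you want is a split label on every training row, one possible variation is to wrap the loop in enumerate and store the counter as a column. This is only a sketch: the column layout (X, Y, Ztrain, split) is my assumption about what the question is after, since the expected table there is ambiguous, and the names frames and split are just illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import LeaveOneOut

# Same toy data as in the question.
x = np.array([1, 2])
y = np.array([3, 4])
coords = np.column_stack((x, y))
z = np.array([8, 12])

loo = LeaveOneOut()
frames = []
for split, (train_index, test_index) in enumerate(loo.split(coords)):
    # One row per training point; the scalar split label is broadcast to every row.
    frames.append(pd.DataFrame({
        "X": coords[train_index, 0],
        "Y": coords[train_index, 1],
        "Ztrain": z[train_index],
        "split": split,
    }))

# Long-format table of all training sets, labelled by split.
df = pd.concat(frames, ignore_index=True)
print(df)

# Each training set can be recovered from its label.
for split, group in df.groupby("split"):
    xy_train = group[["X", "Y"]].to_numpy()
    z_train = group["Ztrain"].to_numpy()
    # xy_train and z_train could be handed to an interpolator here
```

Because the rows are indexed with the whole train_index array rather than [0], this should also work when there are more than two points. Inside the groupby loop, xy_train and z_train are plain arrays, so they could be passed as the points and values arguments of something like scipy.interpolate.griddata.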
