ShuffleSplit of Sklearn issue

Question

I have a data set named df_noyau_yes and I want to apply a ShuffleSplit to split it into train and test sets to train an autoencoder.

The problem is that this function returns the indices of the shuffled data. I tried to use these indices to extract the rows and feed them to the autoencoder, but it doesn't work: I get the error KeyError: 223.

Here is the code:

import numpy as np
from sklearn.model_selection import ShuffleSplit

rs = ShuffleSplit(n_splits=2, test_size=.25, random_state=0)
rs.get_n_splits(df_noyau_yes)

for train_index, test_index in rs.split(df_noyau_yes):
    print("TRAIN:", train_index, "TEST:", test_index)
    #X_train, X_test = df_noyau_yes[train_index], df_noyau_yes[test_index]
x_train=[]
for x in train_index:
    x_train = np.append(x_train, df_noyau_yes[x])
    print(x_train)

print("training set",x_train)

Is there a solution for this?

Is df_noyau_yes a pandas DataFrame? If so, you cannot access the samples by plain indexing as you are doing here. — Vivek Kumar, Apr 17 '18 at 13:26

score 0 · Answer 1 · answered Apr 18 '18 at 06:07

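To select DataFrame values by the integer position of rows and columns, use iloc. Plain indexing such as df_noyau_yes[x] looks x up as a column label, which is why it raises a KeyError.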
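To make the difference concrete, here is a minimal sketch of label lookup versus positional lookup. The frame df and its columns "a" and "b" are made up for illustration; they only stand in for df_noyau_yes.

```python
import pandas as pd

# Hypothetical toy frame standing in for df_noyau_yes.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

try:
    df[2]  # interpreted as a column label; there is no column named 2
    lookup_failed = False
except KeyError:
    lookup_failed = True

row = df.iloc[2]        # third row, selected by position
rows = df.iloc[[2, 0]]  # several rows at once, in the requested order

print(lookup_failed)
print(row.tolist())
print(rows["a"].tolist())
```

The same thing happens in the question: 223 is a row position produced by ShuffleSplit, but df_noyau_yes[223] searches the column labels for 223.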

From the documentation:

The .iloc attribute is the primary access method. The following are valid inputs:
An integer e.g. 5
A list or array of integers [4, 3, 0]
A slice object with ints 1:7
A boolean array
A callable, see Selection By Callable

So you can simply pass train_index and test_index to iloc to get the corresponding rows.

x_train = df_noyau_yes.iloc[train_index].copy()
x_test = df_noyau_yes.iloc[test_index].copy()

I am using copy() here as an extra precaution. If you don't use copy() and then try to change a value in x_train or x_test, pandas may emit a SettingWithCopyWarning.
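Put together, the whole flow might look like the sketch below. The 8 by 3 frame and the column names f0, f1, f2 are assumptions made up for the example, since the real df_noyau_yes is not shown in the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for df_noyau_yes: 8 samples, 3 features.
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(8, 3),
    columns=["f0", "f1", "f2"],
)

rs = ShuffleSplit(n_splits=2, test_size=0.25, random_state=0)

splits = []
for train_index, test_index in rs.split(df):
    # Select rows by position inside the loop, once per split.
    x_train = df.iloc[train_index].copy()
    x_test = df.iloc[test_index].copy()
    splits.append((x_train, x_test))

# Take the first split and convert it to a NumPy array for the model.
x_train, x_test = splits[0]
train_array = x_train.to_numpy()
print(x_train.shape, x_test.shape)
```

Note that the selection happens inside the loop. In the question's code, train_index is read after the loop has finished, so only the indices of the last split would be used.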

I know the 'iloc' method, but I need something that shuffles the data: something dynamic, not static. — Mari, Apr 19 '18 at 09:00
@Mari That's what ShuffleSplit is doing here. Please explain your use case in detail. — Vivek Kumar, Apr 19 '18 at 09:14
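As a rough illustration of the point in the comment above, the sketch below prints the indices of two splits from the same ShuffleSplit. The array X is a made-up placeholder with 20 samples. iloc only picks the rows; the shuffling is already in the indices.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Hypothetical data: 20 samples with a single feature.
X = np.arange(20).reshape(20, 1)

rs = ShuffleSplit(n_splits=2, test_size=0.25, random_state=0)
(train_a, test_a), (train_b, test_b) = rs.split(X)

print("split 1 test:", test_a)
print("split 2 test:", test_b)

# Each split covers every row position exactly once.
covers_all = np.array_equal(
    np.sort(np.concatenate([train_a, test_a])), np.arange(20)
)
# The two splits come from different random permutations.
splits_differ = not np.array_equal(train_a, train_b)
print(covers_all, splits_differ)
```

With random_state fixed, the same shuffles are reproduced on every run. Leaving random_state unset should give a fresh shuffle each time the script runs, if that is what "dynamic" means here.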
