
I have a data set named df_noyau_yes and I want to apply a ShuffleSplit to split it into train and test sets to train an autoencoder.

The problem is that this function returns the indices of the shuffled data rather than the data itself. I tried to extract the rows at these indices to feed them to the autoencoder, but it doesn't work: it raises KeyError: 223.

Here is the code:

rs = ShuffleSplit(n_splits=2, test_size=.25, random_state=0)
rs.get_n_splits(df_noyau_yes)

for train_index, test_index in rs.split(df_noyau_yes):
   print("TRAIN:", train_index, "TEST:", test_index)
   #X_train, X_test = df_noyau_yes[train_index], df_noyau_yes[test_index]
x_train=[]
for x in train_index:
    x_train = np.append(x_train, df_noyau_yes[x])
    print(x_train)

print("training set",x_train)

Is there a solution for this?

Mari

1 Answer


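To select DataFrame values by the integer position of rows and columns, use iloc. Plain indexing such as df_noyau_yes[x] looks up a column label instead, which is why an integer like 223 raises a KeyError.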

From the documentation:

The .iloc attribute is the primary access method. The following are valid inputs:

An integer e.g. 5
A list or array of integers [4, 3, 0]
A slice object with ints 1:7
A boolean array
A callable, see Selection By Callable
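As a small illustration of the first three input types, here is a sketch on a toy DataFrame. The frame df and its columns a and b are hypothetical stand-ins, not the asker's data. It also shows that plain integer indexing looks for a column, which is likely the source of the KeyError in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df_noyau_yes.
df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 10})

row = df.iloc[3]            # single integer: one row, returned as a Series
rows = df.iloc[[4, 3, 0]]   # list of positions: rows in the requested order
window = df.iloc[1:4]       # slice: positions 1, 2, 3 (end is exclusive)

# df[3] searches the column labels for 3; no such column exists here,
# so pandas raises KeyError, as in the question.
try:
    df[3]
    raised = False
except KeyError:
    raised = True
```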

So you can simply pass train_index and test_index to iloc to get the corresponding rows.

x_train = df_noyau_yes.iloc[train_index].copy()
x_test = df_noyau_yes.iloc[test_index].copy()

I am using copy() here as an extra precaution: without it, modifying a value in x_train or x_test can trigger a pandas SettingWithCopyWarning.
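Putting it together, a minimal end-to-end sketch might look like the following. The 8-row frame and its column names f1, f2, f3 are made up for illustration, and the splits list is just one way to keep both splits. In the question's code, the second loop runs after the first one finishes, so it only ever sees the indices of the last split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for df_noyau_yes: 8 rows, 3 numeric columns.
df_noyau_yes = pd.DataFrame(
    np.arange(24).reshape(8, 3), columns=["f1", "f2", "f3"]
)

rs = ShuffleSplit(n_splits=2, test_size=0.25, random_state=0)

splits = []
for train_index, test_index in rs.split(df_noyau_yes):
    # iloc selects rows by position, matching the indices ShuffleSplit yields.
    x_train = df_noyau_yes.iloc[train_index].copy()
    x_test = df_noyau_yes.iloc[test_index].copy()
    splits.append((x_train, x_test))

# If the autoencoder expects a NumPy array, convert the first split's train set.
x_train_array = splits[0][0].to_numpy()
```

Each element of splits holds one (x_train, x_test) pair, so either split can be used for training.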

Vivek Kumar