Why do I get an error when doing k-fold cross-validation?

Question

I'm learning neural networks and copied a code example, but I'm not sure why I get an error. Here is my code:

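import numpy
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from keras.models import Sequential
from keras.layers import Dense

df = pd.read_csv('games.csv')
df = df.dropna()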
X = df[['Goals', 'Saves', 'Wins', 'Games']]
Y = df['Shots']
seed = 7
numpy.random.seed(seed)
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
    model = Sequential()
    model.add(Dense(16, input_dim=4, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

The error I get is

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,\n            ...\n            3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096, 3097],\n           dtype='int64', length=2788)] are in the [columns]"

Here is a sample of X (df itself is too big to show, with over a hundred columns):

Saves Games Wins Goals
82.60 0.765 0.51140 0.5492
86.40 0.56100 0.4902 0.71860
75.60 0.45620 0.5152 0.87820
87.00 0.52400 0.5266 0.63940
82.40 0.51180 0.5176 0.74600
80.80 0.50380 0.4976 0.79380
87.00 0.54580 0.4934 0.81160
80.25 0.46050 0.5070 0.72550
88.80 0.48180 0.5130 0.63440
78.20 0.49500 0.4920 0.75160
81.60 0.50640 0.4700 0.77280
80.60 0.49520 0.5546 0.79960
83.60 0.46060 0.5070 0.74940
83.40 0.45920 0.4428 0.75200
84.40 0.51420 0.5026 0.72400
80.40 0.50260 0.4554 0.73640
83.00 0.49375 0.4475 0.74725
79.80 0.47880 0.4898 0.78160

Can you add a snippet of your dataframe using df.head(20).to_clipboard('',index=True) - Might give a little more insight into the error. — JamesArthur, Aug 17 '21 at 14:42
I wonder if you're getting the error because the X contains floats (numbers with decimals) but the model is looking for int64 (integers without decimals) as noted in the error. — JamesArthur, Aug 17 '21 at 18:31

score 0 · Answer 1 · answered Aug 18 '21 at 07:41


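The problem was with model.fit(X[train], Y[train], ...). kfold.split returns arrays of row positions, but indexing a DataFrame with X[train] looks those numbers up as column labels, which is why the KeyError says none of them "are in the [columns]". It had to be model.fit(X.iloc[train], Y.iloc[train], ...), and likewise model.evaluate(X.iloc[test], Y.iloc[test], ...).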
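The difference between label-based and positional indexing can be checked without training a model. The sketch below is only an illustration: the toy values, the "Label" column, and the 3-fold split are assumptions made up for the example, not the asker's data. It shows that X[train] raises the same kind of KeyError, while .iloc selects the rows that the split intended.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Toy data (hypothetical values). One row contains NaN, so dropna()
# leaves a gap in the index labels: 0, 1, 3, 4, 5, 6, 7.
df = pd.DataFrame({
    "Goals": [0.5, 0.6, np.nan, 0.7, 0.8, 0.4, 0.9, 0.3],
    "Saves": [80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0],
    "Label": [0, 1, 0, 1, 0, 1, 0, 1],  # assumed binary target for stratification
}).dropna()

X = df[["Goals", "Saves"]]
Y = df["Label"]

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)
fold_sizes = []
label_lookup_failed = []

for train, test in kfold.split(X, Y):
    # train and test are NumPy arrays of row positions (0 .. len(X) - 1),
    # not index labels and not column names.
    try:
        X[train]  # treated as a list of column labels
        label_lookup_failed.append(False)
    except KeyError:
        label_lookup_failed.append(True)

    # Positional selection works regardless of the index labels.
    X_train, Y_train = X.iloc[train], Y.iloc[train]
    X_test, Y_test = X.iloc[test], Y.iloc[test]
    fold_sizes.append((len(X_train), len(X_test)))

print(label_lookup_failed)
print(fold_sizes)
```

Converting to NumPy first (X.to_numpy(), Y.to_numpy()) should work as well, since NumPy arrays are always indexed by position.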


Jerry
