4

I'm currently trying to perform a KFold split on a pandas DataFrame that I read from a CSV file. Unfortunately, I'm getting the error:

"None of [Int64Index , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n ...... dtype='int64')] are in the [columns]"

Here is my code:

import pandas as pd
from sklearn.model_selection import KFold

def getSlicesOfData(read_csv):
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    slice_prediction_data = read_csv[["best_move"]]
    return (slice_training_data, slice_prediction_data)

def getKFold(data_sliced):
    kf = KFold(n_splits=10, random_state=None, shuffle=False)
    return kf.split(data_sliced[0],data_sliced[1])
    #return TimeSeriesSplit(n_splits=10, max_train_size=9)

if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
        y_train, y_test = data_slice[1][train_index],data_slice[1][test_index]

What, if anything, am I doing wrong when attempting to get the training data with:

x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
y_train, y_test = data_slice[1][train_index],data_slice[1][test_index]
Arkistarvh Kltzuonstev
plgent
  • Possible duplicate of [KeyError: "None of \[\['', ''\]\] are in the \[columns\]" pandas python](https://stackoverflow.com/questions/51976930/keyerror-none-of-are-in-the-columns-pandas-python) – meW Mar 01 '19 at 17:24
  • I tried that already, but I just get a similar error – plgent Mar 01 '19 at 17:33

4 Answers

6

You're indexing a pandas DataFrame with the integer positions that KFold returns, and that's where the problem lies: with plain brackets, pandas looks those integers up as column labels. Try converting the data from pandas to NumPy and re-run the code. At the end, you can convert the result back from NumPy to pandas if you need to.
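To illustrate the round trip, here is a minimal sketch. It assumes a small random DataFrame as a stand-in for 100games.csv (only the column names come from the question), and the `folds` list is just a hypothetical container for inspecting the results:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for 100games.csv: 20 random rows with the
# same column names as in the question.
feature_cols = ["player"] + [str(i) for i in range(9)]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(20, 10)), columns=feature_cols)
df["best_move"] = rng.integers(0, 9, size=20)

# pandas -> NumPy
X = df[feature_cols].to_numpy()
y = df["best_move"].to_numpy()

kf = KFold(n_splits=10, shuffle=False)
folds = []
for train_index, test_index in kf.split(X, y):
    # A NumPy array treats an integer array as row positions, so this works.
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # NumPy -> pandas again, in case a DataFrame is needed downstream.
    x_train_df = pd.DataFrame(x_train, columns=feature_cols)
    folds.append((x_train_df, x_test, y_train, y_test))
```

Note that rebuilding the DataFrame this way discards the original row index; if the index matters, selecting rows by position on the DataFrame itself avoids the conversion.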

ssazally
5

Convert to numpy using: data_slice[0].values[train_index]

Try:

if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0].values[train_index], data_slice[0].values[test_index]
        y_train, y_test = data_slice[1].values[train_index], data_slice[1].values[test_index]

See also: https://stackoverflow.com/a/51091177/5025009

seralouk
2

Try iloc, which selects rows by integer position:

x_train, x_test = data_slice[0].iloc[train_index], data_slice[0].iloc[test_index]
y_train, y_test = data_slice[1].iloc[train_index], data_slice[1].iloc[test_index]
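As a self-contained check of why this helps, the sketch below uses made-up data (not the asker's CSV) whose index deliberately does not start at 0. Plain brackets should raise the same kind of KeyError as in the question, while iloc picks rows by position:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical data; the index runs from 100 to 109 on purpose.
df = pd.DataFrame(
    {"0": range(10), "best_move": range(10, 20)},
    index=range(100, 110),
)
X, y = df[["0"]], df["best_move"]

kf = KFold(n_splits=5, shuffle=False)
train_index, test_index = next(kf.split(X, y))

# Plain brackets look the integers up as column labels and raise KeyError.
try:
    X[train_index]
    bracket_error = None
except KeyError as exc:
    bracket_error = exc

# iloc selects rows by position, regardless of the index labels.
x_train, x_test = X.iloc[train_index], X.iloc[test_index]
y_train, y_test = y.iloc[train_index], y.iloc[test_index]
```

Unlike converting to NumPy, this keeps the column names and the original index on every split.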
dixhom
-1

I mean... this seems like you have a fair idea of what you aim to accomplish, but... with a line such as [["best move"]]

perhaps calculate from the 3 best moves and give each a weighted chance of being selected and executed.

10 splits no random no shuffle...

like with 6 splits, 1.5 random and a 2 shuffle it may perform better because... if your opponent has also taken these shortcuts but managed to get her running.

in life and in circuitry, when you take the risk of going off the path a bit, your opponent expects you to use the typical strategies. Don't.

I'm no coding expert, but from the fundamentals I am aware of... this just isn't quite enough. It's a computer; you must be extremely explicit with your instructions.

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 26 '23 at 23:25