I am trying to evaluate a logistic regression model using the ROC curve (AUROC) and to cross-validate my scores. Without cross-validation I have no issues, but I want to use cross-validation to get a less biased estimate of the model's performance.
Below is the first part of my code, with the line that raises the error marked:
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    X = df.drop('Survived', axis=1)
    y = df['Survived']
    skf = StratifiedKFold(n_splits=5)
    logmodel = LogisticRegression()
    i = 0
    for train, test in skf.split(X, y):
        logmodel.fit(X[train], y[train])  # error occurs here
        predictions = logmodel.predict_proba(X[test])
        # a bunch of code that I haven't included which creates the ROC curve
        i += 1
The error occurs on the logmodel.fit line (marked with a comment above), and the message is a list of integers followed by 'not in index'.
I don't understand what the problem is.
This is my understanding of the code: first I create an instance of StratifiedKFold and an instance of LogisticRegression. The StratifiedKFold instance specifies that five folds are to be made. Next, for each pair of train and test indices produced by splitting X and y, I fit the logistic model on the training rows and then compute predicted class probabilities for the test rows. Later (this part is not shown) I create a ROC curve for each fold.
Maybe somebody can clarify what is going wrong. My code is more or less copied directly from this scikit-learn example: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-auto-examples-model-selection-plot-roc-crossval-py
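For comparison, here is a minimal sketch of the same loop using positional indexing. It assumes X and y are a pandas DataFrame and Series, as the df.drop call suggests. The linked example works on NumPy arrays, where X[train] selects rows. On a DataFrame, X[train] is treated as a lookup of column labels, which would be consistent with the 'not in index' message. The synthetic df, its Age and Fare columns, and the fold_aucs list are hypothetical stand-ins, not part of my real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the real df: two numeric features and a binary label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.normal(30, 10, 200),
    "Fare": rng.normal(50, 20, 200),
})
# The label is loosely tied to Fare so the model has something to learn.
df["Survived"] = (df["Fare"] + rng.normal(0, 20, 200) > 50).astype(int)

X = df.drop("Survived", axis=1)
y = df["Survived"]

skf = StratifiedKFold(n_splits=5)
logmodel = LogisticRegression()

fold_aucs = []
for train, test in skf.split(X, y):
    # train and test are arrays of integer row positions,
    # so rows are selected with .iloc rather than X[train].
    logmodel.fit(X.iloc[train], y.iloc[train])
    # Column 1 of predict_proba is the probability of the positive class.
    probs = logmodel.predict_proba(X.iloc[test])[:, 1]
    fold_aucs.append(roc_auc_score(y.iloc[test], probs))

print(fold_aucs)
```

Converting up front with X.to_numpy() and y.to_numpy() should work as well, since plain arrays accept X[train] directly, at the cost of losing the column names.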