I have a data set with 95 rows and 9 columns and want to run 5-fold cross-validation. During training, the first 8 columns (features) are used to predict the ninth column. My test sets look correct, but my x training set has shape (4, 19, 9) when it should have only 8 columns, and my y training set has shape (4, 9) when it should have 19 rows. Am I indexing the subarrays incorrectly?
import numpy as np

k = 5  # number of folds
kdata = data[0:95, :]  # Need total rows to be divisible by 5, so ignore last 2 rows
np.random.shuffle(kdata)  # Shuffle all rows
folds = np.array_split(kdata, k)  # each fold is 19 rows x 9 columns
for i in range(k-1):
    xtest = folds[i][:,0:7]  # Set ith fold to be test
    ytest = folds[i][:,8]
    new_folds = np.delete(folds,i,0)
    xtrain = new_folds[:][:][0:7]  # training set is all folds, all rows x 8 cols
    ytrain = new_folds[:][:][8]  # training y is all folds, all rows x 1 col
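
A minimal sketch of one way to build the training split, assuming the goal is to train on the four remaining folds stacked together (76 rows) and that `data` is a 2-D NumPy array with at least 95 rows. The random `data` and the `shapes` list below are placeholders for illustration only. The key point is that `new_folds[:]` returns the whole array again, so `new_folds[:][:][0:7]` is the same as `new_folds[0:7]` and slices the fold axis rather than the columns. Selecting columns needs a single index expression such as `arr[:, :8]`. Note also that `0:7` selects only 7 columns, and `range(k-1)` visits only 4 of the 5 folds.

```python
import numpy as np

# Placeholder data for illustration; the real array is assumed to be 97 x 9.
rng = np.random.default_rng(0)
data = rng.normal(size=(97, 9))

k = 5
kdata = data[:95].copy()          # copy so shuffling does not reorder `data` in place
rng.shuffle(kdata)                # shuffle rows (axis 0)
folds = np.array_split(kdata, k)  # list of 5 arrays, each 19 x 9

shapes = []  # hypothetical bookkeeping, only used to inspect the result
for i in range(k):                # visit every fold, including the last one
    test = folds[i]
    # Stack the other four folds row-wise into one 76 x 9 training array.
    train = np.concatenate(folds[:i] + folds[i + 1:], axis=0)

    xtest, ytest = test[:, :8], test[:, 8]      # first 8 columns, then the ninth
    xtrain, ytrain = train[:, :8], train[:, 8]
    shapes.append((xtrain.shape, ytrain.shape, xtest.shape, ytest.shape))
```

If scikit-learn is available, `sklearn.model_selection.KFold` yields train and test indices for each fold and may be simpler than splitting by hand.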