This is a basic implementation of Gaussian Naive Bayes using sklearn. Can anyone tell me what I'm doing wrong here? My K-Fold CV results look a bit weird:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split, cross_val_score

column_names = ['AS', 'fh', 'class2']
df = pd.read_csv("C:/Users/Jans/Music/docx/222/test.csv", sep=';', header=0, names=column_names)

x = df.drop(['AS', 'class2'], axis=1)  # keep only the 'fh' feature
df['class2'] = df['class2'].astype(int)
y = df['class2'].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False, random_state=None)

model = GaussianNB()
model.fit(x_train, y_train)

k_fold_acc = cross_val_score(model, x_train, y_train, cv=10)
k_fold_mean = k_fold_acc.mean()
for i in k_fold_acc:
    print(i)
print("accuracy K Fold CV: " + str(k_fold_mean))

grid_predictions = model.predict(x_test)
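For what it's worth, I tried to reproduce the first-fold behaviour on made-up data, since I can't share my real CSV. If I understand the docs right, cross_val_score with an integer cv and a classifier uses StratifiedKFold with shuffle=False, so each fold takes consecutive rows within each class — and when a feature is sorted (or drifts) along the rows, the edge folds score much worse, similar to my output:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)

# Made-up stand-in for my data: one feature, sorted within each class,
# so consecutive rows are correlated (like a time-ordered CSV might be)
x0 = np.sort(rng.normal(0.0, 1.0, 100))
x1 = np.sort(rng.normal(3.0, 1.0, 100))
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

model = GaussianNB()

# Default for a classifier and integer cv: StratifiedKFold, shuffle=False,
# so fold 0 gets the first (here: smallest) rows of each class
scores_plain = cross_val_score(model, X, y, cv=10)

# Same split strategy, but with shuffling inside the splitter
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores_shuffled = cross_val_score(model, X, y, cv=cv)

print("no shuffle:", np.round(scores_plain, 3))
print("shuffled:  ", np.round(scores_shuffled, 3))
```

On this toy data the unshuffled scores swing fold to fold while the shuffled ones stay flat, so the per-fold numbers clearly depend on row order when nothing shuffles.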
My 10-fold CV results (the first fold especially looks very strange...):
0.36714285714285716
0.8271428571428572
0.9785714285714285
0.9357142857142857
0.9628571428571429
0.9957081545064378
1.0
1.0
0.994277539341917
0.9842632331902719
accuracy K Fold CV:0.90456774984672
Also, when I increase my test set from, say, 0.2 to 0.6, these are the results, which is also a bit strange.
Am I doing something wrong? And if yes, what?
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
accuracy K Fold CV:1.0
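One thing I suspect about the 0.6 case: with shuffle=False, train_test_split just keeps row order, so the train part is the first ~(1 - test_size) fraction of the rows. If my CSV happens to be sorted by class, the train split could end up containing only one class, and every fold would then score 1.0 trivially. A quick sanity check with made-up labels (the real check would be on my actual y_train):

```python
import numpy as np

# Made-up, class-sorted labels standing in for my 'class2' column
y = np.array([0] * 50 + [1] * 50)

# train_test_split(..., shuffle=False) keeps row order: the train part
# is the first ~(1 - test_size) fraction of the rows
test_size = 0.6
n_train = int(len(y) * (1 - test_size))
y_train = y[:n_train]

print(np.unique(y_train, return_counts=True))
# -> (array([0]), array([40])): only one class left in the train split
```

With a single class in y_train, GaussianNB can only ever predict that class, so 1.0 accuracy in every fold would be expected rather than strange.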