I was using train_test_split in my code and then wanted to change it to cross validation, but something strange is hapenning.
train, test = train_test_split(data, test_size=0)
x_train = train.drop('CRO', axis=1)
y_train = train['CRO']
scaler = MinMaxScaler(feature_range=(0, 1))
x_train_scaled = scaler.fit_transform(x_train)
x_train = pd.DataFrame(x_train_scaled)
for k in range(1, 5):
knn = neighbors.KNeighborsRegressor(n_neighbors=k, weights='uniform')
scores = model_selection.cross_val_score(knn, x_train, y_train, cv=5)
print(scores.mean(), 'score for k = ', k)
This code gives the scores around 0.8, but when I delete that first line and change the 'train' set for the 'data' set in the 2nd and 3rd lines, the score changes to 0.2, which is strange because I even set the test_size to 0 so the train should be equal to the whole data. What is hapenning?