I am trying to build a machine learning model using kernel ridge regression with k-fold cross-validation, but I am getting the error below. Any help would be much appreciated.

datasetTrain = pd.read_csv('D:/set_AB.csv')
datasetTest = pd.read_csv('D:/set_C.csv')

X = datasetTrain
y = datasetTest

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

I am getting the following error:

ValueError: Found input variables with inconsistent numbers of samples: [140, 70]


NNN751

1 Answer


Calling train_test_split this way is fundamentally wrong: you are passing the whole training file as X and the whole test file as y.

The error arises because len(datasetTrain) = 140 and len(datasetTest) = 70, while train_test_split requires X and y to have the same number of rows. Hence the mismatch.
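The mismatch can be reproduced with placeholder data. This is only a sketch: the column names `f1` and `target` are made up for illustration and are not taken from the CSV files in the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-ins for set_AB.csv (140 rows) and set_C.csv (70 rows).
# The column names are hypothetical.
dataset_train = pd.DataFrame({"f1": np.arange(140), "target": np.arange(140)})
dataset_test = pd.DataFrame({"f1": np.arange(70), "target": np.arange(70)})

try:
    # Two whole DataFrames passed as X and y: their row counts differ,
    # so scikit-learn refuses to split them together.
    train_test_split(dataset_train, dataset_test, random_state=0)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Every array passed to a single train_test_split call is split along the same row indices, which is why the lengths must agree.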

datasetTrain is the dataset used for training. In your case it should contain:

Xtrain: The input (predictor) variables that influence the target variable.

ytrain: The target variable.

datasetTest is the dataset used for testing. In your case it should contain:

Xtest: The input (predictor) variables that influence the target variable.

ytest: The target variable.
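If the two files are already meant to be the training and test sets, one option is to separate inputs from the target within each file and skip train_test_split entirely. The sketch below assumes both files share the same columns and that the target lives in a column called `target`; that name, the feature names, and the random data are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_frame(n_rows):
    # Placeholder data standing in for a CSV file with two inputs and a target.
    return pd.DataFrame({
        "f1": rng.normal(size=n_rows),
        "f2": rng.normal(size=n_rows),
        "target": rng.normal(size=n_rows),
    })

dataset_train = make_frame(140)  # stands in for set_AB.csv
dataset_test = make_frame(70)    # stands in for set_C.csv

TARGET = "target"  # hypothetical target column name

# Inputs are every column except the target; the target is that one column.
X_train = dataset_train.drop(columns=[TARGET])
y_train = dataset_train[TARGET]
X_test = dataset_test.drop(columns=[TARGET])
y_test = dataset_test[TARGET]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Within each file, X and y now have matching row counts (140 and 140, 70 and 70), which is what the model's fit and score methods expect.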

To use train_test_split, first concatenate datasetTrain and datasetTest row-wise (axis=0), then separate the input columns from the target:

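import pandas as pd
from sklearn.model_selection import train_test_split

# Stack the two files row-wise; ignore_index avoids duplicate row labels.
final_df = pd.concat([datasetTrain, datasetTest], ignore_index=True)
X = final_df["only those columns which are input variables"]  # placeholder: use a list of your input column names
y = final_df["target_variable"]  # placeholder: use the name of your target column
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Model creation and training code goes here.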
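As a rough end-to-end sketch of the model step, the following fits scikit-learn's KernelRidge with 5-fold cross-validation on the training split. The data is synthetic, the column names `f1`, `f2`, and `target` are invented, and the kernel, alpha, and number of folds are arbitrary choices rather than recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)

def make_frame(n_rows):
    # Synthetic placeholder data: two inputs and a noisy nonlinear target.
    f1 = rng.normal(size=n_rows)
    f2 = rng.normal(size=n_rows)
    target = np.sin(f1) + 0.5 * f2 + rng.normal(scale=0.1, size=n_rows)
    return pd.DataFrame({"f1": f1, "f2": f2, "target": target})

# Stand-ins for the two CSV files (140 and 70 rows), stacked row-wise.
final_df = pd.concat([make_frame(140), make_frame(70)], ignore_index=True)
X = final_df[["f1", "f2"]]   # input columns (hypothetical names)
y = final_df["target"]       # target column (hypothetical name)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameters are arbitrary here; tune them for real data.
model = KernelRidge(kernel="rbf", alpha=1.0)

# k-fold cross-validation on the training split only.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")

# Final fit on the full training split, scored on the held-out test split.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(cv_scores.mean(), test_r2)
```

Keeping the test split out of the cross-validation loop means the final score is measured on rows the model never saw during fitting or fold evaluation.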
Priya