I am fitting a logistic regression with elastic net regularization to identify which variables are positively or negatively associated with the outcome. Calling accuracy_score(y_true, y_pred) raises "ValueError: Found input variables with inconsistent numbers of samples: [9076, 9075]". The data frame has 18151 observations. How can I fix this error? Could it be that train_test_split at 50% gives me one subsample with an odd number of rows and one with an even number?
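from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
X2 = df.iloc[:, 23:41]  # predictor columns
y2 = df["diab_inc"].values.reshape(-1, 1)  # outcome as a column vector
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.5, random_state=1234)
print(len(X2_train), len(X2_test), len(y2_train), len(y2_test))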
[9075 9076 9075 9076]
l1_ratio=(.001,.005,.01,.05,.1,.3,.5,.7,.9,1)
select=SelectFromModel(LogisticRegressionCV(cv=5, penalty='elasticnet', solver="saga", l1_ratios=l1_ratio, max_iter=10000)).fit(X2_train, y2_train)
print("Accuracy {0:2%}".format(accuracy_score(y2_test,select.estimator_.predict(X2_train))))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
----> 1 print("Accuracy {0:2%}".format(accuracy_score(y2_test,select.estimator_.predict(X2_train))))
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
200
201 # Compute accuracy for each possible representation
--> 202 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
203 check_consistent_length(y_true, y_pred, sample_weight)
204 if y_type.startswith('multilabel'):
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
81 y_pred : array or indicator matrix
82 """
---> 83 check_consistent_length(y_true, y_pred)
84 type_true = type_of_target(y_true)
85 type_pred = type_of_target(y_pred)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
317 uniques = np.unique(lengths)
318 if len(uniques) > 1:
--> 319 raise ValueError("Found input variables with inconsistent numbers of"
320 " samples: %r" % [int(l) for l in lengths])
321
ValueError: Found input variables with inconsistent numbers of samples: [9076, 9075]
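
The two numbers in the message are the lengths of the two arrays passed to accuracy_score: 9076 is len(y2_test) and 9075 is the number of predictions made for X2_train. A minimal sketch of that mismatch follows. It uses synthetic data from make_classification and a plain LogisticRegression as stand-ins (both are assumptions for illustration, not the data frame or the elastic net model above), with 151 rows so that a 50% split is uneven in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: an odd row count, so a 50% split is uneven.
X, y = make_classification(n_samples=151, n_features=18, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1234
)
split_sizes = (len(X_train), len(X_test))  # the two halves differ by one row

# Plain logistic regression stands in for the elastic net model in the question.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Mismatched pair: test labels compared with predictions for the training rows.
try:
    accuracy_score(y_test, model.predict(X_train))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Matched pair: labels and predictions both come from the test rows.
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(split_sizes)
print(mismatch_error)
print(test_accuracy)
```

If this sketch carries over to the code above, the uneven split is only what exposes the problem. The matching call would compare y2_test with select.estimator_.predict(X2_test). With an even number of rows the original call would run without an error but score test labels against training predictions.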