I'm a data science noob and am working on the Kaggle Titanic dataset. I'm running a Logistic Regression on it to predict whether passengers in the test data set survived or died.
I clean both the training and test data and run the Logistic Regression fit on the training data. All good.
train = pd.read_csv('train.csv')
X_train = train.drop('Survived',axis=1)
y_train = train['Survived']
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
Then I run the prediction model on the test data as such:
test = pd.read_csv('test.csv')
predictions = logmodel.predict(test)
I then try to print the Confusion Matrix:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(test,predictions))
I get an error that says:
ValueError: Classification metrics can't handle a mix of continuous-multioutput and binary targets
What does this mean and how do I fix it?
Some potential issues I see are:
- I'm doing something super dumb and wrong with that prediction model on the test data.
- The value for features "Age" and "Fare" (cost of passenger's ticket) are floats, while the rest are integers.
Where am I going wrong? Thanks for your help!