
I'm new to machine learning and am using the Boston dataset for predictions. Everything works except the results for precision_score and accuracy_score. This is what I have done:

import pandas as pd 
import sklearn 
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing,cross_validation, svm
from sklearn.datasets import load_boston
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

boston = load_boston()
df = pd.DataFrame(boston.data)
df.columns= boston.feature_names
df['Price']= boston.target

X = np.array(df.drop(['Price'],axis=1), dtype=np.float64)
X = preprocessing.scale(X)

y = np.array(df['Price'], dtype=np.float64)

print (len(X[:,6:7]),len(y))

X_train,X_test,y_train,y_test=cross_validation.train_test_split(X,y,test_size=0.30)

clf =LinearRegression()
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)

print(y_predict,len(y_predict))
print (accuracy_score(y_test, y_predict))
print(precision_score(y_test, y_predict,average = 'macro'))

Now I get the following error:

  File "LinearRegression.py", line 33, in <module>
    accuracy = accuracy_score(y_test, y_predict)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported
piet.t
harshi

1 Answer


You are using a linear regression model,

clf = LinearRegression()

which predicts continuous values, e.g. 1.2, 1.3.

accuracy_score(y_test, y_predict), on the other hand, expects discrete labels: boolean values (1 or 0, true or false) or categorical values like 1, 2, 3, 4, where each number stands for a category.

That's why you are getting an error.
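One way to see what accuracy_score is objecting to is scikit-learn's `sklearn.utils.multiclass.type_of_target`, the helper that its classification metrics use to decide what kind of target they were given. The sketch below is a minimal illustration; the arrays are made-up, price-like values standing in for y_test and y_predict, not actual Boston data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical values standing in for y_test and y_predict (not from the Boston data).
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_pred = np.array([25.1, 22.9, 31.2, 30.8])

print(type_of_target(y_true))                  # prints: continuous (21.6 is not a whole number)
print(type_of_target(y_pred))                  # prints: continuous
print(type_of_target(np.array([0, 1, 1, 0])))  # prints: binary
print(type_of_target(np.array([1, 2, 3, 4])))  # prints: multiclass

# accuracy_score only accepts binary, multiclass or multilabel targets,
# so continuous inputs are rejected with a ValueError.
caught = None
try:
    accuracy_score(y_true, y_pred)
except ValueError as err:
    caught = str(err)  # message mentions "continuous"
print("ValueError:", caught)
```

The same check is why precision_score fails on these predictions as well: it is also a classification metric.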

How to solve this?

Since you are trying to predict Price on the Boston data, which is a continuous value, I recommend changing your error measure from accuracy to RMSE or MSE.

Replace:

print(accuracy_score(y_test, y_predict))

with:

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_predict))

That will solve your problem.
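Put together, the fix might look like the sketch below. It assumes a current scikit-learn, where `cross_validation` has been replaced by `sklearn.model_selection` and `load_boston` is no longer available (it was removed in version 1.2), so `make_regression` generates stand-in data of the same shape; substitute your own X and y. RMSE is taken as the square root of the MSE, which works across versions.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 506 rows x 13 features, the same shape as the Boston data.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X = preprocessing.scale(X)  # kept from the question's code

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

clf = LinearRegression()
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)

# Regression metrics compare continuous predictions with continuous targets.
mse = mean_squared_error(y_test, y_predict)  # mean of squared errors, in squared target units
rmse = np.sqrt(mse)                          # back in the target's own units
print("MSE:", mse)
print("RMSE:", rmse)
```

RMSE is often easier to read than MSE because it is in the same units as Price.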

Vikash Singh
  • But even when I change the classifier to SVM or RandomForestClassifier, I get the same result. Do they predict in the same manner? – harshi Feb 25 '17 at 09:13
  • @harshi RandomForestClassifier predicts 1's and 0's, or categories like 0, 1, 2, 3, 4. Here you need continuous predictions for price, which could be 1.2 or 0.24. – Vikash Singh Feb 25 '17 at 09:17
  • @harshi read up more about the difference between classification problems and regression problems. It will help: https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning – Vikash Singh Feb 25 '17 at 09:19
  • I figured out I can use 'clf.score(X_test, y_test)' for accuracy. But calculating RMSE doesn't give me accuracy, right? And how do I figure out the precision_score? – harshi Feb 25 '17 at 09:26
  • @harshi precision_score is calculated for classification problems. This is a regression problem. – Vikash Singh Feb 25 '17 at 09:28
  • @harshi The clf.score() method will not get you accuracy. For `LinearRegression`, `score` returns the coefficient of determination, R². [See documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.score). I think your concepts of classification and regression are not clear yet. – Vivek Kumar Feb 25 '17 at 09:30
  • Yes, got it. Thanks. – harshi Feb 25 '17 at 09:36
  • See http://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics for evaluating performance of regression estimators. – Vivek Kumar Feb 25 '17 at 09:37
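To tie the comment thread together: for a regressor, clf.score(X_test, y_test) returns R² rather than accuracy, and it should match `sklearn.metrics.r2_score` on the same predictions. The sketch below checks that, computes one more metric from the regression-metrics page (mean absolute error), and shows that a classifier such as RandomForestClassifier rejects a continuous target while its regression counterpart, RandomForestRegressor, accepts it. The `make_regression` data is an assumption standing in for the Boston data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in regression data (assumption; not the Boston dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

clf = LinearRegression().fit(X_train, y_train)
y_predict = clf.predict(X_test)

# For regressors, score() is R^2 (coefficient of determination), not accuracy.
r2_from_score = clf.score(X_test, y_test)
r2_from_metric = r2_score(y_test, y_predict)
print("R^2 via score():", r2_from_score)
print("R^2 via r2_score():", r2_from_metric)
print("MAE:", mean_absolute_error(y_test, y_predict))

# A classifier expects discrete class labels, so fitting it on a continuous target fails.
classifier_error = None
try:
    RandomForestClassifier(random_state=0).fit(X_train, y_train)
except ValueError as err:
    classifier_error = str(err)
print("RandomForestClassifier:", classifier_error)

# The regression counterpart accepts the same target; its score() is also R^2.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
print("RandomForestRegressor R^2:", forest.score(X_test, y_test))
```

The same split applies to SVMs: SVC is the classifier, SVR the regressor.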