KNN implementation in Python

Question

I am trying to implement a simple KNN technique in Python using minute-by-minute stock price data, with Open, Close and Volume as my x variables, to predict the next minute's Open price. My code is below:

import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
from pylab import rcParams
import urllib
import sklearn
from sklearn.neighbors import KNeighborsRegressor
from sklearn import neighbors
from sklearn import preprocessing
from sklearn.cross_validation import train_test_split
from sklearn import metrics 
from googlefinance.client import get_price_data, get_prices_data, get_prices_time_data
import copy

np.set_printoptions(precision = 4, suppress = True)
rcParams['figure.figsize']=7,4
plt.style.use('seaborn-whitegrid')




param = {'q':"DJUSBK", 'i':"60",'x':"INDEXDJX",'p':"1Y"} # Dow Jones Banks
djusbk = get_price_data(param)
ticker_list=['ASB','BXS','BAC','BOH','BKU'] # 5 stocks from the Dow Jones Bank Index
ticker_dict = {}
for i in ticker_list :
    param = {'q':i, 'i':"60",'x':"NYSE",'p':"1Y"}
    df = get_price_data(param)
    x=i
    ticker_dict[x] = df

asb = copy.deepcopy(ticker_dict['ASB'])
asb_prime = pd.DataFrame(asb['Open'])
asb_prime['Close'] = asb['Close']
asb_prime['Volume'] = asb['Volume']

asb_prime_copy = copy.deepcopy(asb_prime)


# Splitting your data into test and training data sets
X_prime = asb_prime_copy.ix[:,(0,1,2)].values
asb_open_next = pd.DataFrame(copy.deepcopy(asb['Open']))
asb_open_next.drop(asb_open_next.index[:1], inplace=True)

asb_prime_copy= asb_prime_copy[:-1]

X_prime = asb_prime_copy.ix[:,(0,1,2)].values
y = asb_open_next.ix[:,(0)].values

X = preprocessing.scale(X_prime)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5,random_state = 17)

#Building and Training Model  with Training Data
clf = neighbors.KNeighborsRegressor()
clf.fit(X_train,y_train)
print(clf)

# Evaluating the model's predictions against the test dataset
y_expect=y_test

y_pred= clf.predict(X_test)
print(metrics.classification_report(y_expect,y_pred))

At the very end I am getting an error, and I am not sure why. I am using Python 3.x.

  File "C:\Users\gg\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 97, in unique_labels
    raise ValueError("Unknown label type: %s" % repr(ys))

ValueError: Unknown label type: (array([ 28.2  ,  28.375,  28.325, ...,  28.075,  28.275,  28.1  ]), array([ 28.23 ,  28.4  ,  28.32 , ...,  28.055,  28.28 ,  28.08 ]))

As suggested in the answer below, I replaced KNeighborsClassifier() with KNeighborsRegressor(), which solved the previous issue.

Possible duplicate of [LogisticRegression: Unknown label type: 'continuous' using sklearn in python](https://stackoverflow.com/questions/41925157/logisticregression-unknown-label-type-continuous-using-sklearn-in-python) — Nouman Riaz Khan, Jun 25 '18 at 12:55
You have *accepted* an answer; what exactly you mean by "previous issue"?? Is there any issue still pending?? Practically, accepted answer = case closed, except possibly for *minor* adjustments/clarifications... I strongly suggest you open a new question. — desertnaut, Jun 25 '18 at 14:05

score 5 · Accepted Answer · answered Jun 25 '18 at 12:59

You are dealing with a regression problem: predicting a price. So switching from KNeighborsClassifier to KNeighborsRegressor will solve this issue.
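
To illustrate the difference, here is a minimal sketch on synthetic data (the arrays, the seed and `n_neighbors=5` are assumptions for illustration, not the asker's stock data). It shows that the classifier rejects continuous targets with a `ValueError`, while the regressor fits and returns float predictions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Synthetic stand-in for the price data: 3 features, continuous target.
rng = np.random.RandomState(17)
X = rng.rand(200, 3)
y = 28.0 + X[:, 0] + 0.1 * rng.rand(200)  # continuous values, not class labels

# A classifier expects discrete labels, so a continuous y is rejected.
try:
    KNeighborsClassifier(n_neighbors=5).fit(X, y)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)
print("Classifier error:", classifier_error)

# A regressor accepts continuous targets and predicts floats
# (by default, the mean of the k nearest neighbours' targets).
reg = KNeighborsRegressor(n_neighbors=5).fit(X, y)
y_pred = reg.predict(X[:5])
print("Regressor predictions:", y_pred)
```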

Jan K


In that case, how should I amend the last line? I have updated the last 3 lines of the code – vicky113 Jun 25 '18 at 13:08
You cannot generate a classification report for a regression problem – Jan K Jun 25 '18 at 13:25
Yes, you need to amend the last line, as classification metrics are not useful for a regression problem. Check what regression metrics suit your problem and either take one of the [provided](http://scikit-learn.org/stable/modules/classes.html#regression-metrics) or calculate your own as needed. – Marcus V. Jun 25 '18 at 13:25
@DebdiptaMajumdar updating the code *after* an answer has been provided (and accepted!) is **not** good practice, as it makes the answer seem irrelevant. You should either edit your post and *indicate so*, or open a new question... – desertnaut Jun 25 '18 at 13:28
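
As a follow-up to the comments about regression metrics, here is a rough sketch of how the last line could be replaced. It uses synthetic data in place of the stock prices, and the choice of MAE, RMSE and R^2 is only one reasonable option, not necessarily what the commenters had in mind. Note that `train_test_split` is imported from `sklearn.model_selection`, which replaced `sklearn.cross_validation` in newer scikit-learn releases, and that the scaler here is fitted on the training split only, which differs from the question's code:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 3 features, continuous target.
rng = np.random.RandomState(17)
X = rng.rand(500, 3)
y = 28.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.01 * rng.randn(500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=17
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
reg = KNeighborsRegressor().fit(scaler.transform(X_train), y_train)
y_pred = reg.predict(scaler.transform(X_test))

# Regression metrics instead of metrics.classification_report
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print("MAE:", mae, "RMSE:", rmse, "R^2:", r2)
```

In the question's code, the equivalent change would be to call these functions on `y_expect` and `y_pred` in place of `metrics.classification_report(y_expect, y_pred)`.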
