
I've been searching Google and can't figure out what I'm doing wrong. I'm pretty new to Python and am trying to use scikit-learn on stock data, but I get the error "ValueError: matrices are not aligned" when I try to predict.

import datetime

import numpy as np
import pylab as pl
from matplotlib import finance
from matplotlib.collections import LineCollection

from sklearn import cluster, covariance, manifold, linear_model

from sklearn import datasets, linear_model

###############################################################################
# Retrieve the data from Internet

# Choose a time period reasonably calm (not too long ago so that we get
# high-tech firms, and before the 2008 crash)
d1 = datetime.datetime(2003, 01, 01)
d2 = datetime.datetime(2008, 01, 01)

# kraft symbol has now changed from KFT to MDLZ in yahoo
symbol_dict = {
    'AMZN': 'Amazon'}

symbols, names = np.array(symbol_dict.items()).T

quotes = [finance.quotes_historical_yahoo(symbol, d1, d2, asobject=True)
          for symbol in symbols]

open = np.array([q.open for q in quotes]).astype(np.float)
close = np.array([q.close for q in quotes]).astype(np.float)

# The daily variations of the quotes are what carry most information
variation = close - open

#########

pl.plot(range(0, len(close[0])-20), close[0][:-20], color='black')

model = linear_model.LinearRegression(normalize=True)
model.fit([close[0][:-1]], [close[0][1:]])

print(close[0][-20:])
model.predict(close[0][-20:])


#pl.plot(range(0, 20), model.predict(close[0][-20:]), color='red')

pl.show()

The error line is

model.predict(close[0][-20:])

I've tried nesting it in a list, converting it to a NumPy array, and anything else I could find on Google, but I have no idea what I'm doing here.

What does this error mean and why is it happening?

micah
  • You may also just add a constant feature parameter to your data using something like: X.add_constant(len(X)) – Union find Sep 01 '14 at 17:14

1 Answer

Trying to predict stock price by simple linear regression? :^|. Anyway, this is what you need to change:

In [19]:

M=model.fit(close[0][:-1].reshape(-1,1), close[0][1:].reshape(-1,1))
In [31]:

M.predict(close[0][-20:].reshape(-1,1))
Out[31]:
array([[ 90.92224274],
       [ 94.41875811],
       [ 93.19997275],
       [ 94.21895723],
       [ 94.31885767],
       [ 93.030142  ],
       [ 90.76240203],
       [ 91.29187436],
       [ 92.41075928],
       [ 89.0940647 ],
       [ 85.10803717],
       [ 86.90624508],
       [ 89.39376602],
       [ 90.59257129],
       [ 91.27189427],
       [ 91.02214318],
       [ 92.86031126],
       [ 94.25891741],
       [ 94.45871828],
       [ 92.65052033]])

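Remember, when you build a model, X for the .fit method should have the shape [n_samples, n_features], and y should have the shape [n_samples] or [n_samples, n_targets]. The same applies to X in the .predict method. Wrapping the series in a list, as in your code, turns it into a single sample with one feature per day, so the 20 values passed to .predict no longer match the number of features the model was fitted on. That mismatch is what "matrices are not aligned" is complaining about.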
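To make the shape difference concrete, here is a minimal sketch on made-up data. The random-walk `prices` array is only a stand-in for `close[0]`, not real quotes, and I left out `normalize=True` because newer scikit-learn releases no longer accept that argument.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for close[0]: a random walk, NOT real market data.
rng = np.random.RandomState(0)
prices = 100 + np.cumsum(rng.randn(200))

# Wrapping the series in a list yields ONE sample with 199 features.
wrong_X = np.array([prices[:-1]])
print(wrong_X.shape)  # (1, 199)

# reshape(-1, 1) yields 199 samples with ONE feature each.
X = prices[:-1].reshape(-1, 1)  # today's price
y = prices[1:]                  # next day's price
print(X.shape, y.shape)  # (199, 1) (199,)

model = LinearRegression()
model.fit(X, y)

# predict() needs the same number of columns (features) that fit() saw.
X_new = prices[-20:].reshape(-1, 1)
predictions = model.predict(X_new)
print(predictions.shape)  # (20,)
```

With a 1-D `y` the predictions come back 1-D as well. Reshaping `y` to a column, as in the session above, gives a `(20, 1)` result instead.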

CT Zhu
  • I'm not sure what to use. What would you recommend? I don't even know what LinearRegression means I'm just trying to follow the examples with my own data. – micah Feb 21 '14 at 17:45