I created this program in March and it worked fine then, but now it raises an error and I can't figure out why. The error it shows, and how it looked when it was working:

Here is the current non-working code (I wrote this in a Jupyter notebook):

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import seaborn
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
pd.options.mode.chained_assignment = None  # default='warn'


df = yf.download("spy")
df.to_csv('spy.csv')
df = df[['Adj Close']]
plt.plot(df)

df['Adj Close'].plot(figsize=(15,6), color = 'g')
plt.legend(loc='upper left')
plt.show()


forecast = 70
df['Prediction'] = df[['Adj Close']].shift(-forecast)
X = np.array(df.drop(columns=['Prediction']))
X = preprocessing.scale(X)            
X_forecast = X[-forecast:]
X = X[:-forecast]
y = np.array(df['Prediction'])
y = y[:-forecast]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LinearRegression()
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
confidence
forecast_predicted = clf.predict(X_forecast)
print(forecast_predicted)

plt.plot(X, y)

dates = pd.date_range(start="2021-05-21", end= "2021-06-19")
plt.plot(dates, forecast_predicted, color='b')
df['Adj Close'].plot(color='g')
plt.xlim(xmin = datetime.date(2020,5,1))
plt.xlim(xmax = datetime.date(2021,7,1))

I know the error is in the last part of the code. Here is how the last part of the code looked when it was working on March 15:

dates = pd.date_range(start="2021-03-16", end= "2021-04-14")
plt.plot(dates, forecast_predicted, color='b')
df['Adj Close'].plot(color='g')
plt.xlim(xmin = datetime.date(2020,3,1))
plt.xlim(xmax = datetime.date(2021,5,1))
Rubén
V N

1 Answer

The error output explains it: the first dimensions of your x and y don't match. You are forecasting for 70 days (forecast = 70), so forecast_predicted has 70 values, but you are trying to plot them against a date range that covers only 30 days.
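
A quick way to confirm the mismatch without rerunning the model is to compare the two lengths directly. This is only a sketch: the dates are the ones from the question, and `n_predictions` is a stand-in name for `len(forecast_predicted)`, which equals `forecast` because `X_forecast` is the last `forecast` rows of `X`.

```python
import pandas as pd

forecast = 70

# Date range copied from the question; both endpoints are included.
dates = pd.date_range(start="2021-05-21", end="2021-06-19")

# Stand-in for len(forecast_predicted): predict() returns one value
# per row of X_forecast, and X_forecast has `forecast` rows.
n_predictions = forecast

print(len(dates), n_predictions)  # 30 70
```

Matplotlib needs both sequences passed to `plt.plot` to have the same length, so 30 dates against 70 predictions fails.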

You can either change the number of forecast days:

forecast=30

Or change the time period so that it covers 70 days, something like this:

dates = pd.date_range(start="2021-05-21", end= "2021-07-29")
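
To avoid having to recompute the end date by hand each time, one option is to build the range from the forecast length itself with the `periods` argument of `pd.date_range`. The sketch below uses a made-up `forecast_predicted` array in place of the real `clf.predict(X_forecast)` output, just to show that the lengths line up.

```python
import numpy as np
import pandas as pd

forecast = 70  # same horizon as in the question

# Placeholder values standing in for clf.predict(X_forecast);
# the real array also has one entry per forecast day.
forecast_predicted = np.linspace(400.0, 420.0, forecast)

# Derive the x-axis from the horizon, so both lengths always agree.
dates = pd.date_range(start="2021-05-21", periods=forecast)

print(len(dates), len(forecast_predicted))
print(dates[-1].date())
```

With this approach, changing `forecast` later only requires updating the start date, and `plt.plot(dates, forecast_predicted, color='b')` keeps receiving arrays of equal length.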
MySlav