I have a pandas DataFrame with 4 feature columns. I create labels from "forecast_col" by shifting it back by forecast_out periods, so that each row's label is the value forecast_out periods ahead, which I want to predict later:
pandasdf['label'] = pandasdf[forecast_col].shift(-forecast_out)
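To make the labelling step concrete, here is a small sketch on a hypothetical one-column frame (the column name close and forecast_out = 2 are made up for illustration). It shows that shift(-forecast_out) gives each row the value from forecast_out rows later and leaves the last forecast_out labels as NaN.

```python
import pandas as pd

# Hypothetical toy data: one price column observed over 6 periods.
forecast_col = "close"
forecast_out = 2
pandasdf = pd.DataFrame({forecast_col: [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

# shift(-forecast_out) moves every value up by forecast_out rows, so each
# row's label is the price forecast_out periods *after* that row.
pandasdf["label"] = pandasdf[forecast_col].shift(-forecast_out)

# The last forecast_out rows have no future value yet, so their label is NaN.
print(pandasdf)
```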
Taking all columns except 'label' as the features:
import numpy as np
X = np.array(pandasdf.drop(columns=['label']))
Standardizing the features (zero mean, unit variance per column):
from sklearn import preprocessing
X = preprocessing.scale(X)
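A sketch of the two feature steps on a hypothetical two-column frame (close and volume are invented names). Passing columns= to drop works in both older and current pandas, whereas the positional axis argument in drop(['label'], 1) is rejected by pandas 2.0. The final check confirms that preprocessing.scale standardizes each column.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical frame: two feature columns plus a shifted 'label' column.
pandasdf = pd.DataFrame({
    "close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    "volume": [100.0, 80.0, 120.0, 90.0, 110.0, 95.0],
})
pandasdf["label"] = pandasdf["close"].shift(-2)

# Keep every column except the label; columns= avoids the positional axis argument.
X = np.array(pandasdf.drop(columns=["label"]))

# preprocessing.scale rescales each column to mean 0 and standard deviation 1.
X = preprocessing.scale(X)
print(X.mean(axis=0), X.std(axis=0))
```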
Keeping the last forecast_out rows, which have no label yet, for future prediction:
X_lately = X[-forecast_out:]
Using the remaining rows (those with a known label) for training and testing:
X = X[:-forecast_out]
y = np.array(pandasdf['label'])[:-forecast_out]
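The slicing above splits the rows by whether a label exists. A minimal sketch with hypothetical NumPy stand-ins for the scaled features and the shifted labels:

```python
import numpy as np

# Hypothetical stand-ins: 6 rows of scaled features and labels shifted by 2.
forecast_out = 2
X = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([12.0, 13.0, 14.0, 15.0, np.nan, np.nan])

# The last forecast_out rows have features but no label: these are the
# inputs whose future values we want to predict.
X_lately = X[-forecast_out:]

# Every earlier row has both features and a known label, so it can be used to train.
X = X[:-forecast_out]
y = labels[:-forecast_out]
```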
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
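The cross_validation module was removed from scikit-learn in version 0.20; train_test_split now lives in sklearn.model_selection. By default it shuffles the rows. For time-ordered prices, one option (a design choice, not something the question specifies) is shuffle=False, which keeps the most recent rows as the test set. A sketch on hypothetical, chronologically ordered data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical chronologically ordered data: 10 rows, 1 feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# shuffle=False keeps time order: the first 70% of rows train, the last 30% test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
```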
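Training the regressor:
from sklearn.linear_model import LinearRegression
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
Checking the score, which is around 0.95 (for LinearRegression, score() returns the R² coefficient of determination, not classification accuracy):
accuracy = clf.score(X_test, y_test)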
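For a regressor, score() reports R², so a value of 0.95 means the model explains about 95% of the variance of the test labels, not that 95% of predictions are "correct". A small sketch on synthetic data (randomly generated, not the question's prices) showing that score() matches sklearn.metrics.r2_score:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data: a linear target with a little noise, fixed seed for repeatability.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

clf = LinearRegression()
clf.fit(X[:40], y[:40])

# For regressors, score() is the coefficient of determination R^2,
# the same value r2_score computes from the predictions.
score = clf.score(X[40:], y[40:])
manual = r2_score(y[40:], clf.predict(X[40:]))
print(score, manual)
```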
Forecasting from the last forecast_out rows:
forecast_set = clf.predict(X_lately)
Here I expect to get future prices for the next forecast_out periods, but instead I get a forecast that looks like the prices of those same last rows (X_lately).
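Because each label was built with shift(-forecast_out), the prediction for a row of X_lately refers to the date forecast_out periods after that row's own date. A sketch of attaching the predictions to those future dates, assuming a gap-free daily DatetimeIndex (the dates and prediction values below are hypothetical; trading-day data would need a business-day offset instead):

```python
import numpy as np
import pandas as pd

# Hypothetical daily index for the full frame and a forecast horizon of 3 periods.
forecast_out = 3
dates = pd.date_range("2024-01-01", periods=10, freq="D")

# Stand-in predictions, one per row of X_lately (the last forecast_out rows).
forecast_set = np.array([101.0, 102.0, 103.0])

# Row i of X_lately sits at dates[-forecast_out + i]; because the label was
# shifted by -forecast_out, its prediction refers to forecast_out days later.
lately_dates = dates[-forecast_out:]
target_dates = lately_dates + pd.Timedelta(days=forecast_out)

forecast = pd.Series(forecast_set, index=target_dates, name="Forecast")
print(forecast)
```

Under these assumptions the three predictions land on the three days after the last known date, rather than on the dates of the X_lately rows themselves.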
Here's the example: forecasting the past
What am I doing wrong?