
I have a pandas DataFrame with 4 rows of features. I create labels for them from "forecast_col", shifted back by "forecast_out" periods so that I can make predictions later:

pandasdf['label'] = pandasdf[forecast_col].shift(-forecast_out)
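
To make the effect of the shift concrete, here is a minimal sketch on toy data. The column name "price" and the value forecast_out = 2 are illustrative assumptions, not taken from the real dataset.

```python
import pandas as pd

# Toy data: the "price" column and forecast_out = 2 are assumptions for illustration.
toy = pd.DataFrame({"price": [10.0, 11.0, 12.0, 13.0, 14.0]})
forecast_out = 2

# shift(-n) moves values up by n rows, so each row's label
# is the price observed n rows later.
toy["label"] = toy["price"].shift(-forecast_out)

# The last forecast_out rows have no future value yet, so their label is NaN.
print(toy)
```

With this setup, each row's features are paired with the price forecast_out periods ahead, and the final forecast_out rows are the ones without a known label.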

Taking all the columns except 'label':

X = np.array(pandasdf.drop(['label'], axis=1))

Normalizing data:

X = preprocessing.scale(X)

Taking last rows for future prediction:

X_lately = X[-forecast_out:]

Selecting data for training and cross-validation:

X = X[:-forecast_out]
y = np.array(pandasdf['label'])[:-forecast_out] 
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3)  # sklearn.cross_validation was removed in scikit-learn 0.20
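
The slicing above can be checked on a small array. This is a sketch under assumed toy values (5 rows, 2 features, forecast_out = 2); the names X_all and labels are hypothetical.

```python
import numpy as np

forecast_out = 2  # illustrative value

# 5 rows, 2 features each (toy numbers).
X_all = np.arange(10).reshape(5, 2)
# What a shift(-2) label column would look like: the last 2 labels are unknown.
labels = np.array([12.0, 13.0, 14.0, np.nan, np.nan])

X_lately = X_all[-forecast_out:]  # rows whose label is still unknown
X = X_all[:-forecast_out]         # rows that have a known label
y = labels[:-forecast_out]        # the matching known labels

print(X.shape, y.shape, X_lately.shape)
```

After the slicing, X and y have the same number of rows and y contains no NaN values, while X_lately holds only the most recent forecast_out rows.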

Training the model (a linear regression, stored in clf):

clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)

Checking accuracy (it's around 95%):

accuracy = clf.score(X_test, y_test)

Forecasting on the last data:

forecast_set = clf.predict(X_lately)

Here I expect to get a list of future prices for the next "forecast_out" periods, but the forecast I get appears to cover the same periods as the last rows of data (X_lately).
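
One way to check which periods the predictions refer to is to attach explicit dates to them. The following is a minimal sketch on a toy daily series, not the real data: the names toy, model and future_index, the start date, and forecast_out = 3 are all assumptions, and scaling is skipped for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

forecast_out = 3  # illustrative value
dates = pd.date_range("2020-01-01", periods=20, freq="D")
toy = pd.DataFrame({"price": np.arange(20, dtype=float)}, index=dates)
toy["label"] = toy["price"].shift(-forecast_out)

X_all = toy[["price"]].to_numpy()
X_lately = X_all[-forecast_out:]
X = X_all[:-forecast_out]
y = toy["label"].to_numpy()[:-forecast_out]

model = LinearRegression().fit(X, y)
forecast_set = model.predict(X_lately)

# Row i of X_lately was observed on one of the last forecast_out dates;
# its prediction is for forecast_out periods after that date.
future_index = toy.index[-forecast_out:] + pd.Timedelta(days=forecast_out)
forecast = pd.Series(forecast_set, index=future_index)
print(forecast)
```

In this sketch the predictions are indexed by dates after the last observed row. If the predicted values are instead plotted against the dates of X_lately itself, they will look like a forecast of the past.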

Here's the example: forecasting the past

What am I doing wrong?
