I am trying to predict the stock price of IBM, but I am having trouble handling the Date column when training a linear regression model. This is what my dataset looks like:
         Date      Open      High       Low     Close  Adj Close  Volume
0  1962-01-02  7.713333  7.713333  7.626667  7.626667   0.618153  387200
1  1962-01-03  7.626667  7.693333  7.626667  7.693333   0.623556  288000
2  1962-01-04  7.693333  7.693333  7.613333  7.616667   0.617343  256000
3  1962-01-05  7.606667  7.606667  7.453333  7.466667   0.605185  363200
4  1962-01-08  7.460000  7.460000  7.266667  7.326667   0.593837  544000
My code is:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
df = pd.read_csv('IBM.csv')
df['Date'] = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
X = df.drop('Adj Close', axis='columns')
Y = df['Adj Close']
scaler = MinMaxScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
timesplit = TimeSeriesSplit(n_splits=10)
for train_index, test_index in timesplit.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
I got an error:
KeyError: "None of [Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n ...\n 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332],\n dtype='int64', length=1333)]
are in the [columns]"
Even when I managed to get the split to work, I was unable to train my model.
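For reference, here is a minimal sketch of what I think the split and training loop should look like. It rests on a few assumptions: the KeyError comes from `X[train_index]` being treated as a column lookup, so positional indexing with `.iloc` is used instead; `LinearRegression` replaces the imported `LogisticRegression` because the target is continuous; and the Date column stays in the index rather than being used as a feature. The data below is a synthetic stand-in for IBM.csv with the same column names, not the real prices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for IBM.csv (assumption: same column names as the real file).
rng = np.random.default_rng(0)
dates = pd.bdate_range("1962-01-02", periods=200)
close = 7.5 + rng.normal(0, 0.05, size=len(dates)).cumsum()
df = pd.DataFrame(
    {
        "Open": close + rng.normal(0, 0.02, size=len(dates)),
        "High": close + 0.05,
        "Low": close - 0.05,
        "Close": close,
        "Adj Close": close * 0.081,
        "Volume": rng.integers(200_000, 600_000, size=len(dates)),
    },
    index=dates,
)
df.index.name = "Date"  # dates stay in the index, not in the feature matrix

X = df.drop("Adj Close", axis="columns")
y = df["Adj Close"]

fold_scores = []
for train_index, test_index in TimeSeriesSplit(n_splits=5).split(X):
    # .iloc selects rows by position; X[train_index] would look up column labels.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # Fit the scaler on the training fold only, then reuse it on the test fold.
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = LinearRegression()  # a regressor, since the target is continuous
    model.fit(X_train_scaled, y_train)
    fold_scores.append(model.score(X_test_scaled, y_test))  # R^2 per fold

print(fold_scores)
```

With the real file, the synthetic block would presumably be replaced by the `pd.read_csv('IBM.csv')`, `pd.to_datetime` and `set_index('Date')` lines from my code above; the loop itself should not need to change.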