I have a large dataset of 23k rows. The data looks something like this:
import pandas as pd
d = {'Date': ['1-1-2020', '1-1-2020', '1-2-2020', '1-2-2020'],
     'Stock_id': [5, 41, 5, 41],
     'last_price': [230, 8, 241, 9],
     'price': [241, 9, 240, 8.5]}
df = pd.DataFrame(data=d)
Date Stock_id last_price price
0 1-1-2020 5 230 241.0
1 1-1-2020 41 8 9.0
2 1-2-2020 5 241 240.0
3 1-2-2020 41 9 8.5
Note that the data includes many stocks on many different dates. How can I create a model that uses features such as last_price and Stock_id to predict the next day's price? And how can I have it re-train on the older data as new dates come in?
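(For the next-day target, I assume I would shift price forward within each stock, something like the snippet below; next_price is just a name I made up to illustrate.)
# Assumed target construction: tomorrow's price for each stock.
df = df.sort_values(['Stock_id', 'Date'])
df['next_price'] = df.groupby('Stock_id')['price'].shift(-1)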
Now, this is the best I could come up with. I used LinearRegression, but I am open to advice on any other model.
from sklearn import metrics  # imported for evaluating the model later
from sklearn import linear_model
from sklearn.model_selection import train_test_split

X = df[['Stock_id', 'last_price']]
y = df['price']  # a Series, so predict() returns a 1-D array

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

lm = linear_model.LinearRegression()
lm.fit(X_train, y_train)
y_pred = lm.predict(X_test)

result = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
Index Actual Predicted
487 45 32
4154 420 512
Is there a way to train the model on the first 3000 rows, have it predict, say, 12-11-2020, and then add the 12-11-2020 data to the training set before predicting 12-12-2020, and so on? I was hoping to get output like this:
Date Actual Predicted
12-11-2020 45 32
12-11-2020 420 512
12-12-2020 43 34
12-12-2020 423 513
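To make it concrete, here is a rough walk-forward sketch of what I have in mind: re-fit on everything seen so far, predict the next date, then fold that date's rows into the training data. The 3000-row cutoff and the expanding window are assumptions on my part, not something I know to be the right setup.
import pandas as pd
from sklearn import linear_model

features = ['Stock_id', 'last_price']

# Sort chronologically so "first 3000 rows" means the earliest data.
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date').reset_index(drop=True)

train = df.iloc[:3000].copy()  # initial training window (assumed cutoff)
test = df.iloc[3000:]
frames = []

# Walk forward one date at a time over everything after the cutoff.
for date in test['Date'].unique():
    # Re-train on all rows observed so far (expanding window).
    lm = linear_model.LinearRegression()
    lm.fit(train[features], train['price'])

    # Predict every stock traded on this date.
    day = test[test['Date'] == date]
    frames.append(pd.DataFrame({
        'Date': day['Date'].values,
        'Actual': day['price'].values,
        'Predicted': lm.predict(day[features]),
    }))

    # Fold this date's rows into the training data before the next step.
    train = pd.concat([train, day])

result = pd.concat(frames, ignore_index=True)
print(result)
Is this expanding-window retraining a reasonable way to do it, or is there something built-in (e.g. sklearn's TimeSeriesSplit) that fits better?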