I have a Python pandas DataFrame, df:
Group date Value
A 01-02-2016 16
A 01-03-2016 15
A 01-04-2016 14
A 01-05-2016 17
A 01-06-2016 19
A 01-07-2016 20
B 01-02-2016 16
B 01-03-2016 13
B 01-04-2016 13
C 01-02-2016 16
C 01-03-2016 16
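The dates in this table can be read two ways: "01-02-2016" is either 2 January (month-first) or 1 February (day-first). With no format argument, pd.to_datetime reads it month-first, and the two readings give very different day gaps between rows. This matters for a regression on the day count. A small sketch showing both parses on two of the dates; the names raw, month_first and day_first are just for illustration:

```python
import pandas as pd

# Two dates taken from the table above.
raw = pd.Series(["01-02-2016", "01-03-2016"])

# Month-first: 2 January and 3 January 2016, one day apart.
month_first = pd.to_datetime(raw, format="%m-%d-%Y")

# Day-first: 1 February and 1 March 2016, 29 days apart (2016 is a leap year).
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.diff().iloc[1].days)  # gap in days under month-first
print(day_first.diff().iloc[1].days)    # gap in days under day-first
```

Passing an explicit format (or dayfirst=True) makes the intended reading visible instead of leaving it to inference.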
I want to predict Value from the date; specifically, I want the predicted value for 01-08-2016.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
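# Convert the dates to a numeric feature: the number of days since the
# earliest date (a float, not an integer). I am not sure this is the best way.
# Note: with no format given, pandas reads "01-02-2016" month-first (2 January).
df['date'] = pd.to_datetime(df['date'])
df['date_delta'] = (df['date'] - df['date'].min()) / np.timedelta64(1, 'D')
# Is this correct?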
model = LinearRegression()
X = df[['date_delta']]
y = df.Value
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
coefs = zip(model.coef_, X.columns)
print("Value = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))
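The code above fits the model but never actually produces the prediction for 01-08-2016. A minimal sketch of that last step, assuming month-first dates (what pd.to_datetime infers here without a format) and a single model pooled over all groups, as in the code above. The important detail is that the new date must be encoded with the same origin (the earliest training date) as the training rows. DATE_FORMAT and to_days are hypothetical names introduced for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the table from the question.
df = pd.DataFrame({
    "Group": ["A"] * 6 + ["B"] * 3 + ["C"] * 2,
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016",
             "01-02-2016", "01-03-2016", "01-04-2016",
             "01-02-2016", "01-03-2016"],
    "Value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})

# Assumption: month-first dates. Use "%d-%m-%Y" if they are day-first.
DATE_FORMAT = "%m-%d-%Y"
df["date"] = pd.to_datetime(df["date"], format=DATE_FORMAT)
origin = df["date"].min()


def to_days(dates):
    """Days elapsed since the training origin, as floats."""
    return (dates - origin) / np.timedelta64(1, "D")


# Fit on the training rows.
X = pd.DataFrame({"date_delta": to_days(df["date"])})
y = df["Value"]
model = LinearRegression().fit(X, y)

# Encode the new date with the same origin, then predict.
target = pd.to_datetime(pd.Series(["01-08-2016"]), format=DATE_FORMAT)
X_new = pd.DataFrame({"date_delta": to_days(target)})
prediction = model.predict(X_new)[0]
print(f"Predicted Value on {target.iloc[0].date()}: {prediction:.2f}")
```

Under these assumptions the pooled fit is roughly Value = 14.5 + 0.8 * date_delta, so 8 January (date_delta = 6) comes out at about 19.35. Note that this is an extrapolation past the last observed date, and it pools groups whose individual trends differ (A rises, B falls, C is flat).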
I am not sure whether I am handling the date correctly. Is there a better way to use a date as a regression feature?
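One way to check the day-offset approach: any encoding that is a day count shifted by a constant should give the same slope (Value change per day) and the same predictions; only the intercept moves. A sketch comparing the day offset used above with datetime ordinals (pd.Timestamp.toordinal, days since 1 January of year 1), again assuming month-first dates and one pooled model. Either encoding is fine as long as the prediction date is encoded the same way as the training dates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016",
             "01-02-2016", "01-03-2016", "01-04-2016",
             "01-02-2016", "01-03-2016"],
    "Value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")  # assumed month-first
y = df["Value"]
origin = df["date"].min()

# Encoding 1: days since the earliest date (the question's approach).
X_delta = ((df["date"] - origin) / np.timedelta64(1, "D")).to_frame("date_delta")

# Encoding 2: proleptic Gregorian ordinal of each date.
X_ord = df["date"].map(pd.Timestamp.toordinal).to_frame("date_ordinal")

m_delta = LinearRegression().fit(X_delta, y)
m_ord = LinearRegression().fit(X_ord, y)

# Same slope, different intercept.
print(m_delta.coef_[0], m_ord.coef_[0])
print(m_delta.intercept_, m_ord.intercept_)

# Predictions agree once the target date is encoded consistently.
target = pd.Timestamp("2016-01-08")
pred_delta = m_delta.predict(
    pd.DataFrame({"date_delta": [(target - origin) / np.timedelta64(1, "D")]}))[0]
pred_ord = m_ord.predict(pd.DataFrame({"date_ordinal": [target.toordinal()]}))[0]
print(pred_delta, pred_ord)
```

The day offset keeps the intercept interpretable (the fitted value on the first date), while ordinals avoid having to remember the training origin; the fitted line is the same either way.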