I have a question regarding regression in python. To make a long story short, I need to find a model of form yt = mt + st where mt and st are trends and seasonal component respectively. In my earlier analysis, I have found that a good model for mt is a quadratic trend of type mt = a0 + a1*t + a2*t^2 through my regression analysis. Now, when I want to add the seasonal component, this is where I am having the hardest time. Now, I approached this two ways...one is through R programming where I am calling R objects into python and the other through python solely. Now, following the example in my book, I did the folliwng using R:
%load_ext rmagic
import rpy2.robjects as R
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
stats = importr('stats')
r_df = com.convert_to_r_dataframe(pd.DataFrame(data.logTotal))
%Rpush r_df
%R ss = as.factor(rep(1:12,length(r_df$logTotal)/12))
%R tt = 1:length(r_df$logTotal)
%R tt2 = cbind(tt,tt^2)
%R ts_model = lm(r_df$logTotal ~ tt2+ss-1)
%R print(summary(ts_model))
I get the right regression coefficients. But, if i do the same thing in python, this is where I am getting problem replicating it.
import statsmodels.formula.api as smf
ss_temp= pd.Categorical.from_array(np.repeat(np.arange(1,13),len(data.logTotal)/12))
dtemp = np.column_stack((t,t**2,data.logTotal))
dtemp = pd.DataFrame(dtemp,columns=['t','tsqr','logTotal'])
dtemp['ss'] = sstemp
res_result = smf.ols(formula='logTotal ~ t+tsqr + C(ss) -1',data=dtemp).fit()
res_result.params
What am i doing wrong here? I first get an error saying 'data type not found' which points to the res_result formula. So, then I tried changing ss_temp to a Series. Then, the above statements worked. However, my parameters were completely off when compared to the R output. I have been spending a day on this with no avail. Can someone please help me or guide me as to do or is there an python equivalent to as.factor in R? I assumed that was categorical in pandas.
Thanks
If the above is too hard, its fine. I still have the residual model from my regression in R. But, any ideas how to convert this to a python equivalent to what statsmodels interprets as a res from regression? thanks again