0

I have a question regarding regression in python. To make a long story short, I need to find a model of form yt = mt + st where mt and st are trends and seasonal component respectively. In my earlier analysis, I have found that a good model for mt is a quadratic trend of type mt = a0 + a1*t + a2*t^2 through my regression analysis. Now, when I want to add the seasonal component, this is where I am having the hardest time. Now, I approached this two ways...one is through R programming where I am calling R objects into python and the other through python solely. Now, following the example in my book, I did the folliwng using R:

%load_ext rmagic
import rpy2.robjects as R
import pandas.rpy.common as com
from rpy2.robjects.packages import importr

stats = importr('stats')
r_df = com.convert_to_r_dataframe(pd.DataFrame(data.logTotal))
%Rpush r_df
%R ss = as.factor(rep(1:12,length(r_df$logTotal)/12))
%R tt = 1:length(r_df$logTotal)
%R tt2 = cbind(tt,tt^2)
%R ts_model = lm(r_df$logTotal ~ tt2+ss-1)
%R print(summary(ts_model))

I get the right regression coefficients. But, if i do the same thing in python, this is where I am getting problem replicating it.

import statsmodels.formula.api as smf
ss_temp=     pd.Categorical.from_array(np.repeat(np.arange(1,13),len(data.logTotal)/12))
dtemp = np.column_stack((t,t**2,data.logTotal))
dtemp = pd.DataFrame(dtemp,columns=['t','tsqr','logTotal'])
dtemp['ss'] = sstemp
res_result = smf.ols(formula='logTotal ~ t+tsqr + C(ss) -1',data=dtemp).fit()
res_result.params

What am i doing wrong here? I first get an error saying 'data type not found' which points to the res_result formula. So, then I tried changing ss_temp to a Series. Then, the above statements worked. However, my parameters were completely off when compared to the R output. I have been spending a day on this with no avail. Can someone please help me or guide me as to do or is there an python equivalent to as.factor in R? I assumed that was categorical in pandas.

Thanks

If the above is too hard, its fine. I still have the residual model from my regression in R. But, any ideas how to convert this to a python equivalent to what statsmodels interprets as a res from regression? thanks again

Rajan S.
  • 79
  • 1
  • 8
  • Try to define ss_temp as a numpy integer array. I don't know if the new `pd.Categorical` is already supported by patsy. – Josef Feb 06 '15 at 17:51
  • Your notation makes me wonder if you did this correctly in R. If you didn't use `poly`-function for the time variables, then you may have inappropriately inflated your statistical significance. See: http://stackoverflow.com/questions/13896578/polynomial-regression-nonsense-predictions and several postings on rhelp over the years where people naively use `I(var^2)` to attempt modeling of quadratic terms. – IRTFM Feb 06 '15 at 17:57
  • I believe I have used it correctly with R. its the python implementation that worries me. Even the graphical fit, doesn't match the actual time series. i.e the lines look mostly linear. but if i do this in R, it matches the original time series very well – Rajan S. Feb 06 '15 at 19:06

0 Answers0