Forecasting using Pandas OLS

Question

I have been using the scikits.statsmodels OLS predict function to forecast from fitted data, but would now like to switch to Pandas.

The documentation refers to OLS as well as to a function called y_predict, but I can't find anything on how to use it correctly.

By way of example:

exogenous = {
    "1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468","2011": "7154","2012": "4292","2013": "4283","2014": "4595","2015": "9194","2016": "4221","2017": "4520"}
endogenous = {
    "1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555", "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}

import numpy as np
from pandas import *

ols_test = ols(y=Series(endogenous), x=Series(exogenous))

However, while I can produce a fit:

>>> ols_test.y_fitted
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

Prediction produces nothing different:

>>> ols_test.y_predict
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

In scikits.statsmodels one would do the following:

import scikits.statsmodels.api as sm
...
ols_model = sm.OLS(endogenous, np.column_stack(exogenous))
ols_results = ols_model.fit()
ols_pred = ols_model.predict(np.column_stack(exog_prediction_values))

How do I do this in Pandas to forecast the endogenous data out to the limits of the exogenous?

UPDATE: Thanks to Chang, the new version of Pandas (0.7.3) now has this functionality as standard.

Hi, would you mind giving an example of how to use ols.predict? Say you have three independent variables, and thus three betas [b1, b2, b3]; now you want to use [x1, x2, x3] to predict y. – tesla1060, Mar 24 '13 at 12:33
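The multi-regressor case raised in this comment reduces to a dot product of the new regressor values with the fitted coefficients, plus the intercept. Below is a minimal sketch using plain NumPy rather than any pandas OLS method; the coefficient and input values are made up for illustration and do not come from a real fit.

```python
import numpy as np

# Hypothetical fitted coefficients [b1, b2, b3] and intercept (illustrative values only).
betas = np.array([0.5, -1.2, 2.0])
intercept = 3.0

# A single new observation [x1, x2, x3].
x_new = np.array([10.0, 4.0, 1.5])
y_hat = x_new @ betas + intercept

# Several new observations at once: one row per observation.
X_new = np.array([
    [10.0, 4.0, 1.5],
    [8.0, 2.0, 0.0],
])
y_hats = X_new @ betas + intercept

print(y_hat)
print(y_hats)
```

With a fitted pandas OLS object, the coefficient values would presumably be read from its beta attribute instead of being typed in by hand.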

score 2 · Accepted Answer · answered Apr 01 '12 at 16:37

Is your issue how to get the predicted y values of your regression, or how to use the regression coefficients to get predicted y values for a different set of samples of the exogenous variables? pandas y_predict and y_fitted should give you the same values, and both should match the output of the predict method in scikits.statsmodels.

If you're looking for the regression coefficients, use ols_test.beta.

Chang She

I would like predicted y values for 2008 to 2017, which I can get with scikits.statsmodels predict, but I have no idea how to get it with Pandas. – Turukawa Apr 01 '12 at 18:54
Gotcha. If you want to use the pandas ols function, you can do (ols_result.beta['x'] * exog_2008_2017).sum() + ols_result.beta['intercept'] for now. – Chang She Apr 07 '12 at 20:40
I've opened a GitHub issue about it here: https://github.com/pydata/pandas/issues/1008 to provide a function that replicates the statsmodels functionality – Chang She Apr 07 '12 at 21:03
Longer term, we plan to move the pandas OLS code (which has NA handling and moving-window capability) into statsmodels and provide a consistent interface – Wes McKinney Apr 08 '12 at 16:33
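
The manual approach from the comments (coefficient times new exogenous values, plus intercept) can be sketched with the question's data. This is a minimal sketch, not the pandas implementation: it assumes a single regressor with an intercept, converts the question's string values to numbers, and fits with numpy.linalg.lstsq so that it does not depend on pandas.ols, which later pandas releases no longer ship. The variable names are illustrative.

```python
import numpy as np

# Data from the question, converted from strings to numbers.
exogenous = {
    1998: 4760, 1999: 5904, 2000: 4504, 2001: 9808, 2002: 4241,
    2003: 4086, 2004: 4687, 2005: 7686, 2006: 3740, 2007: 3075,
    2008: 3753, 2009: 4679, 2010: 5468, 2011: 7154, 2012: 4292,
    2013: 4283, 2014: 4595, 2015: 9194, 2016: 4221, 2017: 4520,
}
endogenous = {
    1998: 691, 1999: 1580, 2000: 80, 2001: 1450, 2002: 555,
    2003: 956, 2004: 877, 2005: 614, 2006: 468, 2007: 191,
}

# Fit only on the years where both series are observed.
fit_years = sorted(endogenous)
x_fit = np.array([exogenous[year] for year in fit_years], dtype=float)
y_fit = np.array([endogenous[year] for year in fit_years], dtype=float)

# Design matrix with a constant column, i.e. an OLS fit with an intercept.
design = np.column_stack([x_fit, np.ones_like(x_fit)])
(slope, intercept), *_ = np.linalg.lstsq(design, y_fit, rcond=None)

# In-sample fitted values, comparable to y_fitted in the question.
fitted = slope * x_fit + intercept

# Forecast the years that have exogenous data but no endogenous data.
forecast_years = [year for year in sorted(exogenous) if year not in endogenous]
x_new = np.array([exogenous[year] for year in forecast_years], dtype=float)
forecast = dict(zip(forecast_years, slope * x_new + intercept))

for year, value in forecast.items():
    print(year, round(float(value), 2))
```

The in-sample values in fitted should line up with the y_fitted output shown in the question, which is a quick way to check that the hand-rolled fit matches before trusting the 2008 to 2017 forecasts.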
