
Do you know how I can fit the first part into the rest of the function to get a regression analysis?

import numpy as np
import matplotlib.pyplot as plt  # for the graph

data = np.genfromtxt('bookingdata.csv', delimiter=',')

# skip the first row (presumably the header, which genfromtxt reads as NaN)
tvst = data[1:, 1]
cntime = data[1:, 2]
brate = data[1:, 3]
ppvwst = data[1:, 4]



import statsmodels.api as sm


def reg_m(y, x):
    # y: dependent variable, x: list of regressor arrays
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

All the variables have been loaded as tvst, cntime, etc., and all of them are numeric.

The end goal is to get multivariate regression output like this:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3            
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895


Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6
hxalchemy
  • What does not seem to work? – DYZ Mar 10 '17 at 03:09
  • I loaded my CSV file into Python as the variables tvst (dependent) and cntime, brate, ppvwst (independent). Now I want to run a multivariate regression on the model, with output like the one shown above. I am completely lost on how I can achieve that. – hxalchemy Mar 10 '17 at 03:26
  • `add_constant` does the same thing as adding the column of ones. Either one is redundant. – Josef Mar 10 '17 at 16:09

2 Answers


First, if y (endog) is just one variable, this is called multiple regression. Multivariate regression usually refers to the case where there are several y at the same time, i.e. y is multivariate.
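To make the distinction concrete, here is the multiple regression model for this question written out, with tvst as the single dependent variable. This is a sketch of the standard linear model; the symbols β, ε and X are notation introduced here, and the closed-form estimate assumes X has full column rank.

```latex
\begin{aligned}
\text{tvst}_i &= \beta_0 + \beta_1\,\text{cntime}_i + \beta_2\,\text{brate}_i + \beta_3\,\text{ppvwst}_i + \varepsilon_i, \quad i = 1,\dots,n \\
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
X = \begin{bmatrix}
1 & \text{cntime}_1 & \text{brate}_1 & \text{ppvwst}_1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & \text{cntime}_n & \text{brate}_n & \text{ppvwst}_n
\end{bmatrix} \\
\hat{\boldsymbol{\beta}} &= (X^\top X)^{-1} X^\top \mathbf{y}
\end{aligned}
```

The leading column of ones in X is what carries the intercept β0. In multivariate regression, y would instead be an n × m matrix with one column per dependent variable.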

add_constant does the same thing as adding the column of ones. Either one is redundant.

So, the multiple regression is just

y = tvst  # dependent variable
X = sm.add_constant(np.column_stack((cntime, brate, ppvwst)))
results = sm.OLS(y, X).fit()

or, since we already call column_stack, put the column of ones in there directly, which avoids the extra copy of the data that add_constant makes:

ones = np.ones(len(cntime))
X = np.column_stack((ones, cntime, brate, ppvwst))
results = sm.OLS(y, X).fit()
Josef

To produce the output table, call summary() on the fitted results (wrap it in print() when running a script rather than an interactive session):

print(results.summary())

In the function you wrote, you could even include this in the return statement:

return results.summary()

I found this while scrolling through the documentation for the statsmodels package. Even though this question was asked a while ago, I hope this helps finish it off for you and anyone else who finds their way here.

mariandob