1

I'm trying to run a GEE using an autoregressive structure for some panel data in statsmodels, looking at differences between sales during different hours of a shift:

ga = sm.families.Gaussian()
ar = sm.cov_struct.Autoregressive()
times = (BakeSale['Hour'].values)
ar.dep_params = 0.06
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople", groups=BakeSale["SalesPerson"],
                  data=BakeSale, family=ga, time=times, cov_struct=ar)
result2 = model2.fit(start_params=result1.params)
print(result2.summary())

This raises a ValueError: Not a bracketing interval.

I currently have the 'Hour' of the shift coded as an ordinal integer (i.e. 1-8), but also have timestamps as well.

Any thoughts for how to overcome this?

Full output:

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:724: RuntimeWarning: divide by zero encountered in true_divide
  wts = 1. / var
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:725: RuntimeWarning: invalid value encountered in true_divide
  wts /= wts.sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-d81d0b97546e> in <module>()
      7 #CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople"
      8 # Maybe try without C, or find if any with nan value or such
----> 8 result2 = model2.fit(start_params=result1.params)
      9 print(result2.summary())
     10 print(ar.summary())

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in fit(self, maxiter, ctol, start_params, params_niter, first_dep_update, cov_type, ddof_scale, scaling_factor)
   1111             if (self.update_dep and (itr % params_niter) == 0
   1112                 and (itr >= first_dep_update)):
-> 1113                 self._update_assoc(mean_params)
   1114                 num_assoc_updates += 1
   1115 

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in _update_assoc(self, params)
   1259         """
   1260 
-> 1261         self.cov_struct.update(params)
   1262 
   1263     def _derivative_exog(self, params, exog=None, transform='dydx',

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py in update(self, params)
    766 
    767         from scipy.optimize import brent
--> 768         self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
    769 
    770 

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in brent(func, args, brack, tol, full_output, maxiter)
   2001     options = {'xtol': tol,
   2002                'maxiter': maxiter}
-> 2003     res = _minimize_scalar_brent(func, brack, args, **options)
   2004     if full_output:
   2005         return res['x'], res['fun'], res['nit'], res['nfev']

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in _minimize_scalar_brent(func, brack, args, xtol, maxiter, **unknown_options)
   2033                   full_output=True, maxiter=maxiter)
   2034     brent.set_bracket(brack)
-> 2035     brent.optimize()
   2036     x, fval, nit, nfev = brent.get_result(full_output=True)
   2037     return OptimizeResult(fun=fval, x=x, nit=nit, nfev=nfev,

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in optimize(self)
   1839         # set up for optimization
   1840         func = self.func
-> 1841         xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
   1842         _mintol = self._mintol
   1843         _cg = self._cg

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in get_bracket_info(self)
   1827             fc = func(*((xc,) + args))
   1828             if not ((fb < fa) and (fb < fc)):
-> 1829                 raise ValueError("Not a bracketing interval.")
   1830             funcalls = 3
   1831         else:

ValueError: Not a bracketing interval.
codercat
  • 13
  • 6
  • Did you try `groups=BakeSaleData["Salesperson"]` from you previous question? You need to show the full traceback so we can see where it fails. I guess finding the autocorrelation parameter is not robust to extreme cases. – Josef Mar 24 '17 at 23:10
  • Based on my reading of the function to estimate the correlation coefficient, the ValueError should only be raised if the correlation is negative, which doesn't seem to be allowed in the implementation. But negative correlation would be a strange case for this type of application. – Josef Mar 24 '17 at 23:46
  • Should have the full output now. Thanks! – codercat Mar 25 '17 at 02:45

1 Answers1

0

Often in life one needs to make sure that one is starting from the right data to begin with. For instance, examining individual Shifts rather than Salespeople:

model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople", groups=BakeSale["Shift"],
              data=BakeSale, family=ga, time=times, cov_struct=ex)

Demonstrated that the max cluster size was suspiciously off, and the the mean cluster size was just above 8.

Review of the wrangling of the original dataset revealed that several shifts had been mistakenly coded with many, many more than the appropriate number of hours for a shift. Once this was corrected, the model was able to run appropriately....

codercat
  • 13
  • 6