
I am trying to estimate different distributions' parameters via scipy.stats."some_dist".fit(), and I'm having extreme difficulty retrieving any information on the optimization process being used. Specifically, I am looking for the Hessian, which most implemented algorithms output, as documented here.

For all its optimizations, SciPy returns an object called OptimizeResult, which contains useful info such as the Hessian, so how would one get this after .fit()'ing some data, which calls the same optimizations?

Coolio2654

1 Answer


Looks like a bit of source-diving is needed; fortunately, it does not require copying a lot of source code. This is how basic fit works:

from scipy.stats import cauchy
data = [0, 3, 4, 4, 5, 9]
res = cauchy.fit(data)   # (3.9798237305661255, 0.9205374643383732)

and this is how it is modified to return OptimizeResult:

from scipy.optimize import minimize

# _fitstart and _reduce_func are the private helpers fit() uses internally:
# _fitstart produces starting values, _reduce_func builds the objective
args = cauchy._fitstart(data)
x0, func, restore, args = cauchy._reduce_func(args, {})
res = minimize(func, x0, args=(data,), method='BFGS')

Now res is

      fun: 14.337039523098689
 hess_inv: array([[ 0.23321703, -0.0117229 ],
       [-0.0117229 ,  0.36807373]])
      jac: array([ 0.0000000e+00, -1.1920929e-07])
  message: 'Optimization terminated successfully.'
     nfev: 32
      nit: 5
     njev: 8
   status: 0
  success: True
        x: array([3.9798262 , 0.92055376])

where you may recognize the parameters as res.x. The function being minimized is the "penalized NNLF" (negative log-likelihood function, with a penalty for arguments outside the distribution's support).
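Since BFGS maintains an approximation to the *inverse* Hessian, the Hessian itself (what the question asks for) can be recovered by inverting res.hess_inv. A minimal sketch repeating the steps above; note that _fitstart and _reduce_func are private SciPy internals and may change between versions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import cauchy

data = [0, 3, 4, 4, 5, 9]
args = cauchy._fitstart(data)
x0, func, restore, args = cauchy._reduce_func(args, {})
res = minimize(func, x0, args=(data,), method='BFGS')

# BFGS reports the inverse Hessian approximation; invert it to
# approximate the Hessian of the penalized NNLF at the optimum
hessian = np.linalg.inv(res.hess_inv)
```

Keep in mind this is the approximation BFGS built up during iteration, not an exact second-derivative computation.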


By the way,

For all its optimizations, SciPy returns an object called OptimizeResult

is an over-generalization. It is true for the minimize method, but scipy.stats fit uses fmin by default, which returns no such object.

So, if you want output identical to fmin's, that can be arranged, but there will be no extra information there.

from scipy.optimize import fmin
args = cauchy._fitstart(data)
x0, func, restore, args = cauchy._reduce_func(args, {})
res = fmin(func, x0, args=(data,))
print(tuple(res))   #(3.9798237305661255, 0.9205374643383732)
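As an aside, fmin itself can report basic diagnostics via its full_output=True flag (this is standard scipy.optimize.fmin API, not something fit exposes): the optimum, the objective value, iteration and call counts, and a warning flag. Still nothing derivative-based like a Hessian, since Nelder-Mead never computes derivatives:

```python
from scipy.optimize import fmin
from scipy.stats import cauchy

data = [0, 3, 4, 4, 5, 9]
args = cauchy._fitstart(data)
x0, func, restore, args = cauchy._reduce_func(args, {})

# full_output=True makes fmin return (xopt, fopt, iterations, funcalls, warnflag)
xopt, fopt, niter, funcalls, warnflag = fmin(
    func, x0, args=(data,), full_output=True, disp=False)
print(tuple(xopt))   # same estimates as the plain fmin call above
```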

Using minimize with method='Nelder-Mead' has essentially the same effect. You do get a bit of extra information, but given that the method is simplex-based (no derivatives being computed), this information is of limited use.

res = minimize(func, x0, args=(data,), method='Nelder-Mead')
print(res)
print(tuple(res.x))

prints

 final_simplex: (array([[3.97982373, 0.92053746],
       [3.97983057, 0.92060317],
       [3.97977536, 0.92059568]]), array([14.33703952, 14.33703953, 14.33703953]))
           fun: 14.337039523477827
       message: 'Optimization terminated successfully.'
          nfev: 58
           nit: 31
        status: 0
       success: True
             x: array([3.97982373, 0.92053746])
(3.9798237305661255, 0.9205374643383732)
  • I am wondering now, why is `res.x` subtly, but significantly, different from the estimates returned from the vanilla fitting method? Are there certain optimizations in the default method that this isn't picking up? And if so, how would one try to incorporate them? – Coolio2654 Apr 10 '18 at 17:53
  • The default fit is by Nelder-Mead. I used BFGS in my example. –  Apr 10 '18 at 18:47
  • When I used Nelder-Mead with your code and compared it against the default fitted values, they are still slightly off, https://imgur.com/a/7vpQ2 . – Coolio2654 Apr 10 '18 at 21:46
  • It's still a different routine, `minimize(method='Nelder-Mead')` versus `fmin` . Different implementations are likely to produce different results. –  Apr 10 '18 at 21:57
  • Would you be able to just point the direction of uncovering the specific constraints/optimizations per dist. utilized by fmin then? I am confident that with your code, if I knew where to look for the optimal constraints the rest would just be busy work. – Coolio2654 Apr 11 '18 at 03:26
  • fmin knows nothing about distributions, it's a generic Nelder-Mead minimizer. Its code is [here](https://github.com/scipy/scipy/blob/v1.0.0/scipy/optimize/optimize.py#L297-L418). I expanded the answer, and also included the result of `minimize(method='Nelder-Mead')` which for me is identical to what `fit` produced. But I hope you don't think that the output of `fit` is "the correct one" and the others are "wrong". They are all different approximations. It may well be that what I got with BFGS is better than what `.fit` returns. –  Apr 11 '18 at 03:51
  • I was under the impression, from multiple scattered sources online, that scipy's `.fit()` optimizations have improved over the years due to distribution-specific constraints: from what you've said maybe most of the improvement actually just came from better overall algorithms. Thank you for all your help so far; it was invaluable! – Coolio2654 Apr 11 '18 at 17:28
  • The best source is SciPy's own source. [This is how .fit calls a minimizer](https://github.com/scipy/scipy/blob/v1.0.0/scipy/stats/_distn_infrastructure.py#L2211); my code sample above takes exactly this bit of logic out of `fit`. There are no constraints; however there are penalties included in the function. See [`_penalized_nnlf`](https://github.com/scipy/scipy/blob/v1.0.0/scipy/stats/_distn_infrastructure.py#L2010) and the methods it uses. –  Apr 11 '18 at 18:24