
I have generated a list of dataframes called new_new_dfs that all have this general format, with some variation in the number of Coupons and the number of rows:

[image: example dataframe]

Each column holds the month-to-month differences of the Single Month Mortality (SMM) for a bond security (a grouping of mortgage loans) with a given Coupon (i.e. interest rate). Next, I run this code:

from dateutil.relativedelta import relativedelta
from statsmodels.tsa.api import VAR
for df in new_new_dfs:
    # Hold out the last 3 months as the test set
    cutoff = max(df.index) - relativedelta(months=3)
    train = df[df.index <= cutoff].dropna()
    test = df[df.index > cutoff]
    # Only fit frames with more than 10 rows and at least 2 columns
    if not train.empty and len(train) > 10 and train.shape[1] > 1:
        model = VAR(train)
        result = model.fit()
        print(result.summary())

This is meant to fit a vector autoregression (VAR) model to each dataframe in the list. I skip empty dataframes and check the number of rows and columns to make sure each dataframe is suitable for a VAR. However, about 11 dataframes in, I get this traceback:

LinAlgError                               Traceback (most recent call last)
Input In [135], in <cell line: 4>()
     13 i+=1
     14 print(i)
---> 15 result.summary()

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py:1835, in VARResults.summary(self)
   1828 def summary(self):
   1829     """Compute console output summary of estimates
   1830 
   1831     Returns
   1832     -------
   1833     summary : VARSummary
   1834     """
-> 1835     return VARSummary(self)

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\output.py:71, in VARSummary.__init__(self, estimator)
     69 def __init__(self, estimator):
     70     self.model = estimator
---> 71     self.summary = self.make()

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\output.py:83, in VARSummary.make(self, endog_names, exog_names)
     80 buf = StringIO()
     82 buf.write(self._header_table() + '\n')
---> 83 buf.write(self._stats_table() + '\n')
     84 buf.write(self._coef_table() + '\n')
     85 buf.write(self._resid_info() + '\n')

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\output.py:130, in VARSummary._stats_table(self)
    122 part2Lstubs = ('No. of Equations:',
    123                'Nobs:',
    124                'Log likelihood:',
    125                'AIC:')
    126 part2Rstubs = ('BIC:',
    127                'HQIC:',
    128                'FPE:',
    129                'Det(Omega_mle):')
--> 130 part2Ldata = [[model.neqs], [model.nobs], [model.llf], [model.aic]]
    131 part2Rdata = [[model.bic], [model.hqic], [model.fpe], [model.detomega]]
    132 part2Lheader = None

File ~\Anaconda3\lib\site-packages\pandas\_libs\properties.pyx:37, in pandas._libs.properties.CachedProperty.__get__()

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py:1540, in VARResults.llf(self)
   1537 @cache_readonly
   1538 def llf(self):
   1539     "Compute VAR(p) loglikelihood"
-> 1540     return var_loglike(self.resid, self.sigma_u_mle, self.nobs)

File ~\Anaconda3\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py:334, in var_loglike(resid, omega, nobs)
    306 def var_loglike(resid, omega, nobs):
    307     r"""
    308     Returns the value of the VAR(p) log-likelihood.
    309 
   (...)
    332         \left(\ln\left|\Omega\right|-K\ln\left(2\pi\right)-K\right)
    333     """
--> 334     logdet = logdet_symm(np.asarray(omega))
    335     neqs = len(omega)
    336     part1 = -(nobs * neqs / 2) * np.log(2 * np.pi)

File ~\Anaconda3\lib\site-packages\statsmodels\tools\linalg.py:28, in logdet_symm(m, check_symm)
     26     if not np.all(m == m.T):  # would be nice to short-circuit check
     27         raise ValueError("m is not symmetric.")
---> 28 c, _ = linalg.cho_factor(m, lower=True)
     29 return 2*np.sum(np.log(c.diagonal()))

File ~\Anaconda3\lib\site-packages\scipy\linalg\decomp_cholesky.py:152, in cho_factor(a, lower, overwrite_a, check_finite)
     93 def cho_factor(a, lower=False, overwrite_a=False, check_finite=True):
     94     """
     95     Compute the Cholesky decomposition of a matrix, to use in cho_solve
     96 
   (...)
    150 
    151     """
--> 152     c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=False,
    153                          check_finite=check_finite)
    154     return c, lower

File ~\Anaconda3\lib\site-packages\scipy\linalg\decomp_cholesky.py:37, in _cholesky(a, lower, overwrite_a, clean, check_finite)
     35 c, info = potrf(a1, lower=lower, overwrite_a=overwrite_a, clean=clean)
     36 if info > 0:
---> 37     raise LinAlgError("%d-th leading minor of the array is not positive "
     38                       "definite" % info)
     39 if info < 0:
     40     raise ValueError('LAPACK reported an illegal value in {}-th argument'
     41                      'on entry to "POTRF".'.format(-info))

LinAlgError: 6-th leading minor of the array is not positive definite

I'm not sure what this error refers to. I have printed each train dataframe to inspect the one that fails, but I can't tell what about it is problematic for the VAR model. Do you have any ideas as to what the problem is? Thank you!
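The bottom of the traceback shows where it fails: `VARResults.llf` passes the residual covariance matrix `sigma_u_mle` to `logdet_symm`, which calls `scipy.linalg.cho_factor`. A Cholesky factorization only exists for a positive definite matrix, so "6-th leading minor of the array is not positive definite" means the upper-left 6x6 block of the residual covariance is singular (or numerically close to it): the residuals of the 6th equation are (nearly) a linear combination of the first five. The sketch below reproduces the same kind of error with made-up residuals; the all-zero third column is an assumed example (say, a Coupon whose series is fitted perfectly), not something taken from the question's data.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n_obs = 30

# Made-up residuals for three equations. The third is identically zero, as if
# one Coupon's series were predicted perfectly (an assumption for illustration).
resid = np.column_stack([rng.normal(size=n_obs), rng.normal(size=n_obs), np.zeros(n_obs)])
sigma = np.cov(resid, rowvar=False)  # 3x3; last row and column are exactly zero


def is_positive_definite(m):
    """Return True if scipy can compute a Cholesky factor of m."""
    try:
        linalg.cho_factor(m, lower=True)
        return True
    except linalg.LinAlgError as err:
        print("LinAlgError:", err)  # the message names the failing leading minor
        return False


print(is_positive_definite(sigma))          # False: the zero-variance column breaks Cholesky
print(is_positive_definite(sigma[:2, :2]))  # True once that column is dropped
print(np.linalg.matrix_rank(sigma))         # 2, fewer than the 3 equations
```

In other words, the error is not really about `summary()`; any statistic that needs the log-determinant of the residual covariance (log likelihood, AIC, BIC) would hit the same problem.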

hulio_entredas
  • This is a great question and something I am also struggling to get around with my VAR model attempts. – Hefe Sep 22 '22 at 20:48
  • I am struggling with the exact same issue! Did you find a solution yet? – LGR Oct 05 '22 at 20:03
  • No, but from reading other questions it seems like it may have something to do with the composition of the arrays we are passing in to the VAR model. It is something equivalent to dividing by zero, but in the linear algebra space. [Related question](https://stackoverflow.com/a/21605863/15975987) – Hefe Nov 28 '22 at 21:55
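Following up on the idea that the composition of the arrays matters: a singular residual covariance usually traces back to the input frame. A rough pre-check, written as a hypothetical helper `var_fit_warnings` (not part of statsmodels), could flag three common causes before calling `VAR(train).fit()`: constant columns, linearly dependent columns (e.g. two Coupons with identical series), and too few observations for the number of coefficients. The `lags` argument is an assumption and should match the lag order you actually fit.

```python
import numpy as np
import pandas as pd


def var_fit_warnings(train: pd.DataFrame, lags: int = 1) -> list:
    """Hypothetical pre-check: reasons a VAR(lags) with a constant fitted on
    `train` would likely have a singular residual covariance matrix."""
    problems = []
    values = train.to_numpy(dtype=float)
    n_rows, k = values.shape

    # A column that never changes carries no independent information.
    constant = [c for c in train.columns if train[c].nunique() <= 1]
    if constant:
        problems.append(f"constant columns: {constant}")

    # Affinely dependent columns (one column = combination of others + constant)
    # make the residuals of one equation a combination of the others' residuals.
    rank = np.linalg.matrix_rank(values - values.mean(axis=0))
    if rank < k:
        problems.append(f"column rank {rank} < {k} columns")

    # Each equation has 1 + k*lags coefficients estimated from n_rows - lags
    # observations. Assuming independent regressors, the k x k residual
    # covariance can only have full rank if at least k degrees of freedom remain.
    dof = (n_rows - lags) - (1 + k * lags)
    if dof < k:
        problems.append(f"only {dof} residual degrees of freedom for {k} equations")
    return problems
```

For example, running `var_fit_warnings(train)` inside the loop before fitting might point to which frames are likely to fail, and why.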

1 Answer


Using a stationary series solved my problem.

To check whether your series are stationary, run the Augmented Dickey-Fuller (ADF) test on each column:

# Run the ADF test on each column (DataFrame.iteritems was removed in pandas 2.0)
for name, column in df.items():
    adfuller_test(column, name=name)
    print('\n')

If a series is not stationary, difference it with the following code:

df_differenced = df.diff().dropna()

Then perform the test again on the differenced dataframe:

# ADF test on each column of the first-differenced dataframe
for name, column in df_differenced.items():
    adfuller_test(column, name=name)
    print('\n')

Repeat the differencing step until every series is stationary, then fit the VAR on the differenced dataframe.