
I am trying to run grangercausalitytests on two time series:

import numpy as np
import pandas as pd

from statsmodels.tsa.stattools import grangercausalitytests

n = 1000
ls = np.linspace(0, 2*np.pi, n)

df1 = pd.DataFrame(np.sin(ls))
df2 = pd.DataFrame(2*np.sin(1+ls))

df = pd.concat([df1, df2], axis=1)

df.plot()

grangercausalitytests(df, maxlag=20)

However, I am getting

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=272078066917221398041264652288.0000, p=0.0000  , df_denom=996, df_num=1
ssr based chi2 test:   chi2=272897579166972095424217743360.0000, p=0.0000  , df=1
likelihood ratio test: chi2=60811.2671, p=0.0000  , df=1
parameter F test:         F=272078066917220553616334520320.0000, p=0.0000  , df_denom=996, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=7296.6976, p=0.0000  , df_denom=995, df_num=2
ssr based chi2 test:   chi2=14637.3954, p=0.0000  , df=2
likelihood ratio test: chi2=2746.0362, p=0.0000  , df=2
parameter F test:         F=13296850090491009488285469769728.0000, p=0.0000  , df_denom=995, df_num=2
...
/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
     88 
     89 def _raise_linalgerror_singular(err, flag):
---> 90     raise LinAlgError("Singular matrix")
     91 
     92 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix

and I am not sure why this is the case.

Stefan Falk
  • I ran into similar issues and got rid of them with **df.asfreq()** using the same freq as the original df. It makes no sense to me, but it worked, and I did not have to manipulate my data. If you don't know your freq, just use **df.index.freq** on a DatetimeIndex. – till Kadabra Jul 28 '22 at 10:48
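
For reference, re-asserting the index frequency as that comment describes might look roughly like this, assuming the data sits on a DatetimeIndex (the dates and daily frequency below are purely illustrative):

import numpy as np
import pandas as pd

# Illustrative data on a DatetimeIndex; "D" (daily) is an arbitrary choice.
n = 1000
idx = pd.date_range("2020-01-01", periods=n, freq="D")
ls = np.linspace(0, 2*np.pi, n)
df = pd.DataFrame({"x": np.sin(ls), "y": 2*np.sin(1 + ls)}, index=idx)

print(df.index.freq)   # the frequency already attached to the index
df = df.asfreq("D")    # re-assert the same frequency before calling grangercausalitytests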

2 Answers


The problem arises from the perfect correlation between the two series in your data. From the traceback you can see that, internally, a Wald test is used to compute the maximum likelihood estimates for the parameters of the lagged time series. To do this, an estimate of the parameters' covariance matrix (which is then near-zero) and its inverse are needed (as you can also see in the line invcov = np.linalg.inv(cov_p) in the traceback). This near-zero matrix is singular for some maximum lag numbers (>=5), and thus the test crashes. If you add just a little noise to your data, the error disappears:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import grangercausalitytests

n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1Clean = pd.DataFrame(np.sin(ls))
df2Clean = pd.DataFrame(2*np.sin(ls+1))
dfClean = pd.concat([df1Clean, df2Clean], axis=1)
dfDirty = dfClean+0.00001*np.random.rand(n, 2)  # add a tiny bit of noise to break the exact dependence

grangercausalitytests(dfClean, maxlag=20, verbose=False)    # Raises LinAlgError
grangercausalitytests(dfDirty, maxlag=20, verbose=False)    # Runs fine
jotasi

Another thing to keep an eye out for is duplicate columns. Duplicate columns will have a correlation score of 1.0, resulting in singularity. Otherwise, it's also possible you have two features that are perfectly correlated. An easy way to check this is with df.corr(); look for pairs of columns with correlation = 1.0.
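
For example, a quick check along those lines could look like this (the helper name and tolerance here are only illustrative, not part of the library):

import numpy as np
import pandas as pd

# Illustrative helper: flag column pairs whose absolute correlation is (numerically) 1.0.
def find_perfectly_correlated(df, tol=1e-12):
    corr = df.corr().abs()
    cols = corr.columns
    return [(cols[i], cols[j])
            for i in range(len(cols))
            for j in range(i + 1, len(cols))
            if corr.iloc[i, j] >= 1.0 - tol]

# Duplicate columns are flagged; a slightly noisy copy is not.
x = np.random.rand(100)
df = pd.DataFrame({"a": x, "b": x, "c": x + 0.01*np.random.rand(100)})
print(find_perfectly_correlated(df))   # [('a', 'b')]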

user12081571
  • This is not a full answer; you can comment on answers if you think something should be added, or add a full answer. – DanielM Sep 17 '19 at 20:41