I have a developer who has created a Python script to determine Granger causality across several datasets, each roughly 3 years' worth of daily data (about 1,100 data points per time series). The script seems to run well, but we are not sure what maxlag we should choose. Our goal is to identify possible causalities AND to determine the lag time of the causality (1 day, 2 days, 7 days, 14 days, etc.). Obviously, when we change maxlag from 1 to 15 we get very different numbers. The portion of the code I am referring to is below.
    from statsmodels.tsa.stattools import grangercausalitytests

    def find_optimal_lag(data, maxlag=12):  # wrapper name is ours; the snippet omitted the enclosing def
        # data[:, 1::-1] reverses the first two columns: statsmodels tests whether the
        # second column Granger-causes the first, so this asks "does column 0 cause column 1?"
        granger_test_result = grangercausalitytests(data[:, 1::-1], maxlag=maxlag, verbose=False)
        optimal_lag = -1
        F_test = -1.0
        for key in granger_test_result.keys():
            # params_ftest is (F statistic, p-value, df_denom, df_num)
            _F_test_ = granger_test_result[key][0]['params_ftest'][0]
            if _F_test_ > F_test:
                F_test = _F_test_
                optimal_lag = key
        return optimal_lag
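For context, here is a minimal usage sketch with synthetic data; the array construction is purely illustrative (not our real data), building the second series as a 7-day-delayed copy of the first so a lag near 7 should stand out:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1107)
    y = x[:-7] + 0.1 * rng.normal(size=1100)  # y today is x from 7 days ago, plus noise
    data = np.column_stack([x[7:], y])        # two aligned daily series, ~1100 points each

    print(find_optimal_lag(data, maxlag=15))  # should report a lag at or near 7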
It is my understanding that the higher the maxlag, the more "analyses" are run on the time series (the function fits and tests a separate model for each lag from 1 up to maxlag), and in our runs the higher maxlag values tend to produce stronger-looking causality results. That would seemingly be very helpful, but only if we know what the actual "lag" of the causality is.
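To show concretely how the numbers move with the lag, the per-lag output can be printed directly rather than reduced to a single maximum F; a minimal sketch, assuming the same two-column data array as above:

    from statsmodels.tsa.stattools import grangercausalitytests

    results = grangercausalitytests(data[:, 1::-1], maxlag=15, verbose=False)
    for lag in sorted(results):
        # each entry is (dict of test results, list of fitted regressions)
        f_stat, p_value = results[lag][0]['params_ftest'][:2]
        print(f"lag {lag:2d}: F = {f_stat:8.3f}, p = {p_value:.4g}")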