I would like to compute an autocorrelation estimate in Python. If the array has no NaN values, the autocorrelation can be computed explicitly via:
import numpy as np

def autocorr_naive(x):
    N = len(x)
    # Mean of the overlap between the signal and a copy shifted by each lag.
    return np.array([np.mean(x[iSh:] * x[:N - iSh]) for iSh in range(N)])
Or using the NumPy function correlate:
def autocorr_numpy(x):
    N = len(x)
    # Keep only the non-negative lags and normalize every lag by the full length N.
    return np.correlate(x, x, 'full')[N - 1:] / N
The NumPy function is significantly faster than the hand-written one, presumably because it runs in compiled code or uses the Wiener-Khinchin theorem (i.e. an FFT-based computation) to evaluate the correlation efficiently.
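For what it is worth, a minimal sketch of that FFT route (autocorr_fft is my own name, not something NumPy exposes) would zero-pad the signal, take the power spectrum, and transform back; on NaN-free data it matches autocorr_numpy up to floating-point error:

import numpy as np

def autocorr_fft(x):
    # Wiener-Khinchin sketch: autocorrelation = inverse FFT of the power spectrum.
    # Zero-padding to 2*N keeps the circular correlation equal to the linear one.
    N = len(x)
    X = np.fft.rfft(x, n=2 * N)
    acf = np.fft.irfft(X * np.conj(X), n=2 * N)[:N]
    return acf / N  # same normalization as autocorr_numpy

This does not help with NaNs by itself; it only illustrates where the speed can come from.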
The problem is that numpy.correlate does not currently seem to handle correlations if NaN values are present in the overlap. The naive extension to handle NaN values is simply to ignore them when computing the mean:
def autocorr_naive_nan(x):
    N = len(x)
    # np.nanmean drops every pair whose product is NaN, i.e. every pair that contains a NaN.
    return np.array([np.nanmean(x[iSh:] * x[:N - iSh]) for iSh in range(N)])
The naive extension has two problems. Firstly, it is painfully slow compared to the NumPy implementation. Secondly, it has a lot of undesired wiggles at the tail, where the overlap consists of only a few points and the estimate is naturally poor. autocorr_numpy does not appear to be biased by these artifacts, at least not to the same extent (though part of that may simply be that it divides every lag by N rather than by the overlap length, which tapers the noisy tail towards zero).
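For reference, the pair-dropping behaviour of np.nanmean can be expressed with two fast correlations: one of a zero-filled copy of the signal and one of its validity mask, the latter counting the non-NaN pairs at each lag. The sketch below (autocorr_nan_masked is my own name for it) should reproduce autocorr_naive_nan up to floating-point error, so it addresses the speed but not the wiggles; it is only meant to pin down the semantics I am after, not a library solution:

import numpy as np

def autocorr_nan_masked(x):
    # Sketch: drop NaN pairs exactly as np.nanmean does, but via vectorized correlations.
    x = np.asarray(x, dtype=float)
    N = len(x)
    valid = ~np.isnan(x)
    x_filled = np.where(valid, x, 0.0)  # NaNs contribute nothing to the lag sums
    sums = np.correlate(x_filled, x_filled, 'full')[N - 1:]
    counts = np.correlate(valid.astype(float), valid.astype(float), 'full')[N - 1:]
    with np.errstate(invalid='ignore', divide='ignore'):
        return sums / counts  # NaN wherever a lag has no valid pairs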
Pragmatic Question: Is there a library I can use to compute the equivalent of autocorr_naive_nan in an efficient way?