How do I perform cross-correlation between two time series, and what transformations should I perform, in Python?

Question

I have two time series datasets, i.e. errors received and bookings received on a daily basis for three years (a few million rows). I wish to find out whether there is any relationship between them. As of now, I think that cross-correlation between these two series might help. In order to do so, should I perform any transformations, such as making the series stationary, detrending, or deseasonalizing? If this is correct, I'm thinking of using `scipy.signal.correlate`, but I really want to know how to interpret the result.

From the description, it looks like you should use NumPy's `corrcoef` https://numpy.org/devdocs/reference/generated/numpy.corrcoef.html — Pierre de Buyl, Jun 12 '20 at 11:36
I was looking for something suited to time series variables. `np.corrcoef` is just the plain old Pearson coefficient, which can't be used in this case. — harshit, Jun 12 '20 at 16:14
`scipy.signal.correlate` is for the correlation of time series indeed. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns a vector that represents the time-dependent correlation: the k-th value (counting from 0) represents the correlation with a time lag of "k - N + 1", so that the element at index N - 1 (the N-th element) is the similarity of the time series without time lag. — Pierre de Buyl, Jun 12 '20 at 19:04
Can you please explain the difference between them? Also, should I perform any detrending, deseasonalizing, etc. before passing them into the function? — harshit, Jun 12 '20 at 21:57
`scipy.signal.correlate` takes two time series and returns the time-dependent correlation between them. `numpy.corrcoef` takes two arrays and aggregates the correlation in a single value (the "time 0" of the other routine) and does so for N rows, returning an NxN array of correlations. The diagonal is supposed to be 1 (self correlation). — Pierre de Buyl, Jun 13 '20 at 06:02
The question about detrending etc. depends on your specific problem and is too vague. — Pierre de Buyl, Jun 13 '20 at 06:03

score 4 · Accepted Answer · answered Jun 15 '20 at 10:05

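`scipy.signal.correlate` is for the correlation of time series. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns a vector of length 2N - 1 that represents the time-dependent correlation: the k-th value (counting from 0) represents the correlation with a time lag of "k - N + 1", so that the element at index N - 1 (the N-th element) is the similarity of the time series without time lag. `correlate` does not normalize its output. If each series is first standardized (mean subtracted, divided by its standard deviation) and the result is divided by N, that zero-lag value is close to one if `y1` and `y2` move together and close to zero if the series are independent.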
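To make the lag indexing concrete, here is a minimal sketch on synthetic data. The series, the 5-step delay, and all variable names are assumptions chosen for illustration, not taken from the question's data. It standardizes both series, divides the output by N, and builds the lag axis explicitly so that index N - 1 corresponds to zero lag.

```python
import numpy as np
from scipy import signal

# Synthetic stand-ins for the two daily series (illustrative assumption).
rng = np.random.default_rng(0)
n = 200
shift = 5  # hypothetical delay, in samples

base = rng.standard_normal(n + shift)
y1 = base[:n]        # y1[t] == y2[t - shift], i.e. y1 is y2 delayed by `shift`
y2 = base[shift:]

# Standardize so that the scaled output is comparable to a correlation coefficient.
y1s = (y1 - y1.mean()) / y1.std()
y2s = (y2 - y2.mean()) / y2.std()

# mode="full" returns 2N - 1 values; dividing by N scales the zero-lag value
# to the Pearson correlation of the two series.
cc = signal.correlate(y1s, y2s, mode="full") / n

# Lag axis: index k corresponds to lag k - N + 1, so index N - 1 is lag 0.
lags = np.arange(-(n - 1), n)

best_lag = lags[np.argmax(cc)]
print("zero-lag value:", cc[n - 1])
print("lag with the largest correlation:", best_lag)
```

With this construction the peak sits at a positive lag because `y1` trails `y2`. Swapping the arguments to `correlate` mirrors the lag axis, so it is worth checking the sign convention on a toy example like this before interpreting real data.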

`numpy.corrcoef` takes two arrays and aggregates the correlation in a single value (the "time 0" of the other routine), the Pearson correlation, and does so for N rows, returning an NxN array of correlations. `corrcoef` normalizes the data (it subtracts each row's mean and divides the covariances by the product of the standard deviations), so that the diagonal is 1 (the correlation of each series with itself).
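The relationship between the two routines can be checked numerically. The following is a sketch on made-up data (the series and the 0.6/0.8 mixing weights are assumptions): the off-diagonal entry of `np.corrcoef` should match the zero-lag entry of `scipy.signal.correlate` once the inputs are standardized and the output is divided by N.

```python
import numpy as np
from scipy import signal

# Two partially related synthetic series (illustrative assumption).
rng = np.random.default_rng(1)
n = 500
y1 = rng.standard_normal(n)
y2 = 0.6 * y1 + 0.8 * rng.standard_normal(n)

# corrcoef on two 1-D arrays returns a 2x2 matrix; the diagonal is 1.
r_matrix = np.corrcoef(y1, y2)
r = r_matrix[0, 1]

# Same quantity from correlate: standardize, divide by N, take the zero-lag entry.
y1s = (y1 - y1.mean()) / y1.std()
y2s = (y2 - y2.mean()) / y2.std()
cc = signal.correlate(y1s, y2s, mode="full") / n
zero_lag = cc[n - 1]

print("Pearson r from corrcoef:   ", r)
print("zero-lag value of correlate:", zero_lag)
```

The remaining entries of `cc` are what `corrcoef` does not provide: the similarity of the two series at non-zero lags.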

The questions about stationarity, detrending, and deseasonality depend on your specific problem. The routines above treat their input as "plain" numbers, without any regard for what the data mean.
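As an illustration of why such transformations can matter, here is a sketch on synthetic data. The trend slopes and variable names are assumptions, and whether detrending or differencing is appropriate for real error and booking counts still depends on the problem. Two series with independent noise but a shared upward trend look strongly correlated until the trend is removed, here with `scipy.signal.detrend` (linear detrending) or `np.diff` (first differences).

```python
import numpy as np
from scipy import signal

# Independent noise on top of two linear trends (illustrative assumption).
rng = np.random.default_rng(2)
n = 1000
t = np.arange(n)
errors = 0.05 * t + rng.standard_normal(n)
bookings = 0.03 * t + rng.standard_normal(n)

# Raw series: the shared trend dominates the correlation.
r_raw = np.corrcoef(errors, bookings)[0, 1]

# Option 1: remove a least-squares linear trend from each series.
r_detrended = np.corrcoef(signal.detrend(errors), signal.detrend(bookings))[0, 1]

# Option 2: first differences, which also remove a linear trend.
r_diff = np.corrcoef(np.diff(errors), np.diff(bookings))[0, 1]

print("raw:        ", r_raw)
print("detrended:  ", r_detrended)
print("differenced:", r_diff)
```

The same preprocessing could be applied before calling `scipy.signal.correlate` to look at lagged relationships. Seasonal effects (for example weekly cycles in daily data) are not removed by either option and would need separate handling.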
