I have two time series datasets, errors received and bookings received on a daily basis, covering three years (a few million rows). I want to find out whether there is any relationship between them. For now, I think that the cross-correlation between the two series might help. In order to do so, should I first apply any transformations (making the series stationary, detrending, deseasonalizing, etc.)? If this approach is correct, I'm thinking of using `scipy.signal.correlate`, but I would really like to know how to interpret the result.
-
From the description, it looks like you should use NumPy's `corrcoef` https://numpy.org/devdocs/reference/generated/numpy.corrcoef.html – Pierre de Buyl Jun 12 '20 at 11:36
-
I was looking for something suited to time series variables. `np.corrcoef` is just the plain old Pearson coefficient, which can't be used in this case. – harshit Jun 12 '20 at 16:14
-
`scipy.signal.correlate` is for the correlation of time series indeed. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns a vector that represents the time-dependent correlation: the k-th value (counting from zero) represents the correlation with a time lag of "k - N + 1", so that the N-th element (index N - 1) is the similarity of the time series without time lag. – Pierre de Buyl Jun 12 '20 at 19:04
-
Can you please explain the difference between them? Also, should I perform any detrending, deseasonalizing, etc. before passing the series into the function? – harshit Jun 12 '20 at 21:57
-
`scipy.signal.correlate` takes two time series and returns the time-dependent correlation between them. `numpy.corrcoef` takes two arrays and aggregates the correlation into a single value (the "time 0" of the other routine) and does so for N rows, returning an N×N array of correlations. The diagonal is supposed to be 1 (self correlation). – Pierre de Buyl Jun 13 '20 at 06:02
-
The question about detrending etc. depends on your specific problem and is too vague as stated. – Pierre de Buyl Jun 13 '20 at 06:03
-
Great. I got your point. – harshit Jun 13 '20 at 20:59
-
If this answers your question, I can make it an answer. – Pierre de Buyl Jun 14 '20 at 16:14
-
Sure. Please do. – harshit Jun 15 '20 at 06:01
1 Answer
`scipy.signal.correlate` is for the correlation of time series. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns a vector that represents the time-dependent correlation: the k-th value (counting from zero) represents the correlation with a time lag of "k - N + 1", so that the N-th element (index N - 1) is the similarity of the time series without time lag: close to one if `y1` and `y2` have similar trends (for normalized data), close to zero if the series are independent.
`numpy.corrcoef` takes two arrays and aggregates the correlation into a single value (the "time 0" of the other routine, for normalized data), the Pearson correlation. Given N rows, it does so for every pair of rows and returns an N×N array of correlations. `corrcoef` normalizes the data (it subtracts each row's mean and divides by the standard deviations), so that the diagonal is 1 (the correlation of each row with itself).
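To make the link between the two routines concrete, here is a minimal sketch on synthetic data. The names `normalized_xcorr`, `errors`, and `bookings` are made up for illustration, and the 3-sample shift is an assumption chosen so that a peak is visible. Because `correlate` does not normalize its inputs, the sketch standardizes both series first; with that normalization the zero-lag value should match the off-diagonal entry of `numpy.corrcoef`.

```python
import numpy as np
from scipy import signal


def normalized_xcorr(y1, y2):
    """Cross-correlation of two equal-length series, scaled to [-1, 1].

    Hypothetical helper: standardizes both inputs, then calls
    scipy.signal.correlate in "full" mode.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = len(y1)
    z1 = (y1 - y1.mean()) / y1.std()
    z2 = (y2 - y2.mean()) / y2.std()
    corr = signal.correlate(z1, z2, mode="full") / n
    # Index k of the output corresponds to lag k - (n - 1).
    lags = np.arange(-(n - 1), n)
    return lags, corr


# Synthetic stand-ins for the two daily series (assumed data, not real).
rng = np.random.default_rng(0)
n = 1000
errors = rng.normal(size=n)
bookings = np.roll(errors, 3) + 0.5 * rng.normal(size=n)  # shifted copy plus noise

lags, corr = normalized_xcorr(errors, bookings)

zero_lag = corr[n - 1]                        # element at index N - 1: no time lag
pearson = np.corrcoef(errors, bookings)[0, 1]  # single aggregated value
best_lag = lags[np.argmax(corr)]               # lag with the strongest correlation

print(zero_lag, pearson, best_lag)
```

The peak sits 3 samples away from zero lag; its sign depends on the argument order passed to `correlate`, so it is worth checking on a known shifted signal like this one before interpreting real data. Recent SciPy versions also provide `scipy.signal.correlation_lags` to build the lag axis.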
The questions about stationarity, detrending, and deseasonalizing depend on your specific problem. The routines above work on the "plain" data, without any consideration for what the numbers mean.
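As one illustration of why preprocessing can matter, the sketch below uses two synthetic series (assumed data) that share nothing but a linear upward trend. It uses `scipy.signal.detrend`, which removes a linear trend by default; whether that is the right transformation for a given dataset remains a problem-specific choice.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
t = np.arange(1000)

# Two independent noise series, each with its own linear trend (assumed data).
a = 0.01 * t + rng.normal(size=t.size)
b = 0.02 * t + rng.normal(size=t.size)

# On the raw data, the shared trend dominates the correlation.
raw = np.corrcoef(a, b)[0, 1]

# After removing the linear trend, only the independent noise is left.
detrended = np.corrcoef(signal.detrend(a), signal.detrend(b))[0, 1]

print(raw, detrended)
```

The raw coefficient is large even though the fluctuations are unrelated, while the detrended one is close to zero. The same caveat applies to the output of `scipy.signal.correlate`.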
