
I have two time series datasets, errors received and bookings received, recorded on a daily basis for three years (a few million rows). I want to find out whether there is any relationship between them. For now, I think that cross-correlation between these two series might help. In order to do so, should I first apply any transformations, such as making the series stationary, detrending, or deseasonalizing? If this approach is correct, I'm thinking of using `scipy.signal.correlate`, but I really want to know how to interpret the result.

harshit
  • From the description, it looks like you should use NumPy's `corrcoef` https://numpy.org/devdocs/reference/generated/numpy.corrcoef.html – Pierre de Buyl Jun 12 '20 at 11:36
  • I was looking for something suited to time series variables. `np.corrcoef` is just the plain old Pearson coefficient, which can't be used in this case. – harshit Jun 12 '20 at 16:14
  • `scipy.signal.correlate` is indeed for the correlation of time series. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns a vector that represents the time-dependent correlation: the k-th value (counting from 0) represents the correlation at a time lag of "k - N + 1", so the element at index N - 1 (the N-th element) is the similarity of the time series without time lag. – Pierre de Buyl Jun 12 '20 at 19:04
  • Can you please explain the difference between them? Also, should I perform any detrending, deseasonalizing, etc. before passing them into the function? – harshit Jun 12 '20 at 21:57
  • `scipy.signal.correlate` takes two times series and returns the time-dependent correlation between them. `numpy.corrcoef` takes two arrays and aggregates the correlation in a single value (the "time 0" of the other routine) and does so for N rows, returning a NxN array of correlations. The diagonal is supposed to be 1 (self correlation). – Pierre de Buyl Jun 13 '20 at 06:02
  • The question about detrending etc. depends on your specific problem and is too vague. – Pierre de Buyl Jun 13 '20 at 06:03
  • Great. I got your point. – harshit Jun 13 '20 at 20:59
  • If this answers your question, I can make it an answer. – Pierre de Buyl Jun 14 '20 at 16:14
  • Sure. Please do. – harshit Jun 15 '20 at 06:01

1 Answer


`scipy.signal.correlate` is for the correlation of time series. For series `y1` and `y2` of length N, `correlate(y1, y2)` returns (with the default `mode='full'`) a vector of length 2N - 1 that represents the time-dependent correlation: the k-th value (counting from 0) is the correlation at a time lag of "k - N + 1", so the element at index N - 1 (the N-th element) measures the similarity of the time series without time lag. For normalized data, that value is close to one if `y1` and `y2` have similar trends and close to zero if the series are independent.
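A minimal sketch of the lag bookkeeping, assuming two equal-length NumPy arrays. The helper name `lagged_correlation` and the synthetic data are hypothetical. `scipy.signal.correlation_lags` (available in SciPy 1.6 and later) gives the lag for each output index. Z-scoring each series and dividing by N is one way to "normalize" so that the zero-lag value equals the Pearson coefficient.

```python
import numpy as np
from scipy import signal

def lagged_correlation(y1, y2):
    """Hypothetical helper: normalized full cross-correlation of two
    equal-length series, returned together with the lag of each element."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = len(y1)
    # z-score each series (population std), then divide by n so that the
    # zero-lag value equals the Pearson correlation coefficient
    z1 = (y1 - y1.mean()) / y1.std()
    z2 = (y2 - y2.mean()) / y2.std()
    corr = signal.correlate(z1, z2, mode="full") / n
    # for equal-length inputs in "full" mode, lags[k] == k - n + 1
    lags = signal.correlation_lags(len(z1), len(z2), mode="full")
    return lags, corr

# Synthetic example: y1 is y2 delayed by 3 samples (circular shift)
rng = np.random.default_rng(0)
y2 = rng.standard_normal(1000)
y1 = np.roll(y2, 3)  # y1[t] == y2[t - 3]

lags, corr = lagged_correlation(y1, y2)
print(lags[np.argmax(corr)])  # 3: y1 follows y2 by three steps
print(corr[lags == 0][0], np.corrcoef(y1, y2)[0, 1])  # the two values agree
```

A peak at a positive lag means that `y1` tends to follow `y2` by that many samples; for daily data, the lag is in days.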

`numpy.corrcoef` takes two arrays and aggregates the correlation into a single value (the "time 0" value of the other routine), the Pearson correlation, and does so for N rows, returning an N x N array of correlations. `corrcoef` normalizes the data (it centers each row and divides by its standard deviation, i.e. its rms value after centering), so the diagonal is 1 (the self-correlation of each row).
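A short sketch with made-up data (`errors` and `bookings` here are synthetic, not the asker's series) showing the shape of the `corrcoef` output and that the off-diagonal entry is the ordinary Pearson coefficient:

```python
import numpy as np

# Synthetic daily data (hypothetical): bookings depend linearly on errors plus noise
rng = np.random.default_rng(1)
errors = rng.poisson(5, size=365).astype(float)
bookings = 100 + 3 * errors + rng.normal(0, 5, size=365)

r = np.corrcoef(errors, bookings)  # the two inputs are treated as two rows
print(r.shape)     # (2, 2)
print(np.diag(r))  # each series correlated with itself: 1 (up to rounding)
pearson = r[0, 1]  # a single number: the zero-lag Pearson correlation

# The same value by hand: covariance over the product of standard deviations
manual = np.mean((errors - errors.mean()) * (bookings - bookings.mean())) / (
    errors.std() * bookings.std()
)

# With N rows, corrcoef returns an N x N matrix
noise = rng.normal(size=365)
print(np.corrcoef(np.vstack([errors, bookings, noise])).shape)  # (3, 3)
```

Note that this single number says nothing about delayed effects; for those, the lag-dependent output of `scipy.signal.correlate` is the relevant one.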

The questions about stationarity, detrending, and deseasonality depend on your specific problem. The routines above treat the data as "plain" numbers, without any consideration of what they represent.
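As one illustration of why preprocessing can matter, here is a sketch assuming a linear trend and a day-of-week (period 7) pattern. The helper `remove_trend_and_weekly_pattern` and the data are hypothetical, and other choices (differencing, a proper seasonal decomposition) may suit real data better. Two synthetic series that share only a trend and a weekly cycle, with independent noise, show a high raw correlation that largely disappears after cleaning.

```python
import numpy as np
from scipy import signal

def remove_trend_and_weekly_pattern(y, period=7):
    """Hypothetical helper: subtract a least-squares linear trend, then the
    mean of each phase of a fixed period (e.g. day of week)."""
    y = signal.detrend(np.asarray(y, dtype=float), type="linear")
    phase = np.arange(len(y)) % period
    phase_means = np.array([y[phase == p].mean() for p in range(period)])
    return y - phase_means[phase]

# Synthetic daily data for three years: shared trend and weekly cycle,
# independent noise, so there is no real relationship between the series
rng = np.random.default_rng(2)
t = np.arange(3 * 365)
weekly = np.sin(2 * np.pi * t / 7)
a = 0.05 * t + 5 * weekly + rng.normal(0, 1, t.size)
b = 0.02 * t + 3 * weekly + rng.normal(0, 1, t.size)

raw_r = np.corrcoef(a, b)[0, 1]
clean_r = np.corrcoef(remove_trend_and_weekly_pattern(a),
                      remove_trend_and_weekly_pattern(b))[0, 1]
print(f"raw r = {raw_r:.2f}, after cleaning r = {clean_r:.2f}")
```

Whether a shared trend or weekly cycle is "spurious" or is exactly the relationship you care about is the problem-specific judgment mentioned above.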

Pierre de Buyl