Consider the simple example below, borrowed from the question "How to use the ccf() method in the statsmodels library?":

import pandas as pd
import numpy as np
import statsmodels.tsa.stattools as smt
import matplotlib.pyplot as plt

np.random.seed(123)
test = pd.DataFrame(np.random.randint(0,25,size=(79, 2)), columns=list('AB'))

I know how to create the forward and backward lags of the cross-correlation function (see the linked question above), but the issue is how to obtain a proper DataFrame with the lags in the correct order. I came up with the solution below.

backwards = smt.ccf(test['A'][::-1], test['B'][::-1], adjusted=False)[::-1]

forwards = smt.ccf(test['A'], test['B'], adjusted=False)

# Note: we skip the first forward value (lag 0) because the backward values already include lag 0
a = pd.DataFrame({'lag': range(1, len(forwards)),
              'value' : forwards[1:]})

b = pd.DataFrame({'lag':  [-i for i in list(range(0, len(forwards)))[::-1]],
              'value' : backwards})

full = pd.concat([a,b])
full.sort_values(by = 'lag', inplace = True)
full.set_index('lag').value.plot()

However, this seems like a lot of code for something that is conceptually very simple (just appending two lists). Can this code be streamlined?

Thanks!

ℕʘʘḆḽḘ

2 Answers

Well, you can try "just appending two lists":

# also
# cc = list(backwards) + list(forwards[1:])
cc = np.concatenate([backwards, forwards[1:]])
# backwards ends at lag 0, so the first lag is -(len(backwards) - 1)
full = pd.DataFrame({'lag': np.arange(len(cc)) - len(backwards) + 1,
                     'value': cc})
full.plot(x='lag')

Also:

full = (pd.DataFrame({'value': np.concatenate([backwards, forwards[1:]])})
          .assign(lag=lambda x: x.index - len(backwards) + 1)
       )

Note that if all you want is to plot the two arrays, this would do:

# backwards is ordered from lag -(n - 1) up to lag 0
plt.plot(np.arange(1 - len(backwards), 1), backwards, c='C0')
plt.plot(forwards, c='C0')
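
As a possible further shortcut, both halves can be computed in one call with NumPy alone. The sketch below is not taken from statsmodels: `two_sided_ccf` is a hypothetical helper name, and it assumes the same normalisation as `ccf(..., adjusted=False)` (demeaned inputs, divided by n times the two population standard deviations). It relies on `np.correlate(..., mode="full")` returning lags from -(n - 1) to n - 1 in order.

```python
import numpy as np

def two_sided_ccf(x, y):
    """Hypothetical helper: cross-correlation at lags -(n-1)..(n-1).

    Assumes x and y have the same length and mimics the
    adjusted=False normalisation (divide by n, not n - |lag|).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xo = x - x.mean()  # demean both series
    yo = y - y.mean()
    # mode="full" yields 2n - 1 values, ordered from lag -(n-1) to n-1
    values = np.correlate(xo, yo, mode="full") / (n * x.std() * y.std())
    lags = np.arange(-(n - 1), n)
    return lags, values

# Stand-in data with the same shape as the question's example
rng = np.random.default_rng(123)
x = rng.integers(0, 25, size=79)
y = rng.integers(0, 25, size=79)
lags, values = two_sided_ccf(x, y)
```

If this matches your statsmodels version, `pd.Series(values, index=lags).plot()` should give the same curve as the concatenated `backwards` and `forwards[1:]` arrays; it is worth comparing the two once before relying on it.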
Quang Hoang

For Quang Hoang's answer, the lag index should be np.arange(len(cc)) - len(backwards) + 1. ccf returns the cross-correlation coefficients starting from lag 0, so backwards already ends at lag 0 and the first lag is -(len(backwards) - 1).
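
The offset can be checked without statsmodels. In this minimal sketch, `forwards` and `backwards` are made-up stand-in arrays (not real ccf output) laid out the way the question builds them: `forwards[0]` and `backwards[-1]` both hold the lag-0 value.

```python
import numpy as np

# Stand-in values only: forwards covers lags 0..3, backwards covers lags -3..0
forwards = np.array([1.0, 0.5, 0.2, 0.1])
backwards = np.array([-0.1, -0.2, -0.5, 1.0])

# Drop the duplicated lag-0 entry from forwards before joining
cc = np.concatenate([backwards, forwards[1:]])

# Index len(backwards) - 1 of cc is lag 0, hence the "+ 1"
lags = np.arange(len(cc)) - len(backwards) + 1

lag_zero_value = cc[lags == 0][0]
```

Here `lags` runs symmetrically from -3 to 3 and `lag_zero_value` is the shared lag-0 entry; subtracting `len(backwards)` without the `+ 1` would shift every lag one step to the left.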