I am working on time series analysis and have sales data (let's call it df_panel, as it has a panel data structure) for 700 individual areas for each month of 2021, e.g.:

Area    Month     Sales
Area 1  January   1000
Area 1  February  2000
Area 1  March     3000
Area 2  January   1000
Area 2  February  2000
Area 2  March     1400
Area 3  January   1000
Area 3  February  1200
Area 3  March     1400

Normally, when working with sales data, you use e.g. an ADF test to check for unit roots. I know how to do this in Python for a standard non-panel data structure, using e.g. the adfuller function from statsmodels on a dataframe df:

from statsmodels.tsa.stattools import adfuller
adf_test_result = adfuller(df["Sales"])[1]  # element [1] is the p-value

How can I do something similar for my panel data structure, which consists of 700 individual sales curves (one for each area)? The goal is to use panel data regression (fixed or random effects).

One approximation could be to sum up my panel data sales curve to one sales curve and do the ADF test on that:

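# sort=False keeps the months in their original order instead of sorting the names alphabetically
adf_test_result = adfuller(df_panel.groupby("Month", sort=False)["Sales"].sum())[1]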

But I think this will greatly overestimate the probability of a unit root in the sales data. A lot of information in the sales data is lost when summing it up like this for 700 individual areas.

Another approximation could maybe be to check for unit roots in each individual area and somehow take the mean (?)

Not exactly sure what is best here...

In R there is the package plm with the function purtest, which implements several procedures that have been proposed for testing unit root hypotheses with panel data, e.g. "levinlin" for Levin, Lin and Chu (2002), "ips" for Im, Pesaran and Shin (2003), "madwu" for Maddala and Wu (1999), and "hadri" for Hadri (2000).

Does anyone know how to estimate the unit root for panel data structures? And how to implement this in Python?

Helix123
smallbirds

3 Answers

It seems like there may not be a widely available Python package for doing unit root tests on panel data (at least not that I can find).

You seem to be familiar with the appropriate methods for doing this sort of test, but for the benefit of other readers I will provide a few links with more information:

If you are able to switch to R or Stata that may be the best solution for your problem. If you want to stick to Python it seems like your options include:

  • Implement the panel data unit root tests yourself (a tall task)
  • Call a non-Python library from Python (my advice)

For the second option, here is a quick explanation of calling R from Python: https://medium.com/analytics-vidhya/calling-r-from-python-magic-of-rpy2-d8cbbf991571

And as you mentioned there exists an implementation of this test in R's package plm in function purtest: https://rdrr.io/cran/plm/man/purtest.html

Helix123
Phoenix

The SAS documentation tells us that the IPS method uses the average of the ADF test statistics across groups/panels. The ADF test is available in the statsmodels library, so you can simply calculate the tau statistics yourself, take the average, and calculate the p-value using a t-test.

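# p-value for a two-sided t-test on the average ADF statistic
from scipy import stats
# tau_avg is the mean of the per-area ADF statistics, i.e. of adfuller(series)[0] over all areas
p_value = 2 * stats.t.sf(abs(tau_avg), df=1000)  # the scipy keyword is df, not dof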

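Note that 1000 is just an example of a high number of degrees of freedom. Also keep in mind that the IPS test proper standardizes the average with the mean and variance of the ADF statistic under the null (tabulated in Im, Pesaran and Shin, 2003) and compares the result with a standard normal, so the t-test here is only a rough approximation.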
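Another manual route is the Fisher-type test of Maddala and Wu (1999), the "madwu" option the question mentions: run an ADF test per area and combine the p-values as -2 * sum(log(p_i)), which is chi-square with 2N degrees of freedom under the null that every area has a unit root. The following is only a minimal sketch, not a reference implementation. The function names are hypothetical, the areas are assumed to be independent, the rows are assumed to be in chronological order within each area, and maxlag=1 is an arbitrary choice for series of only 12 points (where any ADF test has little power).

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent per-series p-values with Fisher's method.

    Returns (statistic, p_value), where statistic = -2 * sum(log(p_i)) is
    compared with a chi-square distribution with 2 * N degrees of freedom.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Clamp away from 0 so that log() is defined for extremely small p-values.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    if half_stat <= 0.0:
        return 0.0, 1.0
    # For even degrees of freedom 2N the chi-square survival function is
    # exp(-h) * sum_{k=0}^{N-1} h**k / k!  with h = statistic / 2.
    # Work in logs so the terms do not overflow for N in the hundreds.
    log_h = math.log(half_stat)
    log_terms = [k * log_h - math.lgamma(k + 1) for k in range(n)]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    p_combined = min(1.0, math.exp(log_sum - half_stat))
    return 2.0 * half_stat, p_combined


def adf_pvalues_by_area(df_panel):
    """Per-area ADF p-values (hypothetical helper for the question's df_panel)."""
    # Imported here so the combination step above has no third-party dependency.
    from statsmodels.tsa.stattools import adfuller

    # maxlag=1 with autolag=None is an assumption made for 12 monthly points.
    return [
        adfuller(sales.to_numpy(), maxlag=1, autolag=None)[1]
        for _, sales in df_panel.groupby("Area", sort=False)["Sales"]
    ]
```

With the question's data this would be called as fisher_combined_pvalue(adf_pvalues_by_area(df_panel)); a small combined p-value is evidence against the null that all areas have a unit root. If scipy is available anyway, scipy.stats.combine_pvalues(p_values, method="fisher") should give the same combination.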

Gobryas

Okay, I think you are right. There is currently no way of doing this directly in Python. That does not mean it can't be done "using" Python. The solution I have found so far is to use the rpy2 Python package, which lets you call R packages from Python. This is of course not a very elegant solution, but as long as the packages for doing unit root tests on panel data do not exist in Python, you have to make do with the next best solution.

In general, I have found that R has more implementations of statistical tests etc. than Python, which is interesting considering that Python is often the "go-to" language of data science nowadays. I have therefore started to use rpy2 for many use cases to make sure that the models I am working on are statistically justified, at least until Python catches up in statistics.

smallbirds
  • I thought running R through Python wasn't efficient, so I tried to do it manually in Python. See my answer below and let me know what you think. – Gobryas Oct 04 '22 at 20:28