I am working on time series analysis and I have monthly sales data for 2021 for 700 individual areas (let's call the DataFrame df_panel, since it is a panel data structure), e.g.:
Area | Month | Sales |
---|---|---|
Area 1 | January | 1000 |
Area 1 | February | 2000 |
Area 1 | March | 3000 |
Area 2 | January | 1000 |
Area 2 | February | 2000 |
Area 2 | March | 1400 |
Area 3 | January | 1000 |
Area 3 | February | 1200 |
Area 3 | March | 1400 |
Normally, when working with sales data, you would use something like the ADF test to check for unit roots. I know how to do this in Python for a standard (non-panel) data structure, using the adfuller function from statsmodels on a DataFrame df:
adf_test_result = adfuller(df["Sales"])[1]
How can I do something similar for my panel data structure, which consists of 700 individual sales curves (one per area)? The goal is to run a panel data regression (fixed or random effects) afterwards.
One approximation could be to sum the panel data up into a single sales curve and run the ADF test on that:
adf_test_result = adfuller(df_panel.groupby("Month").sum()["Sales"])[1]
But I think this will greatly overestimate the probability of a unit root in the sales data: a lot of information is lost when 700 individual areas are collapsed into one series.
Another approximation could maybe be to test for unit roots in each individual area separately and then somehow average the results (?)
I'm not sure what is best here...
In R there is the package plm with the function purtest, which implements several testing procedures that have been proposed for unit root hypotheses with panel data, e.g., "levinlin" for Levin, Lin and Chu (2002), "ips" for Im, Pesaran and Shin (2003), "madwu" for Maddala and Wu (1999), and "hadri" for Hadri (2000).
Does anyone know how to test for unit roots in panel data structures, and how to implement this in Python?