I would like to run PanelOLS on data resembling the extract below:
import pandas as pd
from io import StringIO
from datetime import datetime
from linearmodels import PanelOLS
import statsmodels.api as sm
TESTDATA = StringIO(
"""EventDate;SomeGroup;DepVal;Measure1;Measure2;Measure3
1/1/2020;A;99;78;378;4.4
2/1/2020;A;94;21;328;4.4
3/1/2020;A;24;76;528;4.4
4/1/2020;A;94;71;318;3.3
1/1/2020;B;39;78;478;2.4
2/1/2020;B;92;23;518;7.2
3/1/2020;B;23;73;128;9.4
4/1/2020;B;93;31;918;3.4
"""
)
convert_date = lambda x: datetime.strptime(x, '%d/%m/%Y')
df = pd.read_csv(
    TESTDATA,
    sep=";",
    parse_dates=["EventDate"],
    date_parser=convert_date,
    skipinitialspace=True,
    index_col=['EventDate', 'SomeGroup']
)
1st Approach
I initially attempted to pass the relevant variables directly to the PanelOLS call:
mod_PanelOLS = PanelOLS(df['DepVal'], df[['Measure1', 'Measure2', 'Measure3']])
This fails with the following error message:
ValueError: The index on the time dimension must be either numeric or date-like
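For context, here is a minimal sketch of how the index layout can be inspected. It rebuilds a small subset of the extract above with the same level order (EventDate first, SomeGroup second). The names `raw`, `check`, `level_names` and `inner_is_datelike` are made up for this illustration. As far as I understand the linearmodels documentation, the outer index level is treated as the entity and the inner level as time, so the check looks at the inner level:

```python
import pandas as pd
from io import StringIO

# Small subset of the extract above, kept short for illustration only.
raw = StringIO(
    """EventDate;SomeGroup;DepVal;Measure1
1/1/2020;A;99;78
2/1/2020;A;94;21
1/1/2020;B;39;78
2/1/2020;B;92;23
"""
)

check = pd.read_csv(raw, sep=";")
check["EventDate"] = pd.to_datetime(check["EventDate"], format="%d/%m/%Y")

# Same level order as in the question: EventDate outer, SomeGroup inner.
check = check.set_index(["EventDate", "SomeGroup"])

# Names of the index levels, outermost first.
level_names = list(check.index.names)

# Is the inner (second) level date-like? Here it holds the group labels.
inner_level = check.index.get_level_values(1)
inner_is_datelike = pd.api.types.is_datetime64_any_dtype(inner_level)

print(level_names)
print(inner_is_datelike)
```

With this layout the inner level holds the string labels from SomeGroup rather than the dates, which seems consistent with the error message above.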
2nd Approach
Following the tutorial, I then tried passing the exogenous variables through add_constant:
import statsmodels.api as sm
exog = sm.add_constant(df[['EventDate', 'Measure1', 'Measure2', 'Measure3']])
mod_PanelOLS = PanelOLS(df['DepVal'], exog)
which fails with:
KeyError: "['EventDate'] not in index"
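The KeyError can be reproduced with pandas alone, without PanelOLS or add_constant. The sketch below uses a hypothetical two-row frame (`frame` and the flag names are invented for illustration) and shows that a name used as an index level is no longer available as a column:

```python
import pandas as pd

# Hypothetical two-row frame mirroring the structure of the data above.
frame = pd.DataFrame(
    {
        "EventDate": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "SomeGroup": ["A", "A"],
        "Measure1": [78, 21],
    }
).set_index(["EventDate", "SomeGroup"])

# After set_index (or index_col in read_csv), EventDate is an index level.
in_columns = "EventDate" in frame.columns
in_index = "EventDate" in frame.index.names

# Selecting it as a column therefore raises KeyError.
try:
    frame[["EventDate", "Measure1"]]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(in_columns, in_index, raised_key_error)
```

So the failure in the second approach appears to come from the column selection itself, before PanelOLS is ever called.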
Question
What is the proper way of feeding panel data from a DataFrame (with a MultiIndex) into PanelOLS?