I would like to run PanelOLS on data resembling the extract below:

import pandas as pd
from io import StringIO
from datetime import datetime
from linearmodels import PanelOLS
import statsmodels.api as sm

TESTDATA = StringIO(
    """EventDate;SomeGroup;DepVal;Measure1;Measure2;Measure3
    1/1/2020;A;99;78;378;4.4
    2/1/2020;A;94;21;328;4.4
    3/1/2020;A;24;76;528;4.4
    4/1/2020;A;94;71;318;3.3
    1/1/2020;B;39;78;478;2.4
    2/1/2020;B;92;23;518;7.2
    3/1/2020;B;23;73;128;9.4
    4/1/2020;B;93;31;918;3.4
    """
)

convert_date = lambda x: datetime.strptime(x, '%d/%m/%Y')
df = pd.read_csv(
    TESTDATA,
    sep=";",
    parse_dates=["EventDate"],
    date_parser=convert_date,
    skipinitialspace=True,
    index_col=['EventDate', 'SomeGroup']
)

1st Approach

I initially attempted to pass the relevant variables directly to the PanelOLS call:

mod_PanelOLS = PanelOLS(df['DepVal'], df[['Measure1', 'Measure2', 'Measure3']])

This fails with the following error message:

ValueError: The index on the time dimension must be either numeric or date-like

2nd Approach

Following the tutorial, I then tried building the exogenous variables with sm.add_constant:

import statsmodels.api as sm
exog = sm.add_constant(df[['EventDate', 'Measure1', 'Measure2', 'Measure3']])
mod_PanelOLS = PanelOLS(df['DepVal'], exog)

which fails with:

KeyError: "['EventDate'] not in index"

Question

What is the proper way of feeding panel data with a MultiIndex into PanelOLS?

Konrad

1 Answer

The issue is with how the MultiIndex is created: PanelOLS expects the entity as the first index level and the time as the second. Passing the column names to index_col so that the date column comes second solves the issue:

df = pd.read_csv(
    TESTDATA,
    sep=";",
    parse_dates=["EventDate"],
    date_parser=convert_date,
    skipinitialspace=True,
    index_col=['SomeGroup', 'EventDate']
)
mod_PanelOLS = PanelOLS(df['DepVal'], df[['Measure1', 'Measure2', 'Measure3']])
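
If the DataFrame already exists with the levels in (time, entity) order, re-reading the CSV is not the only option. As a minimal sketch, assuming a small made-up frame (the values and the names wrong and panel are illustrative, not from the original data), the levels can be swapped in place with pandas. It also shows why the 2nd approach raised a KeyError: EventDate lives in the index, so it cannot be selected as a column.

```python
import pandas as pd

# Hypothetical frame indexed as (time, entity), like the question's first attempt.
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"])
wrong = pd.DataFrame(
    {
        "EventDate": dates,
        "SomeGroup": ["A", "A", "B", "B"],
        "DepVal": [99, 94, 39, 92],
        "Measure1": [78, 21, 78, 23],
    }
).set_index(["EventDate", "SomeGroup"])

# Index levels are not columns, which is why selecting 'EventDate' fails.
assert "EventDate" not in wrong.columns

# Swap the two levels so the entity comes first and the date second, then sort.
panel = wrong.swaplevel().sort_index()

print(list(panel.index.names))
```

The resulting panel frame can then be passed to PanelOLS in the same way as df above.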
Konrad