I have a dataframe looking something like that:
y X1 X2 X3
ID year
1 2010 1 2 3 4
1 2011 3 4 5 6
2 2010 1 2 3 4
2 2011 3 4 5 6
2 2012 7 8 9 10
...
I'd like to create several bootstrap sample from the original df, calculate a fixed effects panel regression on the new bootstrap samples and than store the corresponding beta coefficients. The approach I found for "normal" linear regression is the following
betas = pd.DataFrame()
for i in range(10):
# Creating a bootstrap sample with replacement
bootstrap = df.sample(n=df.shape[0], replace=True)
# Fit the regression and save beta coefficients
DV_bs = bootstrap.y
IV_bs = sm2.add_constant(bootstrap[['X1', 'X2', 'X3']])
fe_mod_bs = PanelOLS(DV_bs, IV_bs, entity_effects=True ).fit(cov_type='clustered', cluster_entity=True)
b = pd.DataFrame(fe_mod_bs.params)
print(b.head())
betas = pd.concat([betas, b], axis = 1, join = 'outer')
Unfortunately the bootstrap samples need to be selected by group for the panel regression, so that a complete ID is picked instead of just one row. I could not figure out how to extend the function to create a sample that way. So I basically have two questions:
- Does the overall approach make sense for panel regression at all?
- How do I adjust the bootstrapping so that the multilevel / panel structure is taken into account and complete IDs instead of single rows are "picked" during the bootstrapping?