Creating a bootstrap sample by group in Python

Question

I have a dataframe that looks something like this:

         y   X1  X2  X3
ID year
1  2010  1   2   3   4
1  2011  3   4   5   6
2  2010  1   2   3   4
2  2011  3   4   5   6
2  2012  7   8   9  10
...

I'd like to create several bootstrap samples from the original df, fit a fixed effects panel regression on each bootstrap sample, and then store the corresponding beta coefficients. The approach I found for "normal" linear regression is the following:

import pandas as pd
import statsmodels.api as sm2  # assumed alias; the original post does not show its imports
from linearmodels.panel import PanelOLS

betas = pd.DataFrame()
for i in range(10):
    # Create a bootstrap sample of single rows, drawn with replacement
    bootstrap = df.sample(n=df.shape[0], replace=True)
    # Fit the regression and save beta coefficients
    DV_bs = bootstrap.y
    IV_bs = sm2.add_constant(bootstrap[['X1', 'X2', 'X3']])
    fe_mod_bs = PanelOLS(DV_bs, IV_bs, entity_effects=True).fit(cov_type='clustered', cluster_entity=True)
    b = pd.DataFrame(fe_mod_bs.params)
    print(b.head())
    # Append this replication's coefficients as a new column
    betas = pd.concat([betas, b], axis=1, join='outer')

Unfortunately, for the panel regression the bootstrap samples need to be drawn by group, so that a complete ID (with all of its years) is picked instead of a single row. I could not figure out how to extend the code to create a sample that way. So I basically have two questions:

Does the overall approach make sense for panel regression at all?
How do I adjust the bootstrapping so that the multilevel / panel structure is taken into account and complete IDs instead of single rows are "picked" during the bootstrapping?

score 1 · Accepted Answer · answered Jul 01 '20 at 13:48

I solved my problem with the following code:

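import numpy as np
import pandas as pd
import statsmodels.api as sm2  # assumed alias; the original post does not show its imports
from linearmodels.panel import PanelOLS
from tqdm import tqdm

# One row per unique company (the entity variable)
companies = pd.DataFrame(df.reset_index().Company.unique())

betas_summary = pd.DataFrame()
for i in tqdm(range(1, 10001)):
    # Draw whole companies with replacement
    bootstrap = companies.sample(n=companies.shape[0], replace=True)
    bootstrap.rename(columns={bootstrap.columns[0]: "Company"}, inplace=True)
    # Pair every drawn company with periods 1..24 (this relies on a balanced panel)
    Period = list(range(1, 25))
    list_of_bs_comp = bootstrap.Company.to_list()
    multiindex = [list_of_bs_comp, np.array(Period)]
    bs_df = pd.MultiIndex.from_product(multiindex, names=['Company', 'Period'])
    # A company drawn more than once keeps its original label here
    bs_result = df.loc[bs_df, :]

    # Fit the regression and save beta coefficients
    DV_bs = bs_result.y
    IV_bs = sm2.add_constant(bs_result[['X1', 'X2', 'X3']])
    fe_mod_bs = PanelOLS(DV_bs, IV_bs, entity_effects=True).fit(cov_type='clustered', cluster_entity=True)
    b = pd.DataFrame(fe_mod_bs.params)
    b.rename(columns={'parameter': "b"}, inplace=True)
    # Append this replication's coefficients as a new column
    # (the original reset `betas` inside the loop, so only the last run survived)
    betas_summary = pd.concat([betas_summary, b], axis=1, join='outer')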

Here, Company is my entity variable and Period is my time variable.
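
One thing the code above leaves open is what happens when the same company is drawn more than once: it keeps its original label, so the resampled index contains repeated (entity, time) pairs. Whether that is acceptable depends on the estimator. As a minimal sketch of an alternative, the function below draws whole entities with replacement and gives every draw a fresh label, which also works for unbalanced panels like the ID/year example in the question. The function name `cluster_bootstrap` and the `source_ID` column are hypothetical, introduced only for illustration; the toy data mirrors the table in the question.

```python
import numpy as np
import pandas as pd


def cluster_bootstrap(df, entity_level="ID", rng=None):
    """Resample whole entities with replacement from an (entity, time)-indexed frame.

    Hypothetical helper: each draw gets a new integer label, so an entity
    drawn twice shows up as two distinct entities with a unique index.
    """
    rng = np.random.default_rng(rng)
    index_names = list(df.index.names)
    entities = df.index.get_level_values(entity_level).unique()
    drawn = rng.choice(entities, size=len(entities), replace=True)

    pieces = []
    for new_id, old_id in enumerate(drawn):
        # Take every row (all time periods) of the drawn entity
        block = df.xs(old_id, level=entity_level, drop_level=False).reset_index()
        block["source_" + entity_level] = old_id  # remember where the rows came from
        block[entity_level] = new_id              # relabel so the index stays unique
        pieces.append(block)

    return pd.concat(pieces, ignore_index=True).set_index(index_names)


# Toy panel shaped like the example in the question (unbalanced: ID 2 has three years)
df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2, 2],
        "year": [2010, 2011, 2010, 2011, 2012],
        "y": [1, 3, 1, 3, 7],
        "X1": [2, 4, 2, 4, 8],
        "X2": [3, 5, 3, 5, 9],
        "X3": [4, 6, 4, 6, 10],
    }
).set_index(["ID", "year"])

sample = cluster_bootstrap(df, rng=0)
print(sample)
```

The returned frame can be passed to the regression step in place of `bs_result`, after dropping the `source_ID` column if it is not needed.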
