How to cluster by entity and year, in IV2SLS with linearmodels?

Question

I am working on a panel model of African countries, with their democracy scores, log(gdp per capita) with three lags, and log(rain) amounts, also with three lags. I am trying to use IV2SLS to find the economic shocks in log(gdp per capita) (and its lags) caused by log(rain) (and its lags). I also want to cluster by country and year.

My index is "country", then "year". My column names are the following:

['democracy', 'log_gdp_per_cap', 'log_gdp_per_cap_lag1', 'log_gdp_per_cap_lag2', 'log_gdp_per_cap_lag3', 'lrain_mw_l', 'lrain_mw_l2', 'lrain_mw_l3', 'lrain_mw_l4']

To be clear, the "lrain_mw_l*" columns are log(rain) and its lags.

The model I've currently written is the following:

Dataframe:

iv_log_gdp_per_cap = pd.DataFrame({"democracy": panel_df["polity2"],
                                "log_gdp_per_cap": panel_df["lgdpc"],
                                "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"],
                                "lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"]})

 
endog_vars = pd.DataFrame({"log_gdp_per_cap": panel_df["lgdpc"],
                                "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"]})


instrument_vars = pd.DataFrame({"lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"] })

The model itself:

iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                 exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
                 endog = endog_vars,
                 instruments = instrument_vars)


iv_result = iv_model.fit(cov_type = "clustered",
                         clusters = [iv_log_gdp_per_cap["country"],
                                     iv_log_gdp_per_cap["year"]])

It returns the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'country'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-64-af8102ff7296> in <module>
      1 iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
----> 2                  exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
      3                  endog = endog_vars,
      4                  instruments = instrument_vars)
      5 # model_3_result = mod3.fit(cov_type = "clustered", cluster_entity = True, cluster_time = True)

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'country'

The problem seems to be related to the index, but I'm not sure exactly what is wrong. Can anyone advise on how to get standard errors clustered by entity and year with IV2SLS?

EDIT:

I have tried the following after resetting the index:

year_country = pd.DataFrame({"country": iv_log_gdp_per_cap["country"],
                                "year": iv_log_gdp_per_cap["year"]})

endog_vars = pd.DataFrame({"log_gdp_per_cap": iv_log_gdp_per_cap["log_gdp_per_cap"],
                                "log_gdp_per_cap_lag1": iv_log_gdp_per_cap["log_gdp_per_cap_lag1"],
                                "log_gdp_per_cap_lag2": iv_log_gdp_per_cap["log_gdp_per_cap_lag2"],
                                "log_gdp_per_cap_lag3": iv_log_gdp_per_cap["log_gdp_per_cap_lag3"]})

instrument_vars = pd.DataFrame({"lrain_mw_l": iv_log_gdp_per_cap["lrain_mw_l"],
                                "lrain_mw_l2": iv_log_gdp_per_cap["lrain_mw_l2"],
                                "lrain_mw_l3": iv_log_gdp_per_cap["lrain_mw_l3"],
                                "lrain_mw_l4": iv_log_gdp_per_cap["lrain_mw_l4"] })


iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                exog = None,
                endog = endog_vars,
                instruments = instrument_vars)

iv_result = iv_model.fit(cov_type = "clustered")


iv_result

It returns an answer, though not one I have much confidence in. Is it clustering by "year" and "country"? It says it's only clustering one way and I'm not sure what by. If I am trying to use the rain variables as an instrument on the gdp variables, to see the impact of change in log(gdp per capita) on democracy, am I doing this right? I'm also unsure if fixed effects are being used or not.

if `'country'` is part of your index, then it's not a column, and that method is likely looking for a column with the label `'country'`. Use `.reset_index()` to bring it back to the columns. — ALollz, Mar 10 '21 at 20:49
@ALollz I have tried that and updated, can you take another look? — Demosthenes, Mar 10 '21 at 22:38
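
A minimal sketch of what the comment above describes, using a small made-up panel (the country names and values are purely illustrative, not the asker's data). It shows that index levels are not visible as columns until `.reset_index()` is called, which is why `df["country"]` raises `KeyError`:

```python
import pandas as pd

# Hypothetical toy panel indexed by (country, year), mirroring the question's layout
idx = pd.MultiIndex.from_product(
    [["Kenya", "Ghana"], [2000, 2001]], names=["country", "year"]
)
df = pd.DataFrame({"democracy": [1, 2, 3, 4]}, index=idx)

# While "country" is an index level, it is not among the columns
in_columns_before = "country" in df.columns

# reset_index() moves the index levels back into ordinary columns
flat = df.reset_index()
in_columns_after = "country" in flat.columns

# Alternatively, read a level directly without changing the frame
country_level = df.index.get_level_values("country")

print(in_columns_before, in_columns_after)
print(list(flat.columns))
```

Either route works for building a cluster variable; `reset_index()` is the simpler one if the same frame is also passed to a formula.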

score 2 · Answer 1 · answered Aug 10 '21 at 18:18

OK, I found the solution.

For reference, see https://pypi.org/project/linearmodels/

In your case it would look as follows:

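from linearmodels.iv import IV2SLS

# "country" and "year" must be columns, not index levels (skip if already reset)
data = iv_log_gdp_per_cap.reset_index()
# control*, endog and instrument* are placeholders for your own column names
mod = IV2SLS.from_formula('democracy ~ 1 + control1 + control2 + [endog ~ instrument1 + instrument2]', data)
# Two-way clustering: one integer-coded column per clustering dimension
clusters = data[["country", "year"]].apply(lambda s: s.factorize()[0])
res = mod.fit(cov_type = "clustered", clusters = clusters)
res.summary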
