I am working on a panel model of African countries, with their democracy scores, log(gdp per capita) with three lags, and log(rain) amounts, also with three lags. I am trying to use IV2SLS (from linearmodels) to find the economic shocks in log(gdp per capita) (and its lags) caused by log(rain) (and its lags). I also want to cluster by country and year.
My index is "country", then "year". My column names are the following:
['democracy', 'log_gdp_per_cap', 'log_gdp_per_cap_lag1', 'log_gdp_per_cap_lag2', 'log_gdp_per_cap_lag3', 'lrain_mw_l', 'lrain_mw_l2', 'lrain_mw_l3', 'lrain_mw_l4']
To be clear, the "lrain_mw_l_" columns are log(rain) and its lags.
The model I've currently written is the following:
Dataframe:
iv_log_gdp_per_cap = pd.DataFrame({"democracy": panel_df["polity2"],
                                   "log_gdp_per_cap": panel_df["lgdpc"],
                                   "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                                   "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                                   "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"],
                                   "lrain_mw_l": panel_df["lrain_mw_l"],
                                   "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                   "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                   "lrain_mw_l4": panel_df["lrain_mw_l4"]})
endog_vars = pd.DataFrame({"log_gdp_per_cap": panel_df["lgdpc"],
                           "log_gdp_per_cap_lag1": panel_df["lgpcp_l"],
                           "log_gdp_per_cap_lag2": panel_df["lgpcp_l2"],
                           "log_gdp_per_cap_lag3": panel_df["lgpcp_l3"]})
instrument_vars = pd.DataFrame({"lrain_mw_l": panel_df["lrain_mw_l"],
                                "lrain_mw_l2": panel_df["lrain_mw_l2"],
                                "lrain_mw_l3": panel_df["lrain_mw_l3"],
                                "lrain_mw_l4": panel_df["lrain_mw_l4"]})
The model itself:
iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                  exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
                  endog = endog_vars,
                  instruments = instrument_vars)
iv_result = iv_model.fit(cov_type = "clustered",
                         clusters = [iv_log_gdp_per_cap["country"],
                                     iv_log_gdp_per_cap["year"]])
It returns the following error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'country'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-64-af8102ff7296> in <module>
1 iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
----> 2 exog = [iv_log_gdp_per_cap["country"], iv_log_gdp_per_cap["year"]],
3 endog = endog_vars,
4 instruments = instrument_vars)
5 # model_3_result = mod3.fit(cov_type = "clustered", cluster_entity = True, cluster_time = True)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'country'
The problem seems to be with the index, but I'm not sure what exactly. Can anyone advise on how to get standard errors clustered by entity and year with IV2SLS?
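For reference, here is a minimal sketch of what I think the KeyError is about, using a toy frame rather than my real data (the names `toy`, `countries`, and `flat`, and all values, are made up for illustration). With "country" and "year" in the index, they are index levels rather than columns, so column-style lookup fails:

```python
import pandas as pd

# Toy stand-in for my panel (values are made up); index is (country, year).
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "democracy": [1.0, 2.0, 3.0, 4.0],
    }
).set_index(["country", "year"])

# Index levels are not columns, so column-style lookup raises KeyError.
try:
    toy["country"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option 1: read the level straight off the MultiIndex.
countries = toy.index.get_level_values("country")

# Option 2: move the index levels back into ordinary columns.
flat = toy.reset_index()
```

Either option should make "country" and "year" reachable; which one is appropriate for IV2SLS is the part I am unsure about.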
EDIT:
I have tried the following after resetting the index:
year_country = pd.DataFrame({"country": iv_log_gdp_per_cap["country"],
                             "year": iv_log_gdp_per_cap["year"]})
endog_vars = pd.DataFrame({"log_gdp_per_cap": iv_log_gdp_per_cap["log_gdp_per_cap"],
                           "log_gdp_per_cap_lag1": iv_log_gdp_per_cap["log_gdp_per_cap_lag1"],
                           "log_gdp_per_cap_lag2": iv_log_gdp_per_cap["log_gdp_per_cap_lag2"],
                           "log_gdp_per_cap_lag3": iv_log_gdp_per_cap["log_gdp_per_cap_lag3"]})
instrument_vars = pd.DataFrame({"lrain_mw_l": iv_log_gdp_per_cap["lrain_mw_l"],
                                "lrain_mw_l2": iv_log_gdp_per_cap["lrain_mw_l2"],
                                "lrain_mw_l3": iv_log_gdp_per_cap["lrain_mw_l3"],
                                "lrain_mw_l4": iv_log_gdp_per_cap["lrain_mw_l4"]})
iv_model = IV2SLS(dependent = iv_log_gdp_per_cap["democracy"],
                  exog = None,
                  endog = endog_vars,
                  instruments = instrument_vars)
iv_result = iv_model.fit(cov_type = "clustered")
iv_result
This returns a result, though not one I have much confidence in. Is it clustering by "year" and "country"? The output says it is only clustering one way, and I'm not sure by what. If I want to use the rain variables as instruments for the gdp variables, to see the impact of a change in log(gdp per capita) on democracy, am I doing this right? I'm also unsure whether fixed effects are being used.
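To make the question concrete, this is a rough sketch of the two-way cluster variable I think I would need to build, again on a toy frame with made-up values (the names `toy`, `country_codes`, `year_codes`, and `clusters` are only for illustration). It produces one integer code per country and one per year, aligned row by row with the data:

```python
import numpy as np
import pandas as pd

# Toy (country, year) MultiIndex standing in for my real panel index.
idx = pd.MultiIndex.from_product(
    [["A", "B", "C"], [2000, 2001]], names=["country", "year"]
)
toy = pd.DataFrame({"democracy": np.arange(6.0)}, index=idx)

# Integer codes for each index level, in the same row order as the data.
country_codes, _ = pd.factorize(toy.index.get_level_values("country"))
year_codes, _ = pd.factorize(toy.index.get_level_values("year"))

# Two columns: one cluster id per dimension (country, year).
clusters = pd.DataFrame(
    {"country": country_codes, "year": year_codes}, index=toy.index
)
```

My assumption, which I have not confirmed, is that a two-column object like `clusters` is what `fit(cov_type = "clustered", clusters = ...)` expects for two-way clustering. As far as I understand, this would only affect the standard errors and would not add country or year fixed effects.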