R: Using a variable with less observations in a regression (plm)

Question

I have been trying to deal with this for a while now with no luck. Essentially, what I am doing is a two-stage least squares on some panel data. To do this I am using the plm package. What I want to do is

Do a 2SLS
Get the residuals from the 2SLS in 1.
Use these residuals as an instrument in a different 2SLS

The issue I have is that in the first 2SLS the number of observations used is less than the total observations in the dataset, so my residuals vector is short and I get the following error

Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, : variable lengths differ (found for 'ivreg.2.a$residuals')

Here is the code I am trying to run for reference, let me know if you need any more details. I really just need my residual vector to be the same length as the data used in the first 2SLS. For reference my data has 1713 observations, however, only 1550 get used in the regression and as a result my residuals vector is length 1550. My code for the two 2SLS regressions is below.

ivreg.2.a = plm(formula = diff(loda) ~ factor(year)+diff(lgdp) | index_g_l + diff(lcru_l) + diff(lcru_l_sq) + factor(year), index = c("country", "year"), model = "within", data = panel[complete.cases(panel[, c(1,2,3,4,5,7)]),])

 ivreg.2.a = plm(formula = diff(lgdp) ~ factor(year)+index_g_l + diff(lcru_l) + diff(lcru_l_sq) + diff(loda)| index_g_l + diff(lcru_l) + diff(lcru_l_sq) + factor(year) + ivreg.2.a$residuals, index = c("country", "year"), model = "within", data = panel[complete.cases(panel[, c(1,2,3,4,5,7)]),])

Let me know if you need anything else.

Just use `plm`'s built-in 2SLS capabilities, no need to execute the various steps manually. See the package's vignette for how to do it. — Helix123, Jul 23 '22 at 10:01

score 2 · Answer 1 · answered Dec 08 '17 at 11:42

I assume the 163 observations are dropped because they have NA in one of the relevant variables. Most *lm functions in R have a na.action argument, which can be used to pad the residuals to correct length. E.g., when missing observation 3,

residuals(lm(formula, data, na.action=na.omit)) # 1 2 4
residuals(lm(formula, data, na.action=na.exclude)) # 1 2 NA 4

Documentation of plm, however, says that this argument is "currently not fully supported", so it would be simpler if you just filter those 1550 rows to a new dataframe first, and do all subsequent work on that.

BTW, if plm behaves like lm, you shouldn't need to specify complete.cases for it to work, as it should just skip anything with NAs.

Hi it is the NAs that are why they are dropped. I will try these suggestions thank you! The issue is I previously tried removing NA from the data to the clean 1550 regressions but this changes the regression results as it still drops some observations. — student_t, Dec 08 '17 at 12:46

R: Using a variable with less observations in a regression (plm)

1 Answers1