1

I am trying to loop over a set of dependent variables using the feols function from the fixest package. In base using lm or in lfe::felm I would simply use the get() function. With fixest, I receive an error. Why is that, and is there a way to get around it? Here is a reproducable example:

library(data.table)
library(lfe)
library(fixest)

N <- 1000
dt <- data.table(
  x1 = rnorm(N),
  x2 = rnorm(N),
  x3 = rnorm(N)
)

beta <- rnorm(3)
dt[, y1 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]
dt[, y2 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]
dt[, y3 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]

dt
beta
depvars <- c("y1", "y2", "y3")

res_lm <- 
lapply(depvars, function(i){
  res <- lm(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

res_felm <-
lapply(depvars, function(i){
  res <- felm(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

res_feols <- 
lapply(depvars, function(i){
  res <- feols(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

# Error in feols(get(i) ~ x1 + x2 + x3, data = dt) : 
# The variable i is in the LHS of the formula but not in the dataset. 
A.Fischer
  • 596
  • 5
  • 11

2 Answers2

3

Update

From fixest version 0.8.0 onwards, you can perform multiple estimations directly:

res <- feols(c(y1, y2, y3) ~ x1 + x2 + x3, data = dt)
etable(res)

The previous code performs 3 estimations, one for each dependent variable. Note that you can also have multiple RHS, multiple fixed-effects or multiple samples (see the vignette).


Update

From fixest version 0.7 onwards the formula macro parser accepts character vectors. So the following works:

depvars <- c("y1", "y2", "y3")

lapply(depvars, function(var) {
    res <- feols(xpd(..lhs ~ x1 + x2 + x3, ..lhs = var), data = dt)
    summary(res)
})

The answer by Allan is perfectly right. Alternatively you can use some tools within fixest toolbox, namely formula macros.

You can use the function xpd to "expand" a formula. Here's an example:

depvars <- list(~ y1, ~ y2, ~ y3)

lapply(depvars, function(var) {
    res <- feols(xpd(..lhs ~ x1 + x2 + x3, ..lhs = var), data = dt)
    summary(res)
})

Note that the dependent variable here must be represented by one sided formulas instead of character strings. The variable ..lhs starts with two dots and is the "macro" variable that will be replaced by var. Note also that macro variables must start with two dots (to distinguish them from regular variables).

This leads to the following results (same as Allan):

#> [[1]]
#> OLS estimation, Dep. Var.: y1
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.026417   0.033482 -0.788981  0.430311    
#> x1           0.675098   0.033764 19.995000 < 2.2e-16 ***
#> x2           0.022227   0.032332  0.687468  0.491948    
#> x3          -0.001915   0.034032 -0.056276  0.955133    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,470.78   Adj. R2: 0.28434 
#> 
#> [[2]]
#> OLS estimation, Dep. Var.: y2
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.028379   0.034692 -0.818031  0.413535    
#> x1           0.718648   0.034984 20.542000 < 2.2e-16 ***
#> x2           0.009986   0.033500  0.298072   0.76571    
#> x3           0.021206   0.035262  0.601372  0.547729    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,506.27   Adj. R2: 0.29582 
#> 
#> [[3]]
#> OLS estimation, Dep. Var.: y3
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.040832   0.034680 -1.177400  0.239316    
#> x1           0.689918   0.034972 19.728000 < 2.2e-16 ***
#> x2          -0.017889   0.033489 -0.534170  0.593343    
#> x3          -0.028022   0.035250 -0.794952  0.426831    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,505.94   Adj. R2: 0.28041 

Just one more note on macros: you can set them globally too with setFixest_fml. The following code would also have worked:

depvars <- list(~ y1, ~ y2, ~ y3)
setFixest_fml(..rhs = ~ x1 + x2 + x3)

lapply(depvars, function(i) {
    res <- feols(xpd(..lhs ~ ..rhs, ..lhs = i), data = dt)
    summary(res)
})

OK, now the very last note. When you use macros that need no redefinition you can avoid the use of xpd within fixest estimation functions. The following would work:

setFixest_fml(..lhs = ~ y1, ..rhs = ~ x1 + x2 + x3)
res <- feols(..lhs ~ ..rhs, data = dt)
Laurent Bergé
  • 1,292
  • 6
  • 8
2

I think this is because feols passes the unevaluated formula get(i) ~ x1 + x2 + x3 to fixest_env. By the time the unevaluated get(i) arrives in fixest_env it no longer has a referent in the calling environment, so it throws an error.

The way round this is to ensure that the formula contains the actual name of the variable you wish to use as the DV. You can do this by converting a character string to a formula:

lapply(depvars, function(i) {
   fml <- as.formula(paste(i, "~ x1 + x2 + x3"))
   res <- feols(fml, data = dt)
   summary(res)
 })

Which returns:

#> [[1]]
#> OLS estimation, Dep. Var.: y1
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.057391   0.036018 -1.593400  0.111389    
#> x1           0.350159   0.036080  9.705000 < 2.2e-16 ***
#> x2          -0.058296   0.036721 -1.587500  0.112711    
#> x3           0.012196   0.036110  0.337733  0.735635    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,545.80   Adj. R2: 0.0864 
#> 
#> [[2]]
#> OLS estimation, Dep. Var.: y2
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.013447   0.036563 -0.367778  0.713117    
#> x1           0.369671   0.036626 10.093000 < 2.2e-16 ***
#> x2           0.004896   0.037277  0.131345  0.895529    
#> x3           0.019210   0.036656  0.524056  0.600357    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,560.80   Adj. R2: 0.09126 
#> 
#> [[3]]
#> OLS estimation, Dep. Var.: y3
#> Observations: 1,000 
#> Standard-errors: Standard 
#>              Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) -0.020945   0.036181 -0.578896   0.56279    
#> x1           0.397365   0.036244 10.964000 < 2.2e-16 ***
#> x2          -0.057537   0.036887 -1.559800  0.119126    
#> x3           0.032324   0.036274  0.891114  0.373083    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -1,550.31   Adj. R2: 0.10881 
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87