I'm using fixest::feols() inside a function, and I want to pass the function an argument that is used to subset the data via the subset = argument. However, I keep getting the error: The argument 'subset' is a formula whose variables must be in the data set given in argument 'data'.

I have tried the following code:

library(fixest)

cars <- mtcars

my_fun <- function(data, hp.c.off) {
  
  feols(mpg ~ disp + drat,
        data = data,
        subset = ~ hp > substitute(hp.c.off))
}

my_fun(data = cars, 150)

My expected outcome would be the same as if one typed:

feols(mpg ~ disp + drat,
      data = cars,
      subset = ~ hp > 150)

I know I have to replace hp.c.off with its value before it ends up in the formula. One could do this by building a string and then calling as.formula() on it. However, I was wondering whether there is a better way to build the expression programmatically, one that does not require creating a string first and then converting it into a formula.

Thanks!

zephryl
cach1

3 Answers

1) Create the formula as a character string and then convert it to a formula.

my_fun <- function(data, hp.c.off) {
  
  feols(mpg ~ disp + drat,
        data = data,
        subset = as.formula(paste("~ hp >", hp.c.off)))
}

2) or just don't use the subset= argument and instead use the data argument with subset.

my_fun <- function(data, hp.c.off) {
  
  feols(mpg ~ disp + drat,
        data = subset(data, hp > hp.c.off))
}

3) or use the fact that subset= can be a logical vector

my_fun <- function(data, hp.c.off) {

  feols(mpg ~ disp + drat,
        data = data, 
        subset = data$hp > hp.c.off)
}
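
Options 2 and 3 work because the comparison is evaluated inside the function, where hp.c.off is an ordinary variable, rather than inside a formula that is looked up in the data. A minimal base R sketch of that idea follows; rows_kept is a hypothetical helper used only to inspect which rows would be selected, and it does not call feols():

```r
# Hypothetical helper (not part of fixest): shows which rows the
# logical vector from option 3 would keep.
rows_kept <- function(data, hp.c.off) {
  # Evaluated here, inside the function, so hp.c.off is visible
  # as a normal variable and no formula is involved.
  keep <- data$hp > hp.c.off
  rownames(data)[keep]
}

kept <- rows_kept(mtcars, 150)
length(kept)
# 13
```

The same 13 rows are what subset(mtcars, hp > 150) returns in option 2.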
G. Grothendieck
  • Thank you for your answer. I hadn't thought about options 2 & 3. I will keep them in mind in future! – cach1 Nov 14 '22 at 21:09

A simple option is to pass the subsetting formula as an argument to the function

my_fun <- function(data,expr = ~ hp > 150){
  
  feols(mpg ~ disp + drat,
        data = data,
        subset = expr)
}

Testing:

> my_fun(data = cars)
OLS estimation, Dep. Var.: mpg
Observations: 13 
Standard-errors: IID 
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 23.414923   8.019808  2.919636 0.015310 *  
disp        -0.021349   0.008284 -2.577276 0.027545 *  
drat        -0.201284   2.014207 -0.099932 0.922373    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 2.16851   Adj. R2: 0.300667
akrun
  • You are correct, this would work. However, the real-world example is more complicated and requires the subsetting expression to be built inside the function (i.e. programmatically). I have updated the OP to reflect this. Thank you for your help! – cach1 Nov 13 '22 at 23:26

You can use rlang::new_formula(), with rlang::expr() to quote the rhs and !!rlang::enexpr() to capture and inject the hp.c.off argument.

I don’t have fixest installed, but this demonstrates building the formula inside a function:

library(rlang)

cars <- mtcars

my_fun <- function(data, hp.c.off) {
  new_formula(lhs = NULL, rhs = expr(hp > !!enexpr(hp.c.off)))
}

my_fun(data = cars, 150)
# ~hp > 150
# <environment: 0x1405e38>
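
For comparison, a base R sketch of the same idea that needs no extra package: bquote() quotes the expression and substitutes the value of hp.c.off where .() appears, and eval() turns the resulting `~` call into a formula object. make_subset_formula is a hypothetical name used here for illustration only.

```r
# Hypothetical helper: builds the one-sided formula without pasting strings.
make_subset_formula <- function(hp.c.off) {
  # .(hp.c.off) is replaced by the argument's value (e.g. 150);
  # evaluating the quoted `~` call then yields a formula.
  eval(bquote(~ hp > .(hp.c.off)))
}

cutoff <- 150
f <- make_subset_formula(cutoff)
class(f)
# "formula"
deparse(f[[2]])
# "hp > 150"
```

Because the value is substituted rather than the caller's expression, this also works when the cutoff is passed as a variable. The result could then be passed as subset = f, assuming feols() treats it the same way as the literal ~ hp > 150 in the question.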
zephryl