How do I store and re-use same sample from a fixest regression in a subsequent regression in r?

Question

How do I tell R to store the observations from a fepois regression as a data frame so I can use the same sample in a subsequent regression?

Here's a simple example of my problem:

library(fixest)
m <- fepois(y ~ x | id, data); n <- fepois(y ~ x, data)

I want both regressions to run on the same sample. In m certain observations will be dropped due to the id fixed effects. Therefore, I want to store the sub-sample of observation in m and use it to run n.

In Stata, I believe this is done with using e(sample) but I can't figure out how to do this in R for fixest regressions.

Allan Cameron · Answer 1 · 2022-08-11T11:50:45.627

The indices of the dropped observations are stored in m$fixef_removed, so if you pass data[-m$fixef_removed$id,] as the data argument in your second call, you will be running the two regressions on the same observations.

Here's a minimal example:

library(fixest)

# Sample data - note id 2 only has one observation
data <- data.frame(id = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4),
                   x  = c(1, 2, 3, 2, 1, 2, 3, 1, 2, 3),
                   y  = c(1, 3, 2, 0, 5, 2, 6, 6, 13, 7))

m <- fepois(y ~ x | id, data)
#> NOTE: 1 fixed-effect (1 observation) removed because of only 0 outcomes.

n <- fepois(y ~ x, data[-m$fixef_removed$id,])

We can see both models have the same number of observations:

summary(m)
#> Poisson estimation, Dep. Var.: y
#> Observations: 9 
#> Fixed-effects: id: 3
#> Standard-errors: Clustered (id) 
#>   Estimate Std. Error t value Pr(>|t|)    
#> x 0.100167    0.04197 2.38664 0.017003 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -17.7   Adj. Pseudo R2: 0.165207
#>            BIC:  44.2     Squared Cor.: 0.642757

summary(n)
#> Poisson estimation, Dep. Var.: y
#> Observations: 9 
#> Standard-errors: IID 
#>             Estimate Std. Error  t value  Pr(>|t|)    
#> (Intercept) 1.321910   0.450951 2.931385 0.0033745 ** 
#> x           0.107349   0.202613 0.529822 0.5962356    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -27.6   Adj. Pseudo R2: -0.030282
#>            BIC:  59.5     Squared Cor.:  0.011293

^{Created on 2022-08-11 by the reprex package (v2.0.1)}

Thanks Allan. I mean that some people (my id variable) are only observed once across the time period & are dropped when I run the model with id fixed effects. I want the estimation sample to be the same if I run the regressions with and without the id fixed effects. E.g. `n` may originally have 200 observations but `m` only has 150 since it dropped people that only appear once in the sample. My goal is to run `n` using the same sample that is used in `m`. I can sub-sample and drop observations but that gets messy with my different specs so looking for something like e(sample) in Stata. — EnvEcon2022, Aug 11 '22 at 11:24
@EnvEcon2022 I see. In that case, the solution is to use `m` to filter your data - see my update. — Allan Cameron, Aug 11 '22 at 11:51

score 1 · Answer 2 · answered Aug 12 '22 at 07:09

You can easily do that with the function obs(model) which provides the observations used to estimate a model.

Here is an example:

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
base$species[1:5] = NA

# Estimation 1: obs. dropped due to NAness of FEs
est_1 = feols(y ~ x1 + x2 | species, base)
#> NOTE: 5 observations removed because of NA values (Fixed-effects: 5).

# obs() gets the observations used for estimation
head(obs(est_1))
#> [1]  6  7  8  9 10 11

# Estimation 2: no na but using same sample as Est 1 with subset
est_2 = feols(y ~ x1 + x2 + x3, base, subset = obs(est_1))

etable(est_1, est_2, fitstat = ~ n)
#>                            est_1               est_2
#> Dependent Var.:                y                   y
#>                                                     
#> x1               0.4274 (0.1615)  0.6520*** (0.0680)
#> x2              0.7774* (0.1260)  0.7097*** (0.0576)
#> Constant                           1.853*** (0.2564)
#> x3                               -0.5583*** (0.1294)
#> Fixed-Effects:  ---------------- -------------------
#> species                      Yes                  No
#> _______________ ________________ ___________________
#> S.E. type            by: species                 IID
#> Observations                 145                 145
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How do I store and re-use same sample from a fixest regression in a subsequent regression in r?

2 Answers2

Linked