R is not Stata: It looks like this question is more a general question on how to report descriptive statistics in R.
There are many many ways to report desc stats. I provide an example below illustrating how to curate the desc stats to the estimation sample.
library(fixest)
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
# let's add a few missing values
base$x1[1:5] = NA
# estimation
est = feols(y ~ x1, base)
#> NOTE: 5 observations removed because of NA values (RHS: 5).
# There is an infinity of ways to get descriptive stats
# As an example, let's use the collapse pkg
library(collapse)
descr(base[obs(est), all.vars(est$fml)])
#> Dataset: all.vars(x$fml), 2 Variables, N = 145
#> -------------------------------------------------------
#> y (numeric):
#> Statistics
#> N Ndist Mean SD Min Max Skew Kurt
#> 145 35 5.88 0.82 4.3 7.9 0.27 2.46
#> Quantiles
#> 1% 5% 10% 25% 50% 75% 90% 95% 99%
#> 4.4 4.62 4.9 5.2 5.8 6.4 6.9 7.28 7.7
#> -------------------------------------------------------
#> x1 (numeric):
#> Statistics
#> N Ndist Mean SD Min Max Skew Kurt
#> 145 23 3.05 0.44 2 4.4 0.35 3.19
#> Quantiles
#> 1% 5% 10% 25% 50% 75% 90% 95% 99%
#> 2.2 2.32 2.5 2.8 3 3.3 3.66 3.8 4.16
#> ---------------------------------------------------------
Further, you can very easily automate that with a function, as below:
#Let's create a function that automates that
summ = function(x){
# x: fixest estimation
# we fetch the data
data = model.matrix(x, type = c("lhs", "rhs"))
# we remove the intercept
var_keep = names(data) != "(Intercept)"
data = data[var_keep]
# the sumstat
descr(data)
}
summ(est)
#> Dataset: data, 2 Variables, N = 145
#> -----------------------------------------------------
#> y (numeric):
#> Statistics
#> N Ndist Mean SD Min Max Skew Kurt
#> 145 35 5.88 0.82 4.3 7.9 0.27 2.46
#> Quantiles
#> 1% 5% 10% 25% 50% 75% 90% 95% 99%
#> 4.4 4.62 4.9 5.2 5.8 6.4 6.9 7.28 7.7
#> -----------------------------------------------------
#> x1 (numeric):
#> Statistics
#> N Ndist Mean SD Min Max Skew Kurt
#> 145 23 3.05 0.44 2 4.4 0.35 3.19
#> Quantiles
#> 1% 5% 10% 25% 50% 75% 90% 95% 99%
#> 2.2 2.32 2.5 2.8 3 3.3 3.66 3.8 4.16
#> ------------------------------------------------------
If you would prefer a table, you can easily adapt the code to work with datasummary:
library(modelsummary)
summ = function(x){
# x: fixest estimation
# we fetch the data
data = model.matrix(x, type = c("lhs", "rhs"))
# we remove the intercept
var_keep = names(data) != "(Intercept)"
data = data[var_keep]
# the sumstat
datasummary(All(data) ~ Mean + SD + Min + Median + Max, data, output = "dataframe")
}
summ(est)
#> Mean SD Min Median Max
#> 1 y 5.88 0.82 4.30 5.80 7.90
#> 2 x1 3.05 0.44 2.00 3.00 4.40