I am currently working with a big dataset (n > 10 million). I have found the fixest package very helpful for running logit fixed-effects models quickly (feglm).

f1 <- feglm(result ~ log(rate1) + sex + age + development + pop + acad + size | state,
            se = "standard", family = "logit", lean = TRUE, mem.clean = TRUE, data = total)

The initial problem was that my models were too big. I've tried slimming them down with lean = TRUE and mem.clean = TRUE. I've also wiped out the linear predictors and working residuals components of the model like so:

f1$linear.predictors <- NULL
f1$working_residuals <- NULL

By doing all these steps, I managed to trim A LOT of fat. The model was originally 1.2 GB, but I managed to whittle it down to ~200 KB:

print(object.size(f1), units = "auto")
218.3 Kb

Problems arise when I try to save the model as an rda file. The saving operation should take a split second. Instead, it takes minutes and saves the model as a bloated 300 MB file.

What am I doing wrong? I would like to keep the fixest object in its small 220 Kb size.

Thank you

YouLocalRUser

2 Answers


Set family, fml and fml_all to NULL and it should work.

The issue is that these items hold references to environments, even though they do not contain the environments themselves. That is why the object looks small in memory. When save is applied, however, the environments these items refer to are saved as well, which leads to the large file size.
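The effect can be reproduced in base R without fixest. The following is only a minimal sketch: `make_formula` is a made-up helper, and the formula stands in for items such as fml. It compares what object.size reports with the number of bytes that serialization (the mechanism behind save and saveRDS) actually writes.

```r
# Build a formula inside a function whose local environment holds a large vector.
make_formula <- function() {
  big <- rnorm(1e6)   # roughly 8 MB living in the function's environment
  y ~ x               # the formula keeps a reference to that environment
}

f <- make_formula()

# object.size() does not count the environment the formula points to ...
in_memory <- as.numeric(object.size(f))

# ... but serialization follows the reference and writes the environment too.
on_disk <- length(serialize(f, NULL))

# Re-pointing the formula at the global environment drops the hidden payload.
environment(f) <- globalenv()
on_disk_clean <- length(serialize(f, NULL))

print(c(in_memory = in_memory, on_disk = on_disk, on_disk_clean = on_disk_clean))
```

The first and third numbers should be tiny, while the second should be on the order of the hidden vector's size. Whether re-pointing environments (rather than setting the items to NULL) is safe for a fixest object is not something this sketch establishes.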

This is an issue I'll try to solve in the package. Btw the items linear.predictors and working_residuals will be appropriately erased in the next version of the package (0.10.2).

Laurent Bergé
I implemented the suggestions:

```
f1$linear.predictors <- NULL
f1$working_residuals <- NULL
f1$family <- NULL
f1$fml <- NULL
f1$fml_all <- NULL
```

It works halfway. On the plus side, saving is now fast and the file no longer bloats in size. However, I can no longer run the fixest package's etable(f1) command. This is kind of a bummer, since I am building my tables in LaTeX. When I run the command

```
etable(f1, tex = TRUE)
```

I get this error message:

```
Error in if (family$family == "poisson" && family$link == "log") { :
  missing value where TRUE/FALSE needed
```

What do you suggest? Thanks – YouLocalRUser Feb 18 '22 at 22:13

Thank you, Laurent B., for pointing me in the right direction. I built on your suggestion: I set several of the model's components to NULL and then put back the few fields that etable needs.

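# Drop the heavy components and the ones that reference environments
f1.1$linear.predictors <- NULL
f1.1$working_residuals <- NULL
f1.1$family <- NULL
f1.1$fml <- NULL
f1.1$fml_all <- NULL
# Put back the fields etable() checks (family and link) and the dependent variable name
f1.1$family$family = "binomial"
f1.1$family$link = "logit"
f1.1[["fml"]][[2]] = "emigration"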
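To check which components still drag a large environment along, one option is to look at the serialized size of each component instead of object.size. This is only a rough sketch in base R: `component_sizes` and `make_toy_model` are hypothetical helpers, and the toy list stands in for a fitted model, so nothing here relies on fixest internals.

```r
# Hypothetical helper: serialized size in bytes of each component of a list-like object.
component_sizes <- function(obj) {
  sizes <- vapply(
    unclass(obj),
    function(el) as.numeric(length(serialize(el, NULL))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Toy stand-in for a fitted model: a small list whose formula hides a large environment.
make_toy_model <- function() {
  big <- rnorm(1e6)   # large object captured by the formula's environment
  list(coefficients = c(a = 1, b = 2), fml = y ~ x)
}

toy <- make_toy_model()
sizes <- component_sizes(toy)
print(sizes)
```

Components near the top of the list are the ones that would bloat the saved file. An environment shared by several components is counted once per component here, so the numbers are an upper bound rather than an exact breakdown.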

Thank you for your help and for creating the wonderful package that fixest is.

YouLocalRUser