I have two very simple regressions:
foo <- lm(log(y*30) ~ x, data=myDt[y > 0])
bar <- lm(log(y*30) ~ x + d1 + d2,
          data=myDt[y > 0])
where d1, d2 are factor variables with many different values and myDt is a data.table. Note that the linear model corresponding to foo is much lighter than bar. If anything, bar should be the one that takes time.
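To make "much lighter" concrete, here is a minimal sketch that compares the two fits on simulated stand-in data. The data frame simDf, its size, and the number of factor levels are assumptions for illustration only, not the data from this question.

```r
# Simulated stand-in data; sizes and distributions are assumptions,
# not the data from the question.
set.seed(1)
n <- 2000
simDf <- data.frame(
  y  = rexp(n),                                 # positive outcome, so log() is defined
  x  = rnorm(n),
  d1 = factor(sample(200, n, replace = TRUE)),  # many-level factor
  d2 = factor(sample(200, n, replace = TRUE))   # many-level factor
)

foo <- lm(log(y * 30) ~ x, data = simDf)
bar <- lm(log(y * 30) ~ x + d1 + d2, data = simDf)

# Number of coefficients in each model (aliased ones show up as NA but are counted).
n_coef <- c(foo = length(coef(foo)), bar = length(coef(bar)))
print(n_coef)

# In-memory size of each fitted model object.
print(c(foo = format(object.size(foo), units = "MB"),
        bar = format(object.size(bar), units = "MB")))
```

The model without dummies has two coefficients (intercept and x), while the one with dummies has one extra coefficient per additional factor level.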
I can, without problems, run
stargazer(bar, type='text',
          omit=c('d1', 'd2'), omit.labels=c('d1', 'd2'))
Running this takes perhaps 10 seconds. Running it with foo instead of bar is even faster. However, if I run it with both, I get stuck:
stargazer(foo, bar, type='text',
omit=c('d1', 'd2'), omit.labels=c('d1', 'd2')
)
After many minutes, I gave up on it. On a new try, it was still running after one hour. On a second try, it finished after 3 hours 30 minutes:
=======================================================================
                                     Dependent variable:
                    ---------------------------------------------------
                                     log(tentgelt * 30)
                              (1)                       (2)
-----------------------------------------------------------------------
x                         -0.00001***                 0.00001
                           (0.00000)                 (0.0001)
Constant                   6.857***                  4.711***
                            (0.017)                   (1.130)
-----------------------------------------------------------------------
d1                            No                        Yes
d2                            No                        Yes
-----------------------------------------------------------------------
Observations                 4,858                     4,858
R2                           0.002                     0.672
Adjusted R2                  0.002                     0.137
Residual Std. Error   1.160 (df = 4856)         1.078 (df = 1847)
F Statistic         10.001*** (df = 1; 4856) 1.256*** (df = 3010; 1847)
Given that the linear model with the dummies runs quite smoothly, I did not expect that adding the model without dummies would cause such an issue. Is there any workaround for this?
Reproducible example
Alright, I managed to create some fake data that still demonstrates the behavior. Download as .csv from here, then run
library(data.table)
library(stargazer)

myDt <- as.data.table(read.csv('test.csv'))
# convert the dummy columns to factors
myDt[, c('d1', 'd2') := list(factor(d1), factor(d2))]
foo <- lm(log(y*30) ~ x, data=myDt[y > 0])
bar <- lm(log(y*30) ~ x + d1 + d2,
          data=myDt[y > 0])
# this one should run quite quickly
stargazer(bar, type='text', omit=c('d1', 'd2'))
# this one takes forever.
stargazer(foo, bar, type='text', omit=c('d1', 'd2'))
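To put numbers on "quickly" and "forever", each call can be wrapped in system.time(). The following is only a sketch on simulated stand-in data: simDf and its sizes are assumptions, deliberately kept small so that both calls should finish, and I have not checked whether data this small reproduces the slowdown. Increasing the number of factor levels moves it closer to the real case.

```r
# Simulated stand-in data; sizes are assumptions, kept small on purpose.
set.seed(1)
n <- 1000
simDf <- data.frame(
  y  = rexp(n),                                # positive outcome, so log() is defined
  x  = rnorm(n),
  d1 = factor(sample(50, n, replace = TRUE)),
  d2 = factor(sample(50, n, replace = TRUE))
)
foo <- lm(log(y * 30) ~ x, data = simDf)
bar <- lm(log(y * 30) ~ x + d1 + d2, data = simDf)

timings <- NULL
if (requireNamespace("stargazer", quietly = TRUE)) {
  # capture.output() swallows the printed tables so only the timings are shown.
  t_single <- system.time(capture.output(
    stargazer::stargazer(bar, type = "text", omit = c("d1", "d2"))
  ))
  t_both <- system.time(capture.output(
    stargazer::stargazer(foo, bar, type = "text", omit = c("d1", "d2"))
  ))
  # Elapsed wall-clock seconds for the one-model and two-model calls.
  timings <- c(single = t_single[["elapsed"]], both = t_both[["elapsed"]])
  print(timings)
}
```

Comparing the two elapsed times while raising the number of levels in d1 and d2 should show at what model size the two-model call starts to lag behind.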