
I have two very simple regressions:

foo <- lm(log(y*30) ~ x, data=myDt[y > 0])
bar <- lm(log(y*30) ~ x + d1 + d2, 
          data=myDt[y > 0])

where d1, d2 are factor variables with many different values and myDt is a data.table. Note that the linear model corresponding to foo is much lighter than bar. If anything, bar should be the one that takes time.

I can, without problems, run

stargazer(bar, type='text', 
          omit=c('d1', 'd2'), omit.labels=c('d1', 'd2')
          )

Running this takes perhaps 10 seconds. Running it with foo instead of bar is even faster. However, if I run it with both, I get stuck:

stargazer(foo, bar, type='text', 
          omit=c('d1', 'd2'), omit.labels=c('d1', 'd2')
          )

After many minutes, I gave up on it. On a new try, it was still running after one hour. A second try finished after 3 hours 30 minutes:

=======================================================================
                                    Dependent variable:                
                    ---------------------------------------------------
                                    log(tentgelt * 30)                 
                              (1)                       (2)            
-----------------------------------------------------------------------
x                          -0.00001***                 0.00001          
                           (0.00000)                  (0.0001)         

Constant                    6.857***                  4.711***         
                            (0.017)                   (1.130)          

-----------------------------------------------------------------------
d1                             No                       Yes            
d2                             No                       Yes            
-----------------------------------------------------------------------
Observations                 4,858                     4,858           
R2                           0.002                     0.672           
Adjusted R2                  0.002                     0.137           
Residual Std. Error    1.160 (df = 4856)         1.078 (df = 1847)     
F Statistic         10.001*** (df = 1; 4856) 1.256*** (df = 3010; 1847)

Given that the linear model with the dummies runs quite smoothly on its own, I had not expected that adding the model without dummies would cause such an issue. Is there any workaround for this?

Reproducible example

Alright, I managed to create some fake data that still demonstrates the behavior. Download as .csv from here, then run

library(data.table)
library(stargazer)

myDt <- as.data.table(read.csv('test.csv'))
myDt[, c('d1', 'd2') := list(factor(d1), factor(d2))]
foo <- lm(log(y*30) ~ x, data=myDt[y > 0])
bar <- lm(log(y*30) ~ x + d1 + d2, 
          data=myDt[y > 0])
# this one should run quite quickly
stargazer(bar, type='text', omit=c('d1', 'd2'))
# this one takes forever.
stargazer(foo, bar, type='text', omit=c('d1', 'd2'))
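
If the linked test.csv is not at hand, a rough stand-in can be simulated. The following is only a sketch under assumptions: the distributions, the seed, the sample size `n`, and the number of factor levels `k` are all made up here and are not taken from the original data, so it may or may not reproduce the slowdown. The output table above suggests roughly 4,858 observations and about 3,010 regressors, so `n` and `k` may need to be increased to get close to that.

```r
library(data.table)

set.seed(1)      # arbitrary seed, only for repeatability
n <- 2000        # assumed sample size (the original table reports 4,858)
k <- 200         # assumed number of levels per factor (the original has far more)

# Simulated stand-in for test.csv; about 10% of y is set to zero
# so that the y > 0 filter actually drops rows.
myDt <- data.table(
  y  = rexp(n) * rbinom(n, 1, 0.9),
  x  = rnorm(n),
  d1 = factor(sample(k, n, replace = TRUE)),
  d2 = factor(sample(k, n, replace = TRUE))
)

# Same two models as in the question
foo <- lm(log(y * 30) ~ x, data = myDt[y > 0])
bar <- lm(log(y * 30) ~ x + d1 + d2, data = myDt[y > 0])
```

With `foo` and `bar` built this way, each stargazer call from above can be wrapped in `system.time()` to compare how long the single-model and two-model tables take.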
FooBar
  • My guess is that you do not understand what this is doing: `data=myDt[y > 0]`. That creates a logical vector of some sort and chooses columns. (My guess is that you think it is choosing rows.) Voting to close as not reproducible. – IRTFM May 02 '15 at 01:38
  • @BondedDust: Indeed I believe it chooses rows. My guess is that you did not catch that `myDt` here stands for `my data.table` or are unfamiliar with the latter. `data.table` is a handy data construct in `R` that extends `data.frame` and allows selecting rows via `myDt[condition] == myDt[condition,]`. It may relieve you to know that I checked it with `myDt[y>0,]` and the output does not change. Here's some introduction to `data.table`: http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf – FooBar May 02 '15 at 01:52
  • I am hardly "unfamiliar" with data.table, although your comment is the first time that you mention it. You should try to post a reproducible example (which would include loading non-base packages.) It's still not reproducible. – IRTFM May 02 '15 at 02:16
  • Yet another example of why failing to present a reproducible example .... leaves you with an unanswered question. (And why would you imagine that I was unfamiliar with data.table?) – IRTFM May 02 '15 at 06:22
  • @BondedDust I've seen sufficient questions in my days that had no reproducible example, yet received useful answers. Sometimes that's difficult ex-ante to expect. In my case, producing clear data that I can link in the public was not trivial. – FooBar May 02 '15 at 14:35

1 Answer


I was in contact with the author. This is indeed an issue in the package. I was told to expect a fix in the next month or so.

FooBar