I am currently working with panel data (using the plm package) with around 10 variables and only about 30,000 observations. When I change the option in the `plm()` call from `effect = "individual"` to `effect = "twoways"`, estimation takes about 1 minute, which I still find OK, and I get a plm object with a size of 12 MB.

However, when I summarize the regression result with either `summary(plm.object)` or `stargazer(plm.object)`, the call can take up to 15 minutes of CPU-intensive work, which I find really confusing. From my understanding, the regression result is already stored in `plm.object` after estimation, so the only thing `summary()` or `stargazer()` should have to do is display the result from the plm object.

Does anyone have an idea why `summary()` takes so long?

  • If I had to guess, I would say it's taking so long because [of this](https://github.com/cran/plm/blob/a8e5d802b36d3d78baa2f2e2a98c67519eb959da/R/plm.methods.R#L24) or [this](https://github.com/cran/plm/blob/a8e5d802b36d3d78baa2f2e2a98c67519eb959da/R/plm.methods.R#L25). – Roman Luštrik Aug 17 '17 at 06:04
  • Thanks, @RomanLuštrik. If that is the case, any idea how I can turn it off? All I need is to see the regression results. – Lingyu Kong Aug 17 '17 at 06:36
  • First, you could profile this function to see where the bottlenecks are. Without digging any further, all I can recommend is to calculate the summary results once, store them in an object, and print that. – Roman Luštrik Aug 17 '17 at 06:38
  • Are you passing a function for the argument `vcov` to `summary()`? Non-default standard errors take some time to calculate, although 15 minutes sounds too long. Any chance you could make the data set and your estimation code available? – Helix123 Aug 17 '17 at 08:35
  • The estimated coefficients are stored in the plm object, but the t-statistics, p-values, and model statistics are not (those are stored in the object that `summary()` generates). – Helix123 Aug 17 '17 at 08:40
  • You can run `pwaldtest(plm.object)` by hand to check @RomanLuštrik's suggestion, i.e. whether the calculation of the F statistic takes such a long time. (The other suggestion, `model <- describe(object, "model")`, is really just a quick extraction of the model type estimated.) – Helix123 Aug 17 '17 at 09:15
  • Thanks, @Helix123! I guess that is the main reason, if the t-values and p-values are generated by `summary()`. In addition, I tried your suggestion using `describe(object, "model")`, but I got the error: `Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic`. – Lingyu Kong Aug 17 '17 at 09:55
  • So the run time of `pwaldtest(plm.object)` is "short"? Using `describe()` was not my suggestion; if you want to run it, you will need to use `plm::describe(plm.object)`, as that function is not exported from package plm - still, this function does not do much and thus won't be the cause of the long run time of `summary()`. – Helix123 Aug 17 '17 at 15:36
  • The run time of `pwaldtest` is very short, within seconds. – Lingyu Kong Aug 18 '17 at 01:53
  • The calculation of t-statistics and p-values should be fast as well... unless you supply a function to the argument `vcov`, then it could take a while... but you did not comment on my earlier comment about it. – Helix123 Aug 19 '17 at 08:29
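
The checks suggested in the comments (time `pwaldtest()` separately, compare `summary()` with and without a `vcov` function, profile `summary()`, and store the summary once) could be sketched roughly as follows. This is only an illustration under assumptions: the asker's data and model were never posted, so the `Grunfeld` data set that ships with plm and the formula `inv ~ value + capital` stand in for them, and the variable names (`fit`, `fit_summary`, `prof_file`, ...) are made up for this example. On such a small data set everything runs in a fraction of a second, so `summary()` is repeated while profiling to collect enough samples; on the real data a single call should be enough.

```r
# Sketch only: Grunfeld (bundled with plm) stands in for the asker's data,
# which was not posted. All object names here are illustrative.
library(plm)

data("Grunfeld", package = "plm")

# Two-way fixed effects model, as in the question (effect = "twoways").
fit <- plm(inv ~ value + capital, data = Grunfeld,
           index = c("firm", "year"), model = "within", effect = "twoways")

# Time the pieces separately: F statistic alone, default summary,
# and summary with a non-default covariance estimator.
t_wald    <- system.time(wald <- pwaldtest(fit))
t_summary <- system.time(fit_summary <- summary(fit))
t_robust  <- system.time(fit_summary_hc <- summary(fit, vcov = vcovHC))
print(rbind(pwaldtest = t_wald, summary = t_summary, summary_vcovHC = t_robust))

# Profile summary() to see which internal calls dominate the run time.
# The loop is only needed because this toy model is so fast.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in seq_len(50)) invisible(summary(fit))
Rprof(NULL)
# summaryRprof() errors if no samples were recorded, so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.total))

# Compute the summary once, then reuse the stored object instead of
# calling summary() again for every print.
print(fit_summary)
fit_summary$coefficients  # estimates, std. errors, t-values, p-values
```

If the profile on the real data shows most of the time inside one particular call, that points to the actual bottleneck; the stored `fit_summary` object can be printed repeatedly without recomputing anything.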
