
The data.table package is very helpful in terms of speed. But I am having trouble actually using the output from a linear regression. Is there an easy way to get the data.table output to be as pretty/useful as that from the plyr package? Below is an example. Thank you!

library('data.table');
library('plyr');

REG <- data.table(ID=c(rep('Frank',5),rep('Tony',5),rep('Ed',5)), y=rnorm(15), x=rnorm(15), z=rnorm(15));
REG;

ddply(REG, .(ID), function(x) coef(lm(y ~ x + z, data=x)));

REG[, coef(lm(y ~ x + z)), by=ID];

The data.table coefficient estimates are output in a single column whereas the plyr/ddply coefficient estimates are output in multiple and nicely labeled columns.

I know I can run the regression three times with data.table but that seems really inefficient. I could be wrong, though.

REG[, list(Intercept=coef(lm(y ~ x + z))[1],
           x        =coef(lm(y ~ x + z))[2],
           z        =coef(lm(y ~ x + z))[3]), by=ID];
mechanical_meat
user1491868

1 Answer


Try this:

> REG[, as.list(coef(lm(y ~ x + z))), by=ID];
        ID (Intercept)           x         z
[1,] Frank  -0.2928611  0.07215896  1.835106
[2,]  Tony   0.9120795 -1.11153056  2.041260
[3,]    Ed   1.0498359  5.77131778 -1.253741

I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it, and I don't remember that any answer was this compact.

Oh, there it is .. on r-help. Matthew can comment on the correctness of this if he wants. I guess the message is that functions returning lists will not have their dimensions dropped. The interesting thing was that using list(coef(lm(...))) did not succeed in the manner we hoped.
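To see why as.list() works where list() does not, here is a minimal base-R sketch. The data and the names `d`, `cf`, `one_elem`, and `three_elem` are made up for illustration and are not from the question. It only compares the shape of the two results.

```r
# Illustrative data only; `d` and `cf` are hypothetical names.
set.seed(1)
d <- data.frame(y = rnorm(5), x = rnorm(5), z = rnorm(5))

# coef() returns a named numeric vector with one entry per model term.
cf <- coef(lm(y ~ x + z, data = d))

# list() wraps the whole vector in a single list element.
one_elem <- list(cf)

# as.list() gives one named, length-one element per coefficient.
three_elem <- as.list(cf)

length(one_elem)    # 1
length(three_elem)  # 3
names(three_elem)   # "(Intercept)" "x" "z"
```

As I understand it, data.table turns each element of the list returned in j into one column, so the three-element list gives three labeled columns while the one-element list gives a single column.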

IRTFM
  • 1
    There was [this](http://stackoverflow.com/questions/11233183/grouping-in-data-table-how-to-get-more-than-1-column-of-results/11233262#11233262) from yesterday (see esp. the second comment to my answer) but it's nice to have this demo'd more prominently. – Josh O'Brien Jun 29 '12 at 18:42
  • 1
    And notice that `list()` is not the answer. – IRTFM Jun 29 '12 at 18:47
  • 1
    That's why I referenced the comment ;) (Just trying to show you where you might have gotten that nagging feeling.) – Josh O'Brien Jun 29 '12 at 18:48
  • 2
    Just to clarify, the problem with `list()` is that it returns a *one-element list* containing a length-three vector, rather than a *three-element list*, each element of which is a length-one vector (which is what we need if we want data.table to put the results in three different columns). – Josh O'Brien Jun 29 '12 at 18:57