1

I've got the output of some simulations that look something like this:

Run,ID,Time,Var1,Outcome
1,1,6,0.5,1
1,2,4,0.25,1
1,3,2,0.9,1
2,1,5,0.07,1
...
10,3,9,0.08,1

Basically a series of M studies of N individuals (in actuality M = 1000 and N = 123). I'd like to run a Cox model (preferably) or a parametric regression model (if I must) to estimate the effect of Var1 on survival time. What I want to do is estimate the effect for each "Run" (to produce 1,000 estimates) and then dump all those estimates into a single data frame, matrix, etc. where I can look at their distribution.

If I were using SAS, the code would look something like this:

ods output ParameterEstimates=work.parameters;
proc phreg model time*outcome(0) = Var1;
   BY Run;
run;
ods output close;

But since this is a side project, and I'm trying to force myself to do side projects in R in order to learn it, I can't so much fall back on SAS. As far as I can tell from the coxph() documentation, there's no easy way to include a by-variable. My guess is this is going to involve loops and subsets.

Any suggestions?

Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
Fomite
  • 2,213
  • 7
  • 30
  • 46
  • It's not clear to me what you're after (this may be due to my ignorance with time series) but replicate may be what you're after. – Tyler Rinker Jul 15 '12 at 23:42
  • I would suggest using `plyr::ddply` or `data.table` as opposed to a loop or subset. – mnel Jul 15 '12 at 23:43
  • @TylerRinker The same question could apply to something like lm(). Essentially, how take a data set and apply something like lm() on levels of that variable. – Fomite Jul 15 '12 at 23:48
  • 1
    @EpiGrad then mnel's reponse is good as would the use of base's `split` and then `lapply`. You'd then have to grab the pices from lapplying the cox model and use `do.call` to put them back together as a dataframe. – Tyler Rinker Jul 15 '12 at 23:59
  • There's `nlme::lmList` to be used for `lm`. – Roman Luštrik Jul 18 '12 at 09:35

1 Answers1

4

An example using plyr or data.table

## some data
set.seed(123)
.data <- data.frame(run = rep(1:10, each = 50), x = runif(500))
.data$y <- .data$x * rep(runif(10),each = 50)

# ---------------------------------------------------------
# using plyr
library(plyr)
# ddply to extract just the coefficients
ddply(.data, .(run), function(data) data.frame(coef = coef(lm(y ~ x, data))))
    # or save the whole object
# the whole lm object 
lm_list <- dlply(.data, .(run), lm, formula = y ~ x)
# get the coefficients    
ldply(lm_list, coef)
# print the summaries
llply(lm_list, summary)

# ---------------------------------------------------------
# with data.table 
library(data.table)

DT <- data.table(.data)
setkeyv(DT, 'run')

DT[, list(coef = coef(lm(y~x, .SD))), by = 'run']
mnel
  • 113,303
  • 27
  • 265
  • 254