
I'm looking to summarize data in two ways, but in one line of code. First, I want to get the mean/median/SD of a numeric variable by ID. Easy enough. I also want to get the mean/median/SD of that same numeric variable by ID, but only for a subset defined by another variable. For example, I want to get the mean/median/SD of age by group where education is equal to 1.

Here's what I'm working with now:

library(data.table)
DF.datatable <- data.table(DF)
setkey(DF.datatable, group)
new <- DF.datatable[, list(mean = mean(age), median = median(age), sd = sd(age)), by = group]

As you can see, what I'm missing is the second component (the same statistics restricted to education == 1). The grouped call returns a new table with only one row per group, so it's important (and easier) that both sets of statistics come out of that one call.

Any ideas?

  • Why does it have to be one line? Why not do `(DF.datatable[education==1,list(mean=mean(age),median=median(age), sd=sd(age)),by=group])` – Heroka Dec 14 '15 at 18:10
  • This would work, but I want to also get the mean/median/SD of age for all within the group, not just those with education==1. So I need what you have posted in addition to what I posted. – needavacation Dec 14 '15 at 18:24
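If a single call isn't strictly required, the question's call and the comment's subset call can be run separately and joined on group. A minimal sketch, assuming DF has columns group, age and education; the toy data and the column suffixes `_all` / `_edu1` are made up for illustration:

```r
library(data.table)

# Hypothetical toy data standing in for DF (columns group, age, education assumed)
DF.datatable <- data.table(
  group     = c("a", "a", "a", "b", "b", "b"),
  age       = c(20, 30, 40, 25, 35, 45),
  education = c(1, 1, 2, 1, 2, 2)
)

# Statistics over every row in each group (the question's call)
all_stats <- DF.datatable[, .(mean = mean(age), median = median(age), sd = sd(age)),
                          by = group]

# Same statistics restricted to education == 1 (the comment's call)
edu1_stats <- DF.datatable[education == 1,
                           .(mean = mean(age), median = median(age), sd = sd(age)),
                           by = group]

# Join on group; all.x = TRUE keeps groups that have no education == 1 rows (filled with NA),
# and the suffixes tell the two sets of columns apart
new <- merge(all_stats, edu1_stats, by = "group", all.x = TRUE,
             suffixes = c("_all", "_edu1"))
print(new)
```

Group "b" has only one row with education == 1 here, so its `sd_edu1` comes out as NA (the SD of a single value is undefined).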

1 Answer


Try this:

DF.datatable[, .(mean_all = mean(age),
                 mean_edu1 = mean(ifelse(education == 1, age, NA), na.rm = TRUE)), by = group]
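The answer above only computes means. A variation that returns mean, median and SD for both the whole group and the education == 1 subset in one grouped call might look like this. It swaps the `ifelse()`/`NA` trick for plain subsetting of `age` inside `j`; the toy data and the output column names are assumptions for illustration:

```r
library(data.table)

# Hypothetical toy data standing in for DF (columns group, age, education assumed)
DF.datatable <- data.table(
  group     = c("a", "a", "a", "b", "b", "b"),
  age       = c(20, 30, 40, 25, 35, 45),
  education = c(1, 1, 2, 1, 2, 2)
)

# One grouped call: the first three columns use every row in the group,
# the last three use only the rows where education == 1
new <- DF.datatable[, {
  edu1_age <- age[education == 1]  # ages of this group's education == 1 rows
  list(mean_all    = mean(age),
       median_all  = median(age),
       sd_all      = sd(age),
       mean_edu1   = mean(edu1_age),
       median_edu1 = median(edu1_age),
       sd_edu1     = sd(edu1_age))
}, by = group]
print(new)
```

Subsetting inside `j` means no `na.rm` is needed, and the same pattern extends to any other statistic. A group with no education == 1 rows would get NaN for the mean and NA for the median and SD.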
Vadym B.