In a data frame I have a vector of values, and vectors of categories that each value belongs to. I want to apply a function to the values that operates "by category", so I use tapply. For example, in my case I want to rescale the values within each category.

However, the result of tapply is a list of vectors of the rescaled values, and I need to unify (or "linearize") this list back into a single vector in the original order, so I can add a column of the rescaled values to my data frame.

I'm looking for a simple way to do that. Here is a sample:

x = 1:10
c = factor(c(1,2,1,2,1,2,1,2,1,2))
#I do the rescaling like this:
rescaled = tapply(x,list(c),function(x) as.vector(scale(x)))
# this looks like this:
$`1`
[1] -1.2649111 -0.6324555  0.0000000  0.6324555  1.2649111

$`2`
[1] -1.2649111 -0.6324555  0.0000000  0.6324555  1.2649111


# but really, I need to get something like this
[1] -1.2649111 -1.2649111 -0.6324555 -0.6324555  0.0000000  0.0000000
 [7]  0.6324555  0.6324555  1.2649111  1.2649111

Any suggestions?

thanks, amit

smci
amit
  • Related question, also on unpacking vectors/arrays(/lists) column-wise: http://stackoverflow.com/questions/22558395/unwanted-sorted-behavior-on-result-of-vector-concatenation-function – smci May 13 '14 at 18:41

1 Answer

Another job for the workhorse ave. Let me illustrate it with a data frame:

> mydf <- data.frame(x=1:10,myfac=factor(c(1,2,1,2,1,2,1,2,1,2)))
> within(mydf, scaledx <- ave(x,myfac,FUN=scale))
    x myfac    scaledx
1   1     1 -1.2649111
2   2     2 -1.2649111
3   3     1 -0.6324555
4   4     2 -0.6324555
5   5     1  0.0000000
6   6     2  0.0000000
7   7     1  0.6324555
8   8     2  0.6324555
9   9     1  1.2649111
10 10     2  1.2649111

If you look at ?ave, it tells you that you can also use a list of factors to do this. If you want to add a column to a data frame, this is your most concise (albeit not the fastest) bet. In combination with within, you can do both operations in a single line of code.
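As a minimal sketch of grouping by more than one factor, the example below passes two grouping variables to ave. The data frame and the column names g1, g2 and scaledx are made up for illustration; they are not from the question.

```r
# Hypothetical data: two grouping factors, g1 and g2 (illustrative names).
mydf2 <- data.frame(
  x  = c(1, 2, 3, 4, 5, 6, 7, 8),
  g1 = factor(c("a", "a", "b", "b", "a", "a", "b", "b")),
  g2 = factor(c("u", "u", "u", "u", "v", "v", "v", "v"))
)

# Rescale x within each g1-by-g2 cell. ave() returns a vector of the same
# length as x, in the original row order, so it can be assigned as a column.
# as.vector() drops the matrix structure that scale() returns.
mydf2$scaledx <- ave(mydf2$x, mydf2$g1, mydf2$g2,
                     FUN = function(v) as.vector(scale(v)))

print(mydf2)
```

Each cell here holds two values, so every pair is mapped to a negative and a positive value of equal magnitude.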

Joris Meys
  • Might be worth mentioning `unlist`, just in case, although I'm not clear on how important the ordering is... – joran Feb 05 '13 at 15:57
  • @joran `unlist` will give you the wrong ordering, as it pastes the elements of the list one after the other. It requires a lot of index magic to get them in the order suited for the data frame, especially when we're talking about several factors. – Joris Meys Feb 05 '13 at 15:59
  • Yeah, that's why I said I wasn't sure from the question whether the ordering was an important piece, but clearly you think so! :) – joran Feb 05 '13 at 16:03
  • In fact, I did something with unlist to "reunite" the tapply outcome. It's ugly, but here it is anyway: `ids = tapply(1:length(x),c,function(x) x); rescaled = tapply(x,c,function(x) as.vector(scale(x))); y = numeric(length(x)); y[unlist(ids)] = unlist(rescaled)`. It works, but it's really, really ugly. – amit Feb 05 '13 at 16:16
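
The ordering issue discussed in these comments can be seen directly, and base R's unsplit offers a tidier alternative to the manual index bookkeeping. This is only a sketch using the question's sample data; the grouping variable is named g here (an assumption, to avoid masking the base function c), and split with lapply stands in for tapply.

```r
x <- 1:10
g <- factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))  # called `c` in the question

# Rescale within each group; the result is a list with one vector per level.
pieces <- lapply(split(x, g), function(v) as.vector(scale(v)))

# unlist() concatenates group after group, so the original order is lost.
by_group <- unlist(pieces, use.names = FALSE)

# unsplit() reverses split(): each value goes back to its original position.
rescaled <- unsplit(pieces, g)

print(by_group)
print(rescaled)
```

The rescaled vector has the same length and order as x, so it can be added to a data frame as a new column.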