Suppose I have a data frame like this:

set.seed(123)
df <- as.data.frame(cbind(y<-sample(c("A","B","C"),10,T), X<-sample(c(1,2,3),10,T)))
df <- df[order(df$V1),]

Is there a simple function to sum V2 by V1 (or apply any other FUN) and add the result to df as a new column, such that:

df$sum <- c(6,6,8,8,8,8,6,6,6,6)
df

I could write a function for this, but I need to do it frequently, so it would be better to know the simplest way to achieve it.

David Arenburg
David Z
  • `df<-as.data.frame(cbind(y<-sample(c("A","B","C"),10,T), X<-sample(c(1,2,3),10,T)))` burns my eyes; `df<-data.frame(y = sample(c("A","B","C"),10,T), X= sample(c(1,2,3),10,T))` is simpler (unless you really mean to assign `y` and `X` in the calling environment). – mnel Nov 21 '13 at 00:35
  • `cbind` will also coerce `X` to be a character vector, where you probably want 1-3 to be numeric values. – Scott Ritchie Nov 21 '13 at 01:22
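
The coercion described in the comment above can be checked with a small sketch. The vectors and the names `m`, `good`, and `bad` are made up for illustration and are not taken from the question:

```r
# cbind() on a character and a numeric vector builds a character matrix
m <- cbind(c("A", "B", "C"), c(1, 2, 3))
typeof(m)            # "character"

# data.frame() keeps each column's own type
good <- data.frame(y = c("A", "B", "C"), X = c(1, 2, 3))
is.numeric(good$X)   # TRUE

# going through cbind() first, the second column is no longer numeric
bad <- as.data.frame(m)
is.numeric(bad$V2)   # FALSE
```

Because `V2` is not numeric after the `cbind()` route, a call such as `sum()` on it would fail, which is one more reason to build the data frame with `data.frame()` directly.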

1 Answer

I agree with @mnel, at least on his first point. I didn't see `ave` demonstrated in the answers he cited, and I think it's the "simplest" base-R method. Using that `data.frame(cbind(...))` construction should be outlawed, and teachers who demonstrate it should be stripped of their credentials.

set.seed(123)
df <- data.frame(y = sample(c("A", "B", "C"), 10, TRUE),
                 X = sample(c(1, 2, 3), 10, TRUE))
df <- df[order(df$y), ]  # this step is not necessary for success; ave() keeps the row order
df

df$sum <- ave(df$X, df$y, FUN = sum)
df
   y X sum
1  A 3   6
6  A 3   6
3  B 3   8
7  B 1   8
9  B 1   8
10 B 3   8
2  C 2   6
4  C 2   6
5  C 1   6
8  C 1   6
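
Since the question asks about "any FUN", here is a minimal sketch of the same `ave` pattern with other functions. The data frame is typed in by hand from the printed result above, so it does not depend on the random number generator, and `add_group_stat` is a hypothetical convenience wrapper, not a base-R function:

```r
# Data copied by hand from the printed result, so no random sampling is involved
df <- data.frame(
  y = c("A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  X = c(3, 3, 3, 1, 1, 3, 2, 2, 1, 1)
)

# ave() returns a vector as long as X, with FUN applied within each level of y.
# FUN must be passed by name because it comes after `...` in ave()'s arguments.
df$sum  <- ave(df$X, df$y, FUN = sum)
df$mean <- ave(df$X, df$y, FUN = mean)
df$n    <- ave(df$X, df$y, FUN = length)

# Hypothetical wrapper for frequent use: column names are given as strings
add_group_stat <- function(data, value, group, FUN, name) {
  data[[name]] <- ave(data[[value]], data[[group]], FUN = FUN)
  data
}

df <- add_group_stat(df, "X", "y", max, "max")
df
```

The wrapper only saves typing; the work is still done by `ave`, which is why the result lines up with the original rows whether or not the data frame is sorted.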
IRTFM