2

I have the following data frame in R which gives me the customers 1,2, and 3's transactional record. Each row shows the period type that the transaction has been made and the amount of money he spent.

id<-c(1,2,3,1,1,2,3,2,2)
period<-c("calib","valid","valid","calib","valid","valid","calib","calib","valid")
spent<-c(10,3,8,12,5,5,4,3,5)
df<-data.frame(id,period,spent)

now I need to create a new data frame which gives me the average 'spent' of each 'id' per transaction in different 'period's. The resulted table which I got in in excel pivot table should be as follow:

id  calib    valid
1     11       5
2     3        4.33
3     4        8

I know there should be a way to make this work in R but since I am new in R I'm not aware of it. I wonder if anyone can help me with this.

AliCivil
  • 2,003
  • 6
  • 28
  • 43

3 Answers3

3

You can do this using dcast from the reshape2 package (among numerous other ways, I'm sure):

library(reshape2)
dcast(df,id~period,fun.aggregate = mean)

  id calib    valid
1  1    11 5.000000
2  2     3 4.333333
3  3     4 8.000000

(Note that I'm assuming you intended to include the spent vector in your data frame.)

joran
  • 169,992
  • 32
  • 429
  • 468
  • thank you for your help. it works on my sample data frame but it is not still working on my main data frame. I think here since we just have 3 columns, it works but what if we have another column in the data frame; how can I specify that I need the id~period based on 'spent' average and not the other columns? – AliCivil Jul 22 '12 at 03:43
  • @AliTamaddoni If you run this code you'll see there's a warning that specifically addresses your concern and points you to the part of the documentation needed to fix it. (i.e. the `value.var` argument to `dcast`). – joran Jul 22 '12 at 03:48
  • actually im getting the warning message but I can't fix it – AliCivil Jul 22 '12 at 04:09
  • @AliTamaddoni What have you tried? I'm not sure that I, or the warning, could be any clearer. You select the column that contains the values to be aggregated with the `value.var` argument. By default, it guesses a column. Or you can set `value.var = "spent"`. – joran Jul 22 '12 at 04:12
3

Here's a base solution for completeness.

I edited your code, you forgot to include spent into your end data.frame.

df <- data.frame(id, period, spent)
with(df, tapply(spent, INDEX = list(id, period), FUN = mean))

  calib    valid
1    11 5.000000
2     3 4.333333
3     4 8.000000
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
2

And using the dplyr and tidyr packages:

df %>%
  group_by(id,period) %>%
  summarise (avg = mean(spent)) %>%
  spread(period,avg) # reshapes dataframe from long to wide
Sam Firke
  • 21,571
  • 9
  • 87
  • 105