0

I have spent some time now learning reshape2 and plyr but I still do not get it. This time I have a problem with (a) subtotals and (b) passing different aggregate functions. Here an example using data from a tutorial on the blog of mrdwab

# libraries 
library(plyr)
library(reshape2)
# get data and add few more variables 
book.sales = read.csv("http://news.mrdwab.com/data-booksales")
book.sales$Stock = book.sales$Quantity + 10
book.sales$SubjCat[(book.sales$Subject == 'Economics') | 
  (book.sales$Subject == 'Management')] <- '1_EconSciences'
book.sales$SubjCat[book.sales$Subject %in% 
  c('Anthropology', 'Politics', 'Sociology')] <- '2_SocSciences'
book.sales$SubjCat[book.sales$Subject %in% c('Communication', 'Fiction',
  'History', 'Research', 'Statistics')] <- '3_other'

# to get to my starting dataframe (close to the project I am working on) 
book.sales1 <- ddply(book.sales, c('Region', 'Representative', 'SubjCat', 
                                   'Subject', 'Publisher'), summarize,
                 Stock = sum(Stock), Sold = sum(Quantity),
                 Ratio = round((100 * sum(Quantity)/sum(Stock)), digits = 1))


#melt it 
m.book.sales = melt(data = book.sales1, id.vars = c('Region', 'Representative',
                                        'SubjCat', 'Subject', 'Publisher'),
                    measured.vars = c('Stock', 'Sold', 'Ratio'))

# cast it --- # Please ignore this cast this was a mistake 
# Tab1 <- dcast(data = m.book.sales, 
#         formula = Region + Representative ~ Publisher + variable,
#         fun.aggregate = sum, margins = c('Region', 'Representative'))

Tab1 <- dcast(data = m.book.sales, formula = Region + Representative ~ 
  SubjCat + Subject + variable, fun.aggregate = sum,
              margins = c('Region', 'Representative', 'SubjCat', 'Subject'))

Now my questions :

  1. I have been able to add the subtotals in rows. But is it possible also to add margins in the columns. Say for example, Totals of Stock for one Publisher? Sorry I meant to say example total sold for all publishers.

  2. There is a problem with the columns with “ratio”. How can I get “mean” instead of “sum” for this variable ?

Please note: Question number one (about subtotals in margins) could be solved.

P.S.: I have seen some examples using reshape. Will you recommend to use it instead of reshape2 (which seems not to include the functionalities of two functions).

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
user1043144
  • 2,680
  • 5
  • 29
  • 45
  • It *sounds* to me like you're looking for `margins = TRUE`. Also, I'm not sure of your other question. It sounds like you're asking for "sum" of all the other variables and "mean" of just the "ratio" variable. Is this a correct interpretation? – A5C1D2H2I1M1N2O1R2T1 Sep 16 '12 at 10:22
  • yes. I would like to get the sum for all other variables and mean just for the ratio. Can this be done in a single step ? – user1043144 Sep 16 '12 at 10:57
  • The original data no longer seems to exist. Impossible to reproduce. – s_a May 14 '15 at 13:49

1 Answers1

2

Not sure exactly what you want for question 1, but if you want total of stock for Publisher would you not just do this?

 totalofstock <- ddply(book.sales, ('Publisher'), function(x)   
                      data.frame=c(subtotals  =  sum(x$Stock)))

and if you want to add it to Tab1 you just do this:

Tab1$bloomsburytotalofstock<-totalofstock[1,][[2]]
head(Tab1)

As for question 2 getting a mean instead of a sum surely you would be changing the function from sum to mean

e.g.

ratiomeans <- ddply(book.sales1, ('Publisher'), function(x)   
                      data.frame=c(ratioMEAN  =  mean(x$Ratio)))

Also I would suggest sticking with reshape2. reshape2 is basically the new version of reshape. As far as I know reshape is no longer being worked on but still exists so that people with old code using reshape do not have to rewrite everything.

EDIT

justratio<-(m.book.sales[m.book.sales$variable=="Ratio",])
Tab2 <- dcast(data = justratio, 
        formula = Region + Representative ~ SubjCat + Subject + variable,
        fun.aggregate = mean,
        margins = c('Region', 'Representative', 'SubjCat', 'Subject'))
final<-merge(Tab1,Tab2,by=c("Region","Representative"))
user1317221_G
  • 15,087
  • 3
  • 52
  • 78
  • sorry there was a mistake in the question. Your post made me realize this. For question 2, do you mean that I need to do 2 casts and merge them ? – user1043144 Sep 16 '12 at 09:43
  • seems like a simple wat to do it to me but note in my edit you will only need to do it to the subset of your data for `ratio` to save yourself energy – user1317221_G Sep 16 '12 at 10:28
  • 1
    Since we are already in `plyr` land here, `final <- join(Tab2, Tab1, by = c("Region", "Representative"))` might be a better candidate than `merge` since it will retain the original order of `Tab1`. As it is in this case, `merge` reverses the row order, making the final `data.frame` somewhat confusing to read. – A5C1D2H2I1M1N2O1R2T1 Sep 16 '12 at 10:35
  • @user1043144 mrdwab's suggestion will probably suit you well here. – user1317221_G Sep 16 '12 at 11:24
  • Thanks to both of you. I am afraid I have a long way to go in mastering "reshape" and "plyr". But good to know I can count on the community here. Thanks – user1043144 Sep 16 '12 at 12:05