
I have a set of user recommendations

review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

and want to use summary(review) to show the basic properties: mean, median, quartiles, and min/max.

But it returns a separate summary for each column. I am avoiding data.frame because the 'Star' factor levels are ordered. How can I tell R that Star is an ordered factor of numeric scores and that Votes holds their frequencies?

Roland Kofler
  • I just saw the tag frequency-analysis. Are you looking for table()? Or contingency tables? – Matt Bannert Feb 05 '11 at 13:34
  • I tried table; it didn't work. I somehow need to compute the mean, median, and quartiles, and I don't want to do it by hand. That's the minimum I expect from a statistical framework – Roland Kofler Feb 05 '11 at 13:51
  • Note that the weighted mean of an _ordered factor_ is not defined, because the whole point of not calling it numeric is that the between-level intervals are not known. You have to assign numeric scores to take means. – Aniko Feb 05 '11 at 15:37
  • Thanks Aniko, I understand your argument. To name the beast: I want to do this with Amazon reviews. I think they treat ratings as cardinal, since they compute the mean – Roland Kofler Feb 05 '11 at 15:40
  • What you are really looking for is the mean, and quantiles of the Stars *weighted* by the Votes. And Al3xa's approach is a good way to get this. There is really no need to make the Star column a factor. – Prasad Chalasani Feb 05 '11 at 15:56

3 Answers


I'm not exactly sure what you mean by taking the mean in general if Star is supposed to be an ordered factor. However, in the example you give where Star is actually a set of numeric values, you can use the following:

library(Hmisc)

R> review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))

R> wtd.mean(review[, 1], weights = review[, 2])
[1] 4.0625

R> wtd.quantile(review[, 1], weights = review[, 2])
  0%  25%  50%  75% 100% 
1.00 3.75 5.00 5.00 5.00 
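
If installing Hmisc is not an option, the same numbers can probably be reached with base R alone. The following is only a sketch under that assumption: it uses `weighted.mean()` from the stats package for the mean, and expands each star value by its vote count so that the ordinary `quantile()` can be applied. The variable names `w_mean`, `expanded`, and `w_quart` are made up for illustration.

```r
# Same data as in the question
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))

# Weighted mean with base R (stats::weighted.mean)
w_mean <- weighted.mean(review[, "Star"], w = review[, "Votes"])

# Repeat each star value by its vote count, then take ordinary quantiles
expanded <- rep(review[, "Star"], times = review[, "Votes"])
w_quart  <- quantile(expanded)

print(w_mean)
print(w_quart)
```

Expanding the votes is fine for small counts like these; for very large vote counts a weighted routine such as the Hmisc one avoids building the long vector.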
brentonk

I don't understand what the problem is. Why not use a data.frame?

rv <- data.frame(star = ordered(review[, 1]), votes = review[, 2])

Then expand the data.frame into a single vector, repeating each star value by its vote count:

( vts <- with(rv, rep(star, votes)) )
 [1] 5 5 5 5 5 5 5 5 5 5 4 4 3 2 1 1
Levels: 1 < 2 < 3 < 4 < 5

Then do the summary... I just don't know what kind of summary, since summary will bring you back to the start. O_o

summary(vts)
 1  2  3  4  5 
 2  1  1  2 10 

EDIT (on @Prasad's suggestion)

Since vts is an ordered factor, you should convert it to numeric and then calculate the summary (for the moment I will disregard the underlying statistical issues):

nvts <- as.numeric(levels(vts)[vts])  ## numeric conversion
summary(nvts)  ## "ordinary" summary
fivenum(nvts)  ## Tukey's five number summary
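
For the example data, the two summaries agree on the minimum, median, and maximum but differ slightly on the lower quartile, because `summary()` relies on the default `quantile()` interpolation while `fivenum()` uses Tukey's hinges. The sketch below rebuilds the objects from scratch so it can be run on its own; the names `s` and `f` are just placeholders.

```r
# Rebuild the objects used above
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))
rv   <- data.frame(star = ordered(review[, 1]), votes = review[, 2])
vts  <- with(rv, rep(star, votes))
nvts <- as.numeric(levels(vts)[vts])

s <- summary(nvts)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
f <- fivenum(nvts)   # min, lower hinge, median, upper hinge, max

print(s)
print(f)

# Lower quartile: 3.75 from summary(), 3.5 (lower hinge) from fivenum()
print(c(summary = unname(s["1st Qu."]), fivenum = f[2]))
```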
aL3xa

Just to clarify: when you say you would like "mean, median, quartiles and min/max", are you talking in terms of number of stars, e.g. mean = 4.062 stars? If so, using aL3xa's code, would something like summary(as.numeric(as.character(vts))) be what you want?
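
As a small illustration of why the conversion goes through the level labels, here is a sketch comparing the two idioms mentioned in this thread. The factor `f_demo` is a made-up example, not part of the question's data; it is only there to show that calling `as.numeric()` directly on a factor returns the internal level codes rather than the labels.

```r
# vts as built in aL3xa's answer
review <- matrix(c(5:1, 10, 2, 1, 1, 2), nrow = 5, ncol = 2,
                 dimnames = list(NULL, c("Star", "Votes")))
vts <- rep(ordered(review[, 1]), review[, 2])

# Both idioms recover the numeric star values from the level labels
a <- as.numeric(as.character(vts))
b <- as.numeric(levels(vts)[vts])
print(identical(a, b))

# Hypothetical factor whose labels differ from its internal codes
f_demo <- factor(c("10", "20", "10"))
print(as.numeric(f_demo))                # level codes: 1 2 1
print(as.numeric(as.character(f_demo)))  # labels as numbers: 10 20 10
```

With star levels 1 to 5 the codes happen to match the labels, so the shortcut would appear to work here, but that is a coincidence of this data.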

crayola