I'm going to use the diamonds data set that ships with the ggplot2 package to illustrate what I'm looking for.

I want to build a graph that is like this:

library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")

However, instead of a count, I would like the bar heights to show the mean of a continuous variable. Specifically, I'd like to group by cut and color and plot the mean carat for each combination. If I run this code:

ggplot(diamonds, aes(carat, fill=cut)) + geom_bar(position="dodge")

the output is a count of observations at each carat value, split by cut, rather than a mean.

Does anyone know how to do this?

black_sheep07

1 Answer

You can get a new data frame with mean(carat) grouped by cut and color and then plot:

library(plyr)
data <- ddply(diamonds, .(cut, color), summarise, mean_carat = mean(carat))
ggplot(data, aes(color, mean_carat, fill = cut)) + geom_bar(stat = "identity", position = "dodge")
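If you would rather not build an intermediate data frame, ggplot2 can compute the group means itself. The following is a minimal sketch of that alternative, not part of the approach above; it assumes ggplot2 3.3.0 or later, where `stat_summary()` takes a `fun` argument (older versions used `fun.y`). The name `p` is only an illustrative variable.

```r
library(ggplot2)

# Let ggplot2 compute mean(carat) for each color/cut combination,
# instead of pre-aggregating the data by hand.
p <- ggplot(diamonds, aes(color, carat, fill = cut)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")

print(p)
```

This is convenient for quick plots, but on very large data sets aggregating first is likely to be the better choice, since the summary is recomputed every time the plot is drawn.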


If you want faster solutions, you can use either dplyr or data.table.

With dplyr:

library(dplyr)
data <- diamonds %>% group_by(cut, color) %>% summarise(mean_carat = mean(carat))

With data.table:

library(data.table)
data <- data.table(diamonds)[,list(mean_carat=mean(carat)), by=c('cut', 'color')]

The code for the plot is the same for both.
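For reference, here is a hedged end-to-end sketch that puts the data.table aggregation and the plot together. The names `mean_dt` and `p` are illustrative only, and `geom_col()` is used as shorthand for `geom_bar(stat = "identity")`, which assumes ggplot2 2.2.0 or later.

```r
library(ggplot2)
library(data.table)

# Aggregate: one row per (cut, color) pair holding the mean carat.
mean_dt <- as.data.table(diamonds)[, .(mean_carat = mean(carat)), by = .(cut, color)]

# Plot the precomputed means as dodged bars.
p <- ggplot(mean_dt, aes(color, mean_carat, fill = cut)) +
  geom_col(position = "dodge")

print(p)
```

Swapping the aggregation line for the plyr or dplyr version should leave the plotting code unchanged, since all three produce the columns `cut`, `color`, and `mean_carat`.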

Carlos Cinelli
  • Awesome! I have coded the rest of my documents using data.table, so I was wondering whether there was a data.table solution to this as well. I'm also working with extremely large data sets, so the speed bonus is noticeable. – black_sheep07 Mar 15 '14 at 02:13
  • Yes, you can do it with both `dplyr` and `data.table`, and they are faster than `plyr`. I will post the solutions! – Carlos Cinelli Mar 15 '14 at 02:18
  • I love you so much I could kiss you. In an hour, you've solved a problem that I've slaved over for the last week with 3 different packages! – black_sheep07 Mar 15 '14 at 02:24