I've spent too much time trying to figure out a solution for including weighted.mean (or wtd.mean) into stat_summary and make it work properly. I've looked to several pages trying to tackle the same issue but none had a definitive solution. The main problem is that weighted.mean, once place in stat_summary, fails to find its weights component, which apparently can not be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see examples). Now, I tried various approaches and I've even produced a barplot of weighted means using a ddplyr based function (as suggested in another page) but, beside being a bit cluncky, it does not allow facetting, as it changes the source dataframe.
The following is dataframe built on purpose for this problem.
elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)
(A note: I've also added the "shape" column even if I will not use it, just to remind me that I don't want to lose any data in the process but it needs to be available at the end.)
The original setting was a ggplot just with "mean" which includes facetting, as in:
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
facet_wrap(~elements~existence)
The following is the corresponding code but with "weighted.mean" --> the "w" aestethics is ignored, therefore it assumes all the weights to be equal (by the weighted.mean function definition), which results in a simple mean
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
As you can see, it gives a warning Warning: Ignoring unknown aesthetics: w
I tried several ways to make it "see" the weight variable but with no success. In the end I realised that the most promising way would be to redefine the weight.mean function so that its default "w" would be a function of "x". Weighted.mean would still not see any "w" aeshetics but it would compute one as default. To achieve this I tried to nest the native function (weighted.mean) into a generic function, which allows me to change the arguments.
Step by step.
First I tried with "mean" (and it works).
mean.modif <- function(x) {
mean(x)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
Then with weighted.mean
weighted.mean.modif <- function(x,w) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
but it still doesn't read the "w" (as there's no "w" specified) so it gives back a normal mean.
Then I tried to specify the "w" argument as the weights column in the dataframe
weighted.mean.modif1 <- function(x,w=platonic$importance) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
but it doesn't work. A warnign message says:
Computation failed in stat_summary()
:
'x' and 'w' must have the same length
Being stuck, I tried to generate a random series of numbers but of the same length as "x" and it surprisingly worked.
weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
Obviously, there's a way to trick it but it's no use if I can use only random weights.
I tried to print "x" within the function and then applied it and, while it produces something, even "mean" doesn't work properly anymore.
mean.modif3 <- function(x) {
mean(x)
print(x)
}
So, the tricky part that I can not figure out is how to relate properly the "w" default to the "x" so that when the weighted.mean is called within stat_summary, not reading a "w", uses anyway the correct weights.
As I mentioned, there is also a ddply workaround to obtain a weighted mean plot - as it is based on creating a new source dataframe with just the variables already organised and the weighted means, but it does not allow facetting!!!
weighted.fictious <- function(xxxx, yyyy) {
ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}
ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
geom_bar(stat = "identity")
Thanks!