
As per the documentation of the ggpairs() function in the GGally R package, it is possible to specify custom functions as input to the "lower"/"upper" argument. For continuous-discrete variable combinations, I would like to simply display the means of the continuous variable within the categories of the categorical variable (preferably using dots, not bars), if possible further stratified by another categorical variable using a color aesthetic.

I found some information in the following thread:

https://github.com/ggobi/ggally/issues/218

However, my knowledge of ggpairs (and ggplot2) is too superficial to write such a custom function from the template in that thread. The variable name "Species" also appears to be hard-coded in the template, and I would prefer a custom function without any hard-coded names if at all possible.

I would be very grateful if somebody could help me out with a template or a sketch of a solution, e.g. using the following example (where "custom_function" would need to be replaced with the function described above):

library(GGally)

dat <- reshape::tips
pm <- ggpairs(dat,
              mapping = aes(color = sex, alpha = 0.3),
              columns = c("total_bill", "smoker", "time", "tip"),
              showStrips = TRUE,
              lower = list(combo = custom_function))
print(pm)
h_bauer
  • [This issue](https://github.com/ggobi/ggally/issues/139) has some good examples of writing custom functions. I don't think yours will be this complicated as you should be able to use `stat_summary` and `ggstance::stat_summaryh` if you only want group means per group. However, my guess is you'll need an `if` statement for numeric vs categorical y and the `eval` code in the link might be useful for that. – aosmith Aug 30 '17 at 16:48
  • @aosmith Thanks a lot, I drafted a function based on your advice which seems to do the trick. However, I'm a bit confused about the labeling of the categorical variables. I'd be grateful for any advice. – h_bauer Sep 01 '17 at 09:19
  • Good job on figuring that all out; you should put that as an answer instead of editing your question. I agree it can be hard to read the axis labels all the way across the matrix; setting `axisLabels` to "internal" might help a little with that if you like the look. Another option would be to use `"box"` instead of the default `"box_no_facet"` for the upper triangle `combo`. – aosmith Sep 01 '17 at 13:29
  • Thanks, using "box" instead of "box_no_facet" worked fine! – h_bauer Sep 02 '17 at 14:25
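
For readers who want to try both suggestions from the comments, here is a minimal sketch. It reuses the call from the question and only adds `axisLabels = "internal"` and `upper = list(combo = "box")`; the object name `pm_internal` is made up for illustration, and whether the internal labels look better is a matter of taste.

```r
library(GGally)
library(ggplot2)

# Same data and columns as in the question. axisLabels = "internal" draws the
# variable names and axis labels inside the diagonal panels, and "box" replaces
# the default "box_no_facet" for the upper-triangle combo panels.
pm_internal <- ggpairs(reshape::tips,
                       mapping = aes(color = sex, alpha = 0.3),
                       columns = c("total_bill", "smoker", "time", "tip"),
                       axisLabels = "internal",
                       upper = list(combo = "box"))
print(pm_internal)
```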

1 Answer

Based on @aosmith's comment, I wrote a custom function that seems to work well enough for my purposes. I haven't tested it extensively so far, but it may be helpful anyway:

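library(GGally)
library(ggplot2)
library(ggstance)

# Plot group means as points in the continuous-discrete ("combo") panels.
gmean_point <- function(data, mapping, ...) {

  # Resolve the y aesthetic against the data to see which axis is numeric.
  # rlang::eval_tidy() also works when ggplot2 (>= 3.0.0) stores the
  # aesthetic as a quosure, where a plain eval() returns a formula instead.
  y <- rlang::eval_tidy(mapping$y, data)

  if (is.numeric(y)) {
    # Numeric y: mean of y within each x category, dodged horizontally.
    # Note: ggplot2 3.3.0 renamed fun.y to fun; fun.y still runs with a warning.
    p <- ggplot(data) +
      geom_blank(mapping) +
      stat_summary(mapping,
                   geom = 'point', fun.y = mean,
                   position = position_dodge(width = 0.2))
  } else {
    # Categorical y: mean of x within each y category, dodged vertically.
    p <- ggplot(data) +
      geom_blank(mapping) +
      stat_summaryh(mapping,
                    geom = 'point', fun.x = mean,
                    position = position_dodgev(height = 0.2))
  }

  p

}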

pm <- ggpairs(reshape::tips,
              mapping = aes(color = sex, alpha = 0.3),
              columns = c("total_bill", "smoker", "time", "tip"),
              showStrips = TRUE,
              lower = list(combo = gmean_point),
              upper = list(combo = 'box'))
print(pm)

Plot produced by code above
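
On ggplot2 3.3.0 or later, the same idea can probably be written without ggstance, because `stat_summary()` gained an `orientation` argument and `fun` replaced `fun.y`. The sketch below is untested against ggpairs beyond small cases; `gmean_point_v2` is a made-up name, and it assumes `position_dodge()` follows the flipped orientation reported by the stat.

```r
library(GGally)
library(ggplot2)

# Hypothetical variant of gmean_point for ggplot2 >= 3.3.0 (no ggstance needed).
gmean_point_v2 <- function(data, mapping, ...) {
  # Resolve the y aesthetic to a vector; eval_tidy() handles quosures.
  y <- rlang::eval_tidy(mapping$y, data)

  # orientation = "x": groups along x, mean taken over y (numeric y).
  # orientation = "y": groups along y, mean taken over x (categorical y).
  orientation <- if (is.numeric(y)) "x" else "y"

  ggplot(data, mapping) +
    geom_blank() +
    stat_summary(geom = "point", fun = mean,
                 orientation = orientation,
                 position = position_dodge(width = 0.2))
}

pm2 <- ggpairs(reshape::tips,
               mapping = aes(color = sex, alpha = 0.3),
               columns = c("total_bill", "smoker", "time", "tip"),
               showStrips = TRUE,
               lower = list(combo = gmean_point_v2),
               upper = list(combo = "box"))
print(pm2)
```

Nothing in the function refers to a column by name, so it should work for any continuous-discrete pair that ggpairs passes to the `combo` slot.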

h_bauer