2

I'd like to create QQ-plots of a t-distribution using ggplot2's geom_qq() function. Hadley provides a nice example of how to do this here, but it's only for a single distribution. I wish to extend this to multiple groups with a facet and distribution for each group. I found a similar and related question here, but it doesn't really answer the question.

Passing either a list or vector of greater than length 1 does not seem to work.

library(ggplot2)

a <- 1:10
df <- data.frame(a = a, b = rt(1000, df = a))

deg_free <- 
  lapply(a, function(x) {
    return(MASS::fitdistr(subset(df, a == x)$b, 
                          "t")$estimate["df"])
  })

g <- 
  ggplot(data=df, aes(sample=b)) + 
  geom_qq(distribution = qt, dparams = deg_free) + 
  geom_qq_line(distribution = qt, dparams = deg_free) +
  facet_wrap(~a)

Does anyone know how to do this without resorting to computing quantiles for the data and manually plotting the QQ points and lines?

code_cowboy
  • 596
  • 5
  • 18

2 Answers2

3

For ggplot to take degrees of freedom into account in facets, the dataframe passed into ggplot() should contain that as a column:

library(dplyr)

set.seed(123) # for reproducibility
a <- 1:10
df <- data.frame(a = a, b = rt(1000, df = a))
deg_free <- 
  lapply(a, function(x) {
    return(MASS::fitdistr(subset(df, a == x)$b, 
                          "t")$estimate["df"])
  })

df <- df %>%
  left_join(data.frame(d = unlist(deg_free), a = a),
            by = "a")
rm(a, deg_free)

> head(df)
  a          b        d
1 1 -0.2624269 1.526920
2 2 -3.4784976 1.447293
3 3  1.6535141 2.819679
4 4  2.3848622 3.240377
5 5  0.4233105 3.946170
6 6  1.4423866 5.893569

With that out of the way, we can try to define modified versions of geom_qq / geom_qq_line that look for degrees of freedom df as a mapped aesthetic. Here's how the result can look like:

ggplot(df,
       aes(sample=b, df = d)) + 
  geom_qq2(distribution = qt) +
  geom_qq_line2(distribution = qt) +
  facet_wrap(~a, scales = "free")

plot

Code to create geom_qq2 / geom_qq_line2:

library(magrittr)
library(ggplot2)

# take reference from the compute_group functions for StatQq / StatQqLine
# but modify the code to include df in dparams, if it's a mapped aesthetic
compute_group_StatQq2 <- environment(StatQq$compute_group)$f
compute_group_StatQqLine2 <- environment(StatQqLine$compute_group)$f

body(compute_group_StatQq2) <- body(compute_group_StatQq2) %>% as.list() %>%
  append(quote(if("df" %in% colnames(data)) dparams <- append(dparams, list("df" = data$df[1]))),
         after = 1L) %>%
  as.call()

body(compute_group_StatQqLine2) <- body(compute_group_StatQqLine2) %>% as.list() %>%
  append(quote(if("df" %in% colnames(data)) dparams <- append(dparams, list("df" = data$df[1]))),
         after = 1L) %>%
  as.call()

# define modified ggproto classes
# which inherit from StatQq / StatQqLine, but use modified compute_group functions
StatQq2 <- ggproto("StatQq2", StatQq, compute_group = compute_group_StatQq2)
StatQqLine2 <- ggproto("StatQqLine2", StatQqLine, compute_group = compute_group_StatQqLine2)

# define modified geom functions
# which are based on geom_qq / geom_qq_line, but use Stat = modified Stat
geom_qq2 <- geom_qq
geom_qq_line2 <- geom_qq_line

body(geom_qq2) <- body(geom_qq) %>% as.list() %>%
  inset2(2, (.) %>% extract2(2) %>% as.list() %>%
           modifyList(val = list(stat = quote(StatQq2))) %>%
           as.call()) %>%
  as.call()

body(geom_qq_line2) <- body(geom_qq_line2) %>% as.list() %>%
  inset2(2, (.) %>% extract2(2) %>% as.list() %>%
           modifyList(val = list(stat = quote(StatQqLine2))) %>%
           as.call()) %>%
  as.call()

Code used to modify the body of a function took reference from MrFlick's answer to How to insert expression into the body of a function in R.

Disclaimer: I've never used geom_qq** before today. If I've missed out things while modifying the computation functions in StatQq**, let me know & I'll try to sort them out.

Z.Lin
  • 28,055
  • 6
  • 54
  • 94
2

I don't think geom_qq is set up to handle having different parameters per facet, so the way to do this might be to produce a plot separately for each subset of the data and combine them with something like cowplot::plot_grid:

library(tidyverse)

plots = df %>%
    group_by(a) %>%
    mutate(deg_free = MASS::fitdistr(b, "t")$estimate["df"]) %>%
    # This second group_by is just used to keep the deg_free value
    # in the final dataframe, could be removed
    group_by(a, deg_free) %>%
    do(
        plot = ggplot(data=., aes(sample=b)) + 
            geom_qq(distribution = qt, dparams = list(.$deg_free)) + 
            geom_qq_line(distribution = qt, dparams = list(.$deg_free)) +
            ggtitle(.$a)
    )

# Using map to unpack the list-column into a list, there's
#   probably a better way
cowplot::plot_grid(plotlist=map(plots$plot,  ~ .))

Example output:

Grid plot

Marius
  • 58,213
  • 16
  • 107
  • 105