I would like to draw quantile regression lines that span the full x-range of each panel when using facet_wrap. The code is as follows:

library(tidyverse)
library(quantreg)

mtcars %>% 
  gather("variable", "value", -c(3, 10)) %>% 
  ggplot(aes(value, disp)) + 
  geom_point(aes(color = factor(gear))) + 
  geom_quantile(quantiles = 0.5, 
                aes(group = factor(gear), color = factor(gear))) +
  facet_wrap(~variable, scales = "free")
#> [multiple warnings removed for clarity]

Created on 2019-12-05 by the reprex package (v0.3.0)

As can be seen, the regression lines do not span the full range of each panel, and I cannot find an easy way to fix this.

markus
Petr
  • Related post: https://stackoverflow.com/questions/59184868/geom-quantile-full-range-in-ggplot2 – markus Dec 05 '19 at 11:55
  • One benefit to `ggplot` not extending the fitted lines by default, is that it clearly shows the extent of the (x-axis) data that the model was trained on. This can help highlight potential data gaps. Anything beyond the lines is extrapolation (which may or may not be fine). – kakarot Jan 07 '22 at 00:51

1 Answer

This feels over-engineered, but one approach would be to compute the intercept and slope of each line outside of ggplot and then draw them with geom_abline. A potential downside of this implementation is that it jitters the data to prevent a "singular design matrix" error in rq, which means it would generate random slopes even for groups with only one x value. To get around that, there is a step that removes data from the slope calculation when a variable-gear combination has only one distinct value.

mtcars %>% 
  gather("variable", "value", -c(3, 10)) -> mt_tidy

mt_tidy %>%
  # EDIT: Added section to remove data that only has one value for that
  #   variable and gear. 
  group_by(variable, gear) %>%
    mutate(distinct_values = n_distinct(value)) %>% 
    ungroup() %>%
    filter(distinct_values > 1) %>%
    select(-distinct_values) %>%

  nest_legacy(-c(variable, gear)) %>% 
  # the jittering here avoids the "Singular design matrix" error
  mutate(qtile = map(data, ~ rq(jitter(.x$disp) ~ jitter(.x$value), 
                                tau = 0.5)),
         tidied = map(qtile, broom::tidy)) %>%
  unnest_legacy(tidied) %>%
  select(gear:estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  select(gear, variable, 
         intercept = `(Intercept)`, 
         slope = `jitter(.x$value)`) -> qtl_lines

ggplot(mt_tidy, aes(value, disp, color = factor(gear))) + 
  geom_point() + 
  geom_abline(data = qtl_lines,
              aes(intercept = intercept, slope = slope,
                  color = factor(gear))) +
  facet_wrap(~variable, scales = "free")
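
For what it's worth, the slope-intercept table can probably also be built without the legacy nesting helpers. The following is only a sketch of the same idea, not the code the answer was tested with: it assumes a dplyr version that provides group_modify() and reads the coefficients with coef() instead of broom::tidy(). The names mt_tidy and qtl_lines mirror the pipeline above, and the jittering is kept, so the fitted values will differ slightly from run to run.

```r
library(tidyverse)
library(quantreg)

# Long format: one row per car and variable, keeping disp and gear as columns
mt_tidy <- mtcars %>%
  pivot_longer(-c(disp, gear), names_to = "variable", values_to = "value")

qtl_lines <- mt_tidy %>%
  group_by(variable, gear) %>%
  # drop variable-gear combinations that have a single distinct x value
  filter(n_distinct(value) > 1) %>%
  # fit one median regression per group and keep only its two coefficients
  group_modify(~ {
    fit <- rq(jitter(disp) ~ jitter(value), tau = 0.5, data = .x)
    tibble(intercept = unname(coef(fit)[1]),
           slope     = unname(coef(fit)[2]))
  }) %>%
  ungroup()
```

The resulting qtl_lines has the columns variable, gear, intercept and slope, so it should drop into the geom_abline call above unchanged.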

Jon Spring
  • Hi, thank you for your answer. This approach seems pretty good; however, in the am variable, for example, there is a random slope for gear 3. How could we avoid these random slopes? Once again, thank you very much, I will try to play around with it a bit... – Petr Dec 08 '19 at 10:17
  • Edited to add a step to exclude data with only one value for that variable-gear combo. – Jon Spring Dec 08 '19 at 18:05