2

So I am trying to calculate quantile regression and plot result in ggplot for some reason dummy variables cannot be shown for some reason when plotting result in ggplot

Code on example mtcars dataset if as folows:

library(dplyr)
library(ggplot2)
library(qre)
library(quantreg)
library(fastDummies)

dataset <- mtcars
dataset <- dummy_cols(dataset, select_columns = "gear")
dataset


rq(data=dataset,
   tau= 1:9/10,
   formula = hp ~  disp + mpg + qsec + gear_4  + gear_5) %>% 
  broom::tidy() %>% 
  #filter(term!="(Intercept)") %>%
  ggplot(aes(x=tau,y=estimate))+
  geom_point(color="#27408b", size = 3)+ 
  geom_ribbon(aes(ymin=conf.low,ymax=conf.high),alpha=0.25, fill="#27408b")+
  geom_line(color="#27408b", size = 1)+ 
  geom_smooth(method=  "lm", colour = "red", se = T)+  
  my_theme + 
  facet_wrap(~term,scales="free",ncol=2)

QR.2 <- rq(hp ~ disp + mpg + qsec + gear_4 + gear_5, data = dataset, tau = 1:9/10)
plot(summary(QR.2, se = "boot"))

When plotting result using plot(summary(QR.2, se = "boot")) everything work just fine, but for some reason using ggplot shows a mistake.

camille
  • 16,432
  • 18
  • 38
  • 60
Petr
  • 1,606
  • 2
  • 14
  • 39
  • 1
    What mistake? I'd have to install several of those packages to run your code. Are you sure that all those columns are in your data frame? – camille Dec 04 '19 at 20:57
  • I don't get an error when I run your code, other than for the missing `my_theme`, which I commented out. I wasn't able to install `qre` it's not on CRAN and I can't find such as package anywhere else), but it doesn't seem to be needed to run the ggplot code. – eipi10 Dec 04 '19 at 20:58
  • It does run, but when you look in ggplot coefs in `factor(gear4)` and `factor(gear5)` are empty, when plotting with basic plot here they are - that is why i posted question. – Petr Dec 04 '19 at 21:02

1 Answers1

2

ggplot is not plotting anything for the two gear columns because at least one quantile has confidence bounds that are essentially infinite. Notice in the output below that at least one quantile confidence limit is at R's maximum floating point value (1.797693e+308 or .Machine$double.xmax) for each level of gear.

rq(data=dataset,
   tau= 1:9/10,
   formula = hp ~  disp + mpg + qsec + gear_4  + gear_5) %>% 
  broom::tidy() %>% 
  filter(grepl("gear", term)) %>% 
  arrange(term) %>% 
  as.data.frame
     term   estimate       conf.low     conf.high tau
1  gear_4   7.725539  -1.070165e+02 1.797693e+308 0.1
2  gear_4  10.295479  -3.168527e+01  1.115851e+02 0.2
3  gear_4  26.146858  -2.800967e+01  4.627397e+01 0.3
4  gear_4  20.403808  -4.757591e+01  4.244444e+01 0.4
5  gear_4 -10.288388  -3.268460e+01  4.338169e+01 0.5
6  gear_4  -7.957834  -2.606588e+01  5.368260e+01 0.6
7  gear_4  -3.902589  -2.287453e+01  6.694520e+01 0.7
8  gear_4   5.087119  -1.295842e+02  9.044733e+01 0.8
9  gear_4   4.097664 -1.797693e+308  1.199334e+02 0.9
10 gear_5  13.464949 -1.797693e+308  1.610157e+02 0.1
11 gear_5  15.969431 -1.797693e+308  9.666875e+01 0.2
12 gear_5  74.974305  -4.802727e+01  1.006461e+02 0.3
13 gear_5  57.885205  -4.215393e+01  9.898391e+01 0.4
14 gear_5  27.118007  -2.715968e+01  9.400573e+01 0.5
15 gear_5  29.105166  -2.732280e+01  1.308410e+02 0.6
16 gear_5  29.568986  -2.064172e+01  1.461912e+02 0.7
17 gear_5  43.932664  -8.733431e+00 1.797693e+308 0.8
18 gear_5 113.512563   1.982236e+01 1.797693e+308 0.9

If you add to your graph, for example, + coord_cartesian(ylim=c(-100,200)), which forces a small range for the y-axis, you'll see that the values for each gear level appear in the graphs.

This is actually happening for gear_5 with the bootstrapped confidence intervals as well:

summary(QR.2) %>% 
  map_df(~ .x$coefficients %>% 
           as.data.frame %>% 
           rownames_to_column() %>% 
           mutate(tau = .x$tau)) %>% 
  filter(grepl("gear", rowname)) %>%
  arrange(term)
   rowname coefficients       lower bd      upper bd tau
1   gear_4     7.725539  -3.586683e+01  1.234368e+02 0.1
2   gear_4    10.295479  -1.596869e+01  3.471653e+01 0.2
3   gear_4    26.146858  -2.754952e+01  4.115246e+01 0.3
4   gear_4    20.403808  -2.798206e+01  4.230933e+01 0.4
5   gear_4   -10.288388  -2.440144e+01  4.221202e+01 0.5
6   gear_4    -7.957834  -2.069021e+01  4.198471e+01 0.6
7   gear_4    -3.902589  -1.967830e+01  2.471069e+01 0.7
8   gear_4     5.087119  -9.981679e+01  6.940221e+01 0.8
9   gear_4     4.097664  -2.771170e+02  1.193569e+02 0.9
10  gear_5    13.464949 -1.797693e+308  8.781396e+01 0.1
11  gear_5    15.969431  -1.617983e+01  9.571634e+01 0.2
12  gear_5    74.974305  -4.687321e+01  1.006256e+02 0.3
13  gear_5    57.885205  -2.904531e+01  9.697743e+01 0.4
14  gear_5    27.118007  -2.375592e+01  9.330493e+01 0.5
15  gear_5    29.105166  -2.523643e+01  9.464342e+01 0.6
16  gear_5    29.568986  -5.958817e+00  1.439989e+02 0.7
17  gear_5    43.932664   8.020035e+00  1.293194e+02 0.8
18  gear_5   113.512563   2.597030e+01 1.797693e+308 0.9

The plot method for plotting summary.rqs objects must be doing some other sorts of processing or trimming on the confidence bands, or maybe it's plotting something different from the confidence bands. Either way, what it's plotting is not the full range of the confidence bands produced in the output of summary(QR.2).

eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Great answer, thanks. In a similar situation I manages to make it work by manually "trimming" the problematic edges, e.g. changing tau from 0.1-0.9 to 0.2-0.8 – arielhasidim Jun 01 '21 at 06:56