Remove outliers from a ggplotly() boxplot

Question

I have the dataframe below:

etf_id<-c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized<-c(-0.048436801,2.850578601,2.551666490,0.928625186,-0.638111793,
              -0.540615895,-0.501691539,-1.099239823,-0.040736139,-0.192048665,
              0.198915407,-0.092525810,0.214317734,2.550478998,0.024613778)
df<-data.frame(etf_id,factor,normalized)

I'm trying to remove the outliers in two ways. First I tried outlier.color = NA, outlier.size = 0, outlier.shape = NA:

library(ggplot2)
library(plotly)
ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(df$normalized, c(0.01, 0.99), na.rm = T)))

Here is a second example with the diamonds dataset:

p<-ggplotly(diamonds %>% 
  ggplot(aes(cut,price, color = cut)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))

Then I try with:

ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5, c(0.01, 0.99), na.rm = T)))

but this cuts off my plot's y limits, and I need a generic solution.

score 3 · Answer 1 · answered Feb 26 '20 at 04:23

We can go under the hood of the ggplotly object and make the outliers invisible. Note, however, that hovering over the invisible outliers will still show their hoverinfo.

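library(ggplot2)
library(plotly)

p <- ggplotly(diamonds %>% 
            ggplot(aes(cut, price, color = cut)) +
            geom_boxplot(outlier.color = NA, outlier.size = 0, outlier.shape = NA))

# Loop over the plotly traces (p$x$data), not over the widget object itself
for (i in seq_along(p$x$data)) {
  p$x$data[[i]]$marker$opacity <- 0
}

p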
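
The same idea can be wrapped in a small helper that only touches box traces, so any other traces in the plot keep their markers. This is a minimal sketch, not part of plotly: hide_box_outliers is a hypothetical name, and it assumes the ggplotly result stores its traces in p$x$data with a type field, as the loop above does. To keep the sketch runnable without plotly installed, it is demonstrated on a hand-built list that stands in for that structure.

```r
# Hypothetical helper (not a plotly function): set marker opacity to 0
# on traces of type "box" only, leaving every other trace unchanged.
hide_box_outliers <- function(p) {
  p$x$data <- lapply(p$x$data, function(trace) {
    if (identical(trace$type, "box")) {
      trace$marker$opacity <- 0
    }
    trace
  })
  p
}

# Hand-built stand-in for the trace list of a ggplotly() result (assumption:
# the real object has the same p$x$data[[i]]$type / $marker layout).
mock <- list(x = list(data = list(
  list(type = "box", marker = list(opacity = 1)),
  list(type = "scatter", mode = "markers", marker = list(opacity = 1))
)))

out <- hide_box_outliers(mock)
```

With a real plot, the call would be hide_box_outliers(p) on the ggplotly object. The hover caveat above still applies, since the markers are only made transparent.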

score 1 · Answer 2 · answered Dec 18 '19 at 02:43

I am not entirely sure what you are trying to do with the second approach. However, for what it's worth, the issue you are facing is rooted in this part of the code: boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5

Specifically, boxplot.stats(df$normalized)$stats returns this vector:

[1] -1.09923982 -0.34687010 -0.04073614  0.57147146  0.92862519

These are the boxplot stats (i.e. lower whisker, lower hinge, median, upper hinge, and upper whisker) for ALL of your data. But because the graph you are drawing is further subcategorizing the data by the factor variable, values from boxplot.stats for all of the data will not provide you with good boundaries.
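
To illustrate the difference, the whisker ends can be computed per group instead of over the pooled data. This is only a sketch using base R on the question's data; the names whisker_limits and ylim_groups are made up for the example.

```r
# Data from the question
etf_id <- rep(c("a", "b", "c", "d", "e"), times = 3)
factor <- rep(c("A", "B", "C"), each = 5)
normalized <- c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
df <- data.frame(etf_id, factor, normalized)

# Lower and upper whisker ends for each group (elements 1 and 5 of $stats)
whisker_limits <- do.call(rbind, lapply(
  split(df$normalized, df$factor),
  function(x) boxplot.stats(x)$stats[c(1, 5)]
))
colnames(whisker_limits) <- c("lower", "upper")

# Overall y range that covers every group's whiskers
ylim_groups <- range(whisker_limits)
```

ylim_groups could then be passed to coord_cartesian(ylim = ...). Note that this only crops the view: an outlier that lies inside the combined range (such as the one in group C) would still be drawn, so it does not replace hiding the outlier markers in plotly.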

Going back to your original problem of hiding outliers in boxplots: ggplotly does not honor the outlier.shape = NA argument you pass to ggplot. Instead, you should specifically hide the outliers in plotly. One solution can be found on plotly's GitHub issue tracker.

answered Dec 18 '19 at 02:43

Merik


The other issue is that it suppresses every point, not only outlier points. – firmo23 Dec 18 '19 at 03:55
In the example you provided, there is only one point and that is an outlier point. Please update the example so I can understand what the issue is. – Merik Dec 19 '19 at 02:51
Yes, that one point (2.55) should be removed. The same should apply to the other 2 boxplots if they have outliers. – firmo23 Dec 19 '19 at 17:26
I added another example with diamonds dataset – firmo23 Dec 19 '19 at 19:13
