I want to create a boxplot (and potentially other plots) with logarithmic scale AND statistics calculated on logged data. The following example shows the logic.

The data looks like:

library(plotly)
d1 <- data.frame(x = rchisq(1000, 2), mod = c(rep('a', 500), rep('b', 500)))

If I pre-transform the data, I get a plot with the log values on the y-axis.

plot_ly(d1, y = ~log10(x), color = ~mod, type = 'box')

If I transform the y-axis after creating the boxplot, the median and whisker lengths are computed from the original data, and those values are then drawn on a log-scaled y-axis.

plot_ly(d1, y = ~x, color = ~mod, type = 'box') %>%
  layout(yaxis = list(type = "log", showgrid = TRUE, ticks = "outside", autorange = TRUE))

My desired outcome is a combination of the two plots above: the boxplot from the first one and the axis scale from the second. It should look like what ggplot can produce:

library(ggplot2)
d1 %>% ggplot(aes(y = x, color = mod, fill = mod)) +
  geom_boxplot(alpha = 0.1) +  # alpha is a constant, so it is set outside aes()
  scale_y_log10()

I tried to use ggplotly to convert the ggplot into a plotly object, but it loses the log scale and falls back to the scale from the first plot.

Can anybody help me make such a plot in plotly, or show how to preserve the log scale on the y-axis with ggplotly?

Kirill K.

1 Answer

It is harder to keep the log scale when you convert a ggplot boxplot with ggplotly. You were very close with plot_ly. Just add tickvals = c(0.001, 0.01, 0.1, 1, 10) to the yaxis list in the layout of your plot_ly call, and the log axis will show ticks at those values.

You can work out the range of your data and define the tick values to match as a vector, for example mytickvals <- c(1, 10, 100), and then pass it in layout as tickvals = mytickvals.
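
As a rough sketch of the suggestion above, the tick vector can be derived from the data instead of typed by hand. The names exps and p and the fixed seed are assumptions made for illustration only; the boxplot statistics here are still computed on the original, untransformed x.

```r
library(plotly)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d1 <- data.frame(x = rchisq(1000, 2), mod = c(rep('a', 500), rep('b', 500)))

# Powers of ten that span the observed range of x.
# This relies on all x being strictly positive, which holds for chi-squared draws.
exps <- floor(log10(min(d1$x))):ceiling(log10(max(d1$x)))
mytickvals <- 10^exps

# Log-scaled y-axis with ticks placed at the powers of ten computed above.
p <- plot_ly(d1, y = ~x, color = ~mod, type = 'box') %>%
  layout(yaxis = list(type = "log", tickvals = mytickvals, ticks = "outside"))
p
```

Because the tick vector is built from min(d1$x) and max(d1$x), it adapts when the data change.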

YBS
  • Well, it does not solve the problem: the boxplot itself (I mean outliers, quantiles, etc.) is the same as in the second plot of the original post, whereas it needs to match the first one. – Kirill K. Jun 20 '20 at 14:02
  • As you have negative values for y in first plot, it will be difficult to display a log scale for y-axis. Also, please note the issue of transforming variables prior to summarizing as noted in https://coolbutuseless.github.io/2018/08/06/ggplot2-stat_summary-problem/. – YBS Jun 21 '20 at 00:54
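
One possible way to reconcile the two plots discussed in this thread, offered only as a sketch and not as the answerer's method: plot log10(x) on a linear axis, so that plotly computes the box statistics on the logged data, and relabel the ticks with the original values through tickvals and ticktext. The names exps, tick_labels and p, the axis title, and the fixed seed are assumptions made for illustration.

```r
library(plotly)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d1 <- data.frame(x = rchisq(1000, 2), mod = c(rep('a', 500), rep('b', 500)))

# Integer exponents spanning the observed range of log10(x); requires x > 0.
exps <- floor(log10(min(d1$x))):ceiling(log10(max(d1$x)))

# Labels on the original scale, one per tick position, without scientific notation.
tick_labels <- vapply(10^exps, format, character(1), scientific = FALSE)

# The axis stays linear in log10 units, so quartiles, whiskers and outliers
# are computed from log10(x); only the tick labels show original values.
p <- plot_ly(d1, y = ~log10(x), color = ~mod, type = 'box') %>%
  layout(yaxis = list(title = "x", tickvals = exps, ticktext = tick_labels))
p
```

With this approach the hover text still reports log10 values, since only the axis labels are remapped.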