Split points by a factor in a plotly scatter plot

Question

Probably an easy one.

I have data points (with error bars) that I'd like to plot. There are two levels of grouping factors: group and cluster:

set.seed(1)
df <- data.frame(cluster=rep(LETTERS[1:10],2),group=c(rep("A",10),rep("B",10)),point=rnorm(20),err=runif(20,0.1,0.3))
df$group <- factor(df$group,levels=c("A","B"))

I'd like to plot the points with df$cluster on the x-axis, and within each cluster have the points color coded by df$group and split horizontally (so that the group A point sits to the left of the group B point).

Here's what I'm trying:

library(plotly)
plot_ly(x=~df$cluster,y=~df$point,split=~df$group,type='scatter',mode="markers",showlegend=T,color=~df$group) %>%
  layout(legend=list(orientation="h",xanchor="center",x=0.5,y=1),xaxis=list(title=NA,zeroline=F,categoryorder="array",categoryarray=sort(unique(df$cluster)),showticklabels=T),yaxis=list(title="Val",zeroline=F)) %>%
  plotly::add_trace(error_y=list(array=df$err),showlegend=F)

Which gives me:

Pretty close but the only thing that's not working is splitting the points in each cluster by group.

Any idea how to get this to work? Ideally the code would be generic, so that any number of group levels are split within each cluster, rather than code that's specific to the A and B of this example.

I'm not super well versed in plotly, but do you have a preference for `plot_ly` over `ggplotly`? — camille, Jun 23 '18 at 20:49
Yes definitely. Doing this with `ggplot2` is a piece of cake. — dan, Jun 23 '18 at 21:20
What I'm saying is there's `ggplotly` to translate a ggplot plot into a plotly one. But I don't know if there's a reason or preference to not use `ggplotly` for this — camille, Jun 23 '18 at 21:28
I know what you meant. `ggplotly` is a wrapper around `ggplot` to convert it to `plotly`. But it doesn't always work very well, so I'm not interested in that as a solution. — dan, Jun 23 '18 at 21:41

Matt Summersgill · Accepted Answer · 2018-06-25T18:24:37.403

I love plotly and use it almost exclusively, but there are some nice features built into ggplot2 that require a handful of tweaks to replicate in plotly.

Still, I think it's worth really getting to know some of the more detailed ins and outs if you plan on publishing interactive plots for others to review. The R API gives you an enormous amount of control to tweak every little detail if you use the native syntax instead of ggplotly.

With that said, here's how I would tackle this problem:

(Data generation code provided in question)

library(plotly)
set.seed(1)
df <- data.frame(cluster=rep(LETTERS[1:10],2),
                 group=c(rep("A",10),
                         rep("B",10)),
                 point=rnorm(20),
                 err=runif(20,0.1,0.3))    
df$group <- factor(df$group,levels=c("A","B"))

First, you need to do some manual "jittering" yourself in a systematic way (ggplot2 calls this dodging, via position_dodge). I haven't read the source code for how ggplot2 does this "auto-magically", but I imagine something similar to this is taking place behind the curtain.

## Generate a set of offsets based on the number of groups
Offset <- data.frame(group = unique(df$group),
                     offset = seq(-0.1, 0.1,length.out = length(unique(df$group))))

## Join the offset to the data frame based on group
df <- merge(df,Offset,by = "group", all.x = TRUE)

## Calculate an x location
df$x_location <- as.numeric(as.factor(df$cluster)) + df$offset

head(df) post-manipulation:

  group cluster      point       err offset x_location
1     A       A -0.6264538 0.2641893   -0.1        0.9
2     A       B  0.1836433 0.2294120   -0.1        1.9
3     A       C -0.8356286 0.2565866   -0.1        2.9
4     A       D  1.5952808 0.2106073   -0.1        3.9
5     A       E  0.3295078 0.2059439   -0.1        4.9
6     A       F -0.8204684 0.2578712   -0.1        5.9
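
The offset step can be wrapped up so that it handles any number of group levels, which is what the question asks for. The following is a minimal sketch in base R, not a plotly feature: `dodge_x` and its `width` argument are hypothetical names introduced here, and the toy data frame is made up for illustration.

```r
## Hypothetical helper (not part of plotly): spread the k group levels evenly
## across a band of total width `width`, centred on each cluster's position.
dodge_x <- function(cluster, group, width = 0.2) {
  cluster <- as.factor(cluster)
  group   <- as.factor(group)
  k <- nlevels(group)
  ## A single level gets no offset; otherwise offsets follow the level order
  offsets <- if (k == 1) 0 else seq(-width / 2, width / 2, length.out = k)
  as.numeric(cluster) + offsets[as.integer(group)]
}

## Toy data: two clusters, three group levels
toy <- data.frame(cluster = rep(c("A", "B"), each = 3),
                  group   = rep(c("g1", "g2", "g3"), times = 2))
toy$x_location <- dodge_x(toy$cluster, toy$group)
```

Because the offsets are indexed by factor level rather than by order of appearance, the left-to-right order within a cluster follows `levels(group)`, and the `k == 1` branch keeps a lone group centred on its cluster.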

Now that you have an explicit x_location, you can use that on a scatter plot and then add in categorical tick marks/text using an array. Then, by displaying the values of interest in the text, you can eliminate the x and y values from the hoverinfo to fully cover your tracks.

df %>% 
  plot_ly() %>% 
  add_trace(x= ~x_location,y= ~point, color= ~group,
            text = ~paste0("Group ",group," - Cluster ", cluster,"<br>",round(point,2)),
            error_y = list(type = "data", array = ~err), 
            hoverinfo = "text",
            type = "scatter", mode = "markers") %>%
  layout(hovermode = "compare",
         paper_bgcolor = 'rgba(235,235,235,0)',
         plot_bgcolor = "rgba(235,235,235,1)",
         legend=list(orientation="h",
                     xanchor="center",
                     yanchor = "bottom",
                     x=0.5,y=1,
                     bgcolor = "transparent"),
         xaxis=list(title=NA,
                    zeroline=FALSE,
                    tickmode = "array",
                    tickvals = unique(as.numeric(sort(as.factor(df$cluster)))),
                    ticktext = unique(sort(as.factor(df$cluster))),
                    gridcolor = "rgba(255,255,255,1)"),
         yaxis=list(title="Val",
                    zeroline=FALSE,
                    gridcolor = "rgba(255,255,255,1)"))
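
The tick arrays in the layout call can also be read directly off the factor levels, which may be easier to follow than the `unique(sort(...))` form. This is a small sketch with made-up cluster values; `clusters`, `lv`, `tickvals` and `ticktext` are illustrative names only.

```r
## Made-up cluster labels, deliberately unsorted and with a repeat
clusters <- c("B", "A", "C", "A")

## Factor levels are sorted and unique, and as.numeric(as.factor(x))
## maps each label to its position in that level vector
lv <- levels(factor(clusters))

tickvals <- seq_along(lv)  # numeric positions 1, 2, 3
ticktext <- lv             # labels shown at those positions
```

Since the x locations were built from `as.numeric(as.factor(cluster))`, tick position `i` lines up with the `i`-th level.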

Thanks @Matt Summersgill. Too bad this is not implemented in `plotly`. — dan, Jun 26 '18 at 19:44

Mankind_008 · Answer 2 · 2018-06-23T22:39:30.570

I am not well acquainted with the plotly package, so I am unsure whether this meets your requirement.

But here is a workaround using unite from the tidyr package: create the cluster-group pair before passing the data to plotly.

library(plotly)  # provides plot_ly() and the %>% pipe
library(tidyr)

df1 <- df %>% unite(c('cluster','group'), col = 'clust_grp', sep = "-", remove = FALSE)

plot_ly(df1, x=~clust_grp, y=~point, type='scatter', mode="markers", showlegend=TRUE, color=~group) %>%
  layout(legend=list(orientation="h", xanchor="center", x=0.5, y=1),
         xaxis=list(title="cluster_group", zeroline=FALSE, categoryorder="array",
                    categoryarray=sort(unique(df1$clust_grp)), showticklabels=TRUE),
         yaxis=list(title="Val", zeroline=FALSE)) %>%
  plotly::add_trace(error_y=list(array=df1$err), showlegend=FALSE)

Thanks for the answer @Mankind_008. It's pretty close. However, I'd prefer that each cluster have a single x-axis location and that the groups are split across x-axis location. — dan, Jun 25 '18 at 16:55
