1

I am trying to summarize a UMAP scatter plot of single cell sequencing data with hexagons. As the goal is to simplify very busy clustering results, I am mixing colors for each bin (=hexagon) according to how many cells of each cluster are in the bin. In other words, if there are 2 cells from cluster 1 and 8 from cluster 2, I mix the colors assigned to those clusters in the proportions of the cells. This means I need to assign a specific color to each hexagon.

Please excuse the long code, I tried to shorten it as far as I could.

library(hexbin)
library(ggplot2)
library(tibble)

####################
# helper functions #
####################

#' Determines majority in a vector
#' @description 
#' Changed version of mclust::majorityVote. Ties are broken randomly.
#' 
#' @param x a vector
#' 
#' @returns 
#' A single element of x that has the highest count.
#' 
get_majority <- function(x){
  
  x <- as.vector(x)
  tally <- table(x)
  max_idx <- seq_along(tally)[tally == max(tally, na.rm = TRUE)]
  
  if(length(max_idx) > 1){
    max_idx <- sample(max_idx, size = 1)
  }
  
  majority <- names(tally)[max_idx]
  
  return(majority)
  
}



###################

# Toy data
umap_coords <- tibble( x = rnorm(1000),
                           y = rnorm(1000),
                           cluster = rep(c(1,2,3,4,5), 200))

colors <- c("#8DD3C7",
            "#FFFFB3",
            "#BEBADA",
            "#FB8072",
            "#80B1D3")
names(colors) <- 1:5

 
hexb <- hexbin::hexbin(umap_coords$x,
                     umap_coords$y,
                     xbins = 10,
                     xbnds = c(min(umap_coords$x),
                               max(umap_coords$x)),
                     ybnds = c(min(umap_coords$y),
                               max(umap_coords$y)),
                     IDs = TRUE)

gghex <- data.frame(hexbin::hcell2xy(hexb),
                      count = hexb@count,
                      cell = hexb@cell,
                      xo = hexb@xcm,
                      yo = hexb@ycm,
                      hexclust = NA)


for (i in seq_along(gghex$cell)){
    
    cell_id <- gghex$cell[i]
    hcnt <- gghex$count[i]
    
    orig_id <- which(hexb@cID == cell_id)
    umap_coords[orig_id,"hexbin"] <- cell_id
    
    gghex$hexclust[i] <- get_majority(umap_coords[orig_id, "cluster"])
    
  }
  
  
  hex_colors <- vector(mode = "character", length = length(gghex$cell))
 
  
  # For simplicity, here I assign a fixed color per cluster.
  for (n in seq_along(gghex$cell)){

    hex_colors[n] <- colors[names(colors) == gghex$hexclust[n]]
         
  }

  gghex$colors <- hex_colors


  # I define the data in the geom because I combine it with a scatterplot from a different data.frame.
  # (scatter plot is not relevatn for the question though.)
  p <- ggplot2::ggplot() + 
    ggplot2::geom_hex(data = gghex,
                      mapping = ggplot2::aes(x = x,
                                             y = y),
                      fill = gghex$colors,
                      alpha = 0.8,
                      stat = "identity")

p

wrong_fill

However, the resulting plot clearly does not assign the colors to the correct hexagons. If I plot the clusters by assigning it inside of aes() I get a different picture:

ggplot2::ggplot() + 
    ggplot2::geom_hex(data = gghex,
                      mapping = ggplot2::aes(x = x,
                                             y = y,
                                             fill = hexclust),
                      alpha = 0.8,
                      stat = "identity")

p

correct_automatic_fill

Now, for this particular toy problem I can just assign the colors via scale_fill_manual:

names(hex_colors) <- gghex$hexclust

ggplot2::ggplot() + 
    ggplot2::geom_hex(data = gghex,
                      mapping = ggplot2::aes(x = x,
                                             y = y,
                                             fill = hexclust),
                      alpha = 0.8,
                      stat = "identity") +
    scale_fill_manual(values = hex_colors, guide = "none")

toy_solution

But remember, in my actual proplem, I have to assign each hexagon a specific color. And here geom_hex seems to break down:

names(hex_colors) <- as.character(gghex$cell)

ggplot2::ggplot() + 
    ggplot2::geom_hex(data = gghex,
                                 mapping = ggplot2::aes(x = x,
                                                        y = y,
                                                        fill = as.character(cell)),
                                 alpha = 0.8,
                                 stat = "identity") +
    scale_fill_manual(values = hex_colors, guide = "none")

p

wrong_size

As you can see the size of the hexagons suddenly is completely wrong. I read a short suggestion by Hadley to use group = 1 in aes to make the hexagons aware of each other, but this does not work for me either.

Does anybody have a suggestion on how to get a working plot with geom_hex?

Thanks a lot!

EDIT: The answer by @Allen Cameron solves the question originally posed and I will mark it as the solution if there is no final answer to the edit.

However, I found that if I actually assign unique colors to the data, geom_hex once again produces hexagons of differing sizes:

library(hexbin)
library(ggplot2)
library(tibble)

####################
# helper functions #
####################

#' Determines majority in a vector
#' @description 
#' Changed version of mclust::majorityVote. Ties are broken randomly.
#' 
#' @param x a vector
#' 
#' @returns 
#' A single element of x that has the highest count.
#' 
get_majority <- function(x){
  
  x <- as.vector(x)
  tally <- table(x)
  max_idx <- seq_along(tally)[tally == max(tally, na.rm = TRUE)]
  
  if(length(max_idx) > 1){
    max_idx <- sample(max_idx, size = 1)
  }
  
  majority <- names(tally)[max_idx]
  
  return(majority)
  
}

#' Mixes the colors of two clusters proportionally.
#' 
#' @param df data.frame of cells with clusters in `color_by` and assigned 
#' hex bin in `hexbin`.
#' @param colors colors to be mixed.
#' @param cell Which hexbin to mix colors in.
#' @param color_by Column name where the clusters/groups are stored in `df`.
#' 
#' @returns 
#' Mixed color as hex code.
#' 
mix_rgb <- function(df, colors, cell, color_by){
  rgbcols <- col2rgb(colors)
  
  sel <- which(df$hexbin == cell)
  n_clust <- dplyr::pull(df[sel,color_by])
  n_clust <- table(as.character(n_clust))
  prop <- as.numeric(n_clust)
  names(prop) <- names(n_clust)
  prop <- prop/sum(prop)
  
  rgb_new <- sweep(rgbcols[,names(prop), drop=FALSE], MARGIN =2, FUN = "*", prop)  
  rgb_new <- rowSums(rgb_new)
  rgb_new <- rgb(red = rgb_new["red"], 
                 green = rgb_new["green"],
                 blue = rgb_new["blue"], 
                 maxColorValue = 255)
  return(rgb_new)
}



###################

umap_coords <- tibble( x = rnorm(1000),
                           y = rnorm(1000),
                           cluster = rep(c(1,2,3,4,5), 200))

colors <- c("#8DD3C7",
            "#FFFFB3",
            "#BEBADA",
            "#FB8072",
            "#80B1D3")
names(colors) <- 1:5

 
hexb <- hexbin::hexbin(umap_coords$x,
                     umap_coords$y,
                     xbins = 10,
                     xbnds = c(min(umap_coords$x),
                               max(umap_coords$x)),
                     ybnds = c(min(umap_coords$y),
                               max(umap_coords$y)),
                     IDs = TRUE)

gghex <- data.frame(hexbin::hcell2xy(hexb),
                      count = hexb@count,
                      cell = hexb@cell,
                      xo = hexb@xcm,
                      yo = hexb@ycm,
                      hexclust = NA)


for (i in seq_along(gghex$cell)){
    
    cell_id <- gghex$cell[i]
    hcnt <- gghex$count[i]
    
    orig_id <- which(hexb@cID == cell_id)
    umap_coords[orig_id,"hexbin"] <- cell_id
    
    gghex$hexclust[i] <- get_majority(umap_coords[orig_id, "cluster"])
    
  }
  
  
  hex_colors <- vector(mode = "character", length = length(gghex$cell))
 
  

  for (n in seq_along(gghex$cell)){

        hex_colors[n] <- mix_rgb(umap_coords,
                             colors = colors,
                             cell = gghex$cell[n],
                             color_by = "cluster")
  }

  gghex$colors <- hex_colors


ggplot2::ggplot() + 
  ggplot2::geom_hex(data = gghex,
                    mapping = ggplot2::aes(x = x,
                                           y = y,
                                           fill = colors),
                    alpha = 0.8,
                    stat = "identity") +
  scale_fill_identity()

The resulting plot looks as follows: enter image description here

YumTum
  • 389
  • 1
  • 3
  • 11

1 Answers1

2

If you wish to fill each hexagon according to the color column, you can use scale_fill_identity:

ggplot(gghex, aes(x, y, fill = colors)) + 
  geom_hex(stat = 'identity') +
  scale_fill_identity() 

enter image description here

We can see that all the colors are the desired ones and match the designated cluster by adding their cluster and color value as strings on the hexagons:

ggplot(gghex, aes(x, y, fill = colors)) + 
  geom_hex(stat = 'identity') +
  geom_text(aes(label = paste(colors, hexclust, sep = '\n')), size = 2.5) +
  scale_fill_identity()

enter image description here


Update

For the edited version of the data, this is where the group = 1 is needed:

ggplot(gghex, aes(x, y, fill = colors, group = 1)) + 
  geom_hex(stat = "identity") +
  scale_fill_identity()

enter image description here

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Thanks a lot Allan! This seems to indeed solve the problem for the example data I provided. However, when working on some actual data it only works partially. Some hexagons seem to still be of the wrong size: https://i.imgur.com/VEFUDUu.png Would you have any idea why that is? – YumTum Jan 16 '23 at 15:48
  • @YumTum difficult to say without seeing the actual data I'm afraid. – Allan Cameron Jan 16 '23 at 15:51
  • I edited my original question, but if there is no solution to the update I will mark your answer as the correct one as it solves the original problem. – YumTum Jan 16 '23 at 16:17
  • Thanks for the update! Although I obviously don't have the exact random data you used, I expect that the updated plot is not correct. I have seen this stripe pattern many times before (see also first plot of my question), although I am not sure why it is happening. – YumTum Jan 16 '23 at 17:34
  • @YumTum this is a bug in ggplot which has been fixed in the latest development version. I didn't have the development version installed on my work PC. Now updated using `devtools::install_github("tidyverse/ggplot2", force= TRUE)` - this looks more like the kind of thing you are looking for. – Allan Cameron Jan 16 '23 at 17:49
  • Thanks so much for that, this indeed fixes the problem entirely now! This also explains why the previously functioning plot stopped working, I must have updated ggplot without being aware of it. Do you by any chance know a 'safe' stable previous version off ggplot one can use? – YumTum Jan 17 '23 at 10:12
  • I checked and ggplot2 <= 3.3.0 or devel version (3.4.0.9000) work. – YumTum Jan 17 '23 at 10:46