
Building on my previous question (Specifying fill color independent of mapping aesthetics in boxplot (R ggplot)), with thanks to @AndrewGB for the code modifications, I have a dataset with 24 bars (individual categories, each with a high/low status).

Each category also belongs to a category type, so several categories share the same colours and I need to show only a subset of the legend keys (i.e., the unique colours only). In the toy data I've provided, it would be akin to "Plant Type A" (pink) and "Plant Type B" (blue) with a "Control" (grey).

My intended output is to plot only the unique legend colours and then give these keys customizable labels.

library(ggplot2)
library(data.table)

dat <- as.data.table(cbind(iris, Status = rep(c("High", "Low"), 75)))
dat <- rbind(dat, data.frame(Petal.Width = sample(iris$Petal.Width, 30, replace = TRUE),
      Species = "Control", 
      Status = "Control"), fill = TRUE)

ggplot(dat, aes(x = Species, y = Petal.Width, fill = Status)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = c("red", "pink",
                               "red", "pink",
                               "blue", "slateblue", "grey"))

ggplot(dat, aes(x = Species, y = Petal.Width, fill = interaction(Status, Species))) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = c("red", "pink",
                               "red", "pink",
                               "blue", "slateblue", "grey"))

The legend entries would then be:

Plant Type A: High Status (red)
Plant Type A: Low Status (pink)
Plant Type B: High Status (blue)
Plant Type B: Low Status (slateblue)
Control - no status (grey)

I've looked into override.aes in guides() and the breaks argument of scale_fill_manual, but cannot get this working without changing the colours that are plotted.

HarD

1 Answer


You can use the breaks argument of scale_fill_manual to limit the number of legend entries without limiting the actual plotted colors. However, you need to name the colors in the values argument explicitly:

library(tidyverse)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose

dat <- as.data.table(cbind(iris, Status = rep(c("High", "Low"), 75)))
dat <- rbind(dat, data.frame(
  Petal.Width = sample(iris$Petal.Width, 30, replace = TRUE),
  Species = "Control",
  Status = "Control"
), fill = TRUE)

dat %>%
  mutate(fill = Species %>% paste0(Status)) %>%
  ggplot(aes(x = Species, y = Petal.Width, fill = fill)) +
  geom_boxplot() +
  scale_fill_manual(
    values = c(
      setosaHigh = "red", setosaLow = "pink",
      versicolorHigh = "lightgreen", versicolorLow = "darkgreen",
      virginicaHigh = "darkblue", virginicaLow = "lightblue",
      ControlControl = "purple"
    ),
    breaks = c("virginicaLow", "virginicaHigh", "ControlControl")
  )

Created on 2022-05-10 by the reprex package (v2.0.0)
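
To get the five legend entries listed in the question, the same breaks approach can be combined with the labels argument of scale_fill_manual. The following is only a sketch under some assumptions: the colours are taken from the question's code, "Plant Type A" is assumed to cover setosa and versicolor (they share red/pink there), "Plant Type B" is assumed to be virginica, and the names fill_values and legend_labels are made up for illustration. The data is rebuilt with base R so the block runs on its own.

```r
library(ggplot2)

set.seed(1)  # only so the sampled control values are reproducible

# Same toy data as above, rebuilt with base R
dat <- cbind(iris, Status = rep(c("High", "Low"), 75))
dat <- dat[, c("Petal.Width", "Species", "Status")]
dat$Species <- as.character(dat$Species)
control <- data.frame(
  Petal.Width = sample(iris$Petal.Width, 30, replace = TRUE),
  Species = "Control",
  Status = "Control"
)
dat <- rbind(dat, control)

# Keep "Control" last on the x axis and build one fill key per box
dat$Species <- factor(
  dat$Species,
  levels = c("setosa", "versicolor", "virginica", "Control")
)
dat$fill <- paste0(dat$Species, dat$Status)

# Every key gets a colour (assumed palette, taken from the question)
fill_values <- c(
  setosaHigh = "red", setosaLow = "pink",
  versicolorHigh = "red", versicolorLow = "pink",
  virginicaHigh = "blue", virginicaLow = "slateblue",
  ControlControl = "grey"
)

# Only one key per unique colour appears in the legend (assumed labels)
legend_labels <- c(
  setosaHigh = "Plant Type A: High Status",
  setosaLow = "Plant Type A: Low Status",
  virginicaHigh = "Plant Type B: High Status",
  virginicaLow = "Plant Type B: Low Status",
  ControlControl = "Control - no status"
)

p <- ggplot(dat, aes(x = Species, y = Petal.Width, fill = fill)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  scale_fill_manual(
    values = fill_values,
    breaks = names(legend_labels),   # keys shown in the legend
    labels = unname(legend_labels),  # text shown for those keys
    name = NULL                      # drop the legend title
  )
p
```

Because values is named, the versicolor boxes keep their red/pink fill even though their keys are left out of breaks; they are simply represented in the legend by the setosa keys that share the same colours.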

danlooo
  • @tjebo It's personal preference. I tend to avoid this, because one can forget to edit the corresponding element in the names vector. With a named vector (or any two columns of a table), one makes fewer errors in keeping the vectors the same length. – danlooo May 10 '22 at 12:14
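
As a rough illustration of the "two columns of a table" idea from the comment above, the named colour vector, the breaks and the labels can all be derived from one lookup table, so they cannot drift out of sync. This is a sketch only: palette_tbl, its column names and the NA convention for hidden keys are hypothetical, and the colours and labels are assumed from the question.

```r
library(ggplot2)

# Hypothetical lookup table: one row per fill key.
# A label of NA means "colour the box, but hide the key from the legend".
palette_tbl <- data.frame(
  key = c("setosaHigh", "setosaLow", "versicolorHigh", "versicolorLow",
          "virginicaHigh", "virginicaLow", "ControlControl"),
  colour = c("red", "pink", "red", "pink", "blue", "slateblue", "grey"),
  label = c("Plant Type A: High Status", "Plant Type A: Low Status",
            NA, NA,
            "Plant Type B: High Status", "Plant Type B: Low Status",
            "Control - no status"),
  stringsAsFactors = FALSE
)

# Named vector of colours built from two columns of the table
fill_values <- setNames(palette_tbl$colour, palette_tbl$key)

# Rows that should appear in the legend
shown <- palette_tbl[!is.na(palette_tbl$label), ]

# Scale that can be added to a plot whose fill aesthetic uses these keys
fill_scale <- scale_fill_manual(
  values = fill_values,
  breaks = shown$key,
  labels = shown$label
)
```

Adding fill_scale to a plot such as the one in the answer (with fill mapped to the pasted Species/Status key) should then colour all seven boxes while listing only the five labelled keys.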