0

I have some simple data that shows call frequency for day of the week by hour. I used ggplot to make a heat table, and it works great. I was recently asked to add totals for both hour and DOW. It was messy, but I mostly have that figured out (the picture below only shows the hour totals, I didn't finish DOW yet). What I can't figure out is how to exclude the total rows from the color gradient fill. See picture below.

enter image description here

Here is my code:

ggplot(df, aes(DOW, Hour, fill=Freq)) + 
geom_tile(color="black") +
  geom_text(aes(label = Freq)) +
  ggtitle("Sector 1: Directed Calls by Hour & Day")+
  scale_fill_gradient2(low = "#FFF397", high = "#FF4545", mid="#FFBD42",  midpoint=mid1, na.value="white") +
  scale_x_discrete(drop=FALSE) +
  scale_y_discrete(drop=FALSE) +
  theme(panel.background = element_rect(fill="white"),
        axis.text=element_text(size=11),
        axis.title = element_blank(),
        legend.position = "none") +
  ggsave("SIAheat1.png",height=2, width=12, dpi=300)

Is there a way to exclude the "Total" row from the color scheme? And is there a way for ggplot to add the total row for me, instead of the roundabout numcolwise and sumifs I came up with?

r2evans
  • 141,215
  • 6
  • 77
  • 149
Abby
  • 43
  • 5
  • FYI, ` + ggsave(...)` is not how it was intended to be used, and for ggplot_3.3.4 and newer, it "should not work" due to fixes to unrelated bugs. See https://stackoverflow.com/a/68003758/3358272 – r2evans Aug 25 '23 at 18:41

1 Answers1

1

One way is to subset the data being used for the colored tiles.

Sample data:

n <- 500
tods <- c(sprintf("%02d", 0:23), "Total")
dows <- rev(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Total"))
set.seed(42)
df <- data.frame(dow = sample(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"), size = n, replace = TRUE),
                 tod = sprintf("%02d", sample(0:23, size = n, replace = TRUE))) |>
  transform(
    dow = factor(dow, levels = dows),
    tod = factor(tod, levels = tods)
  )
counts <- as.data.frame(xtabs(~ dow + tod, df)) |>
  subset(dow != "Total" & tod != "Total")
dow_summ <- as.data.frame(xtabs(~ dow, df)) |>
  subset(dow != "Total") |>
  transform(tod = "Total")
tod_summ <- as.data.frame(xtabs(~ tod, df)) |>
  subset(tod != "Total") |>
  transform(dow = "Total")
all_summ <- data.frame(dow = "Total", tod = "Total", Freq = sum(counts$Freq))
counts$Freq_fill <- counts$Freq
dow_summ$Freq_fill <- NA
tod_summ$Freq_fill <- NA
alldata <- rbind(
  counts[, c("dow", "tod", "Freq")], dow_summ[, c("dow", "tod", "Freq")],
  tod_summ[, c("dow", "tod", "Freq")], all_summ[, c("dow", "tod", "Freq")])
head(alldata)
#   dow tod Freq
# 2 Sat  00    6
# 3 Fri  00    5
# 4 Thu  00    0
# 5 Wed  00    3
# 6 Tue  00    3
# 7 Mon  00    4

Plot:

ggplot(alldata, aes(tod, dow)) +
  labs(title = "Sector 1: Directed Calls by Hour & Day") +
  geom_tile(aes(fill = Freq), data = ~ subset(., tod != "Total" & dow != "Total" & Freq > 0)) +
  geom_tile(fill = "transparent", color = "black") +
  geom_text(aes(label = Freq)) +
  scale_fill_gradient2(low = "#FFF397", high = "#FF4545", mid="#FFBD42",  na.value="white") +
  theme(panel.background = element_rect(fill = "white"),
        axis.text = element_text(size = 11),
        axis.title = element_blank(),
        legend.position = "none") +
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE)

ggplot with total rows not showing color in tiles

The second geom_tile is used to make sure that all squares get a black boundary; without it (and adding color="black" to the single geom_tile(.)) works mostly fine, except the edge 0s such as 00/Thu do not have a black boundary on the outer edge.


Edit: slight alteration in order to highlight the highest value cell(s): converting the axis labels to bold and italic, and putting a darker box around the cell(s). See ?plotmath for some options that would work here, or see ggtext documentation for other options.

# simple highlight function
HL <- function(x, val) {
  if (is.factor(x)) {
    lvls <- levels(x)
    x <- as.character(x)
  } else lvls <- unique(x)
  x <- sQuote(x, FALSE)
  val <- sQuote(val, FALSE)
  x[x %in% val] <- paste0("bolditalic(", x[x %in% val], ")")
  parse(text = x)
}

# extract row(s) with the max Freq
ismax <- counts[counts$Freq == max(counts$Freq),]

ggplot(alldata, aes(tod, dow)) +
  labs(title = "Sector 1: Directed Calls by Hour & Day") +
  geom_tile(aes(fill = Freq), data = ~ subset(., tod != "Total" & dow != "Total" & Freq > 0)) +
  geom_tile(fill = "transparent", color = "black") +
  geom_text(aes(label = Freq)) +
  scale_fill_gradient2(low = "#FFF397", high = "#FF4545", mid="#FFBD42",  na.value="white") +
  theme(panel.background = element_rect(fill = "white"),
        axis.text = element_text(size = 11),
        axis.title = element_blank(),
        legend.position = "none") +
  # ADD
  geom_tile(fill = "transparent", color = "black", linewidth = 1, data = ismax) +
  # CHANGE
  scale_x_discrete(drop = FALSE, labels = ~ HL(., ismax$tod)) +
  scale_y_discrete(drop = FALSE, labels = ~ HL(., ismax$dow))

same ggplot with highlights added

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thank you SO much! It worked perfectly and my code is so much cleaner now. Now I'm trying to figure out how to make the highest hour and highest DOW highlighted somehow, so if you have any more advice I'd super appreciate it. I've been trying, and I've learned a lot about ggplot2, but I've tried filling in the highest square or bolding the highest text and haven't gotten it yet. Thank you again! – Abby Aug 29 '23 at 16:01
  • 1
    Using _this_ sample data, do you mean highlighting `Sun` and `09` (because of the red 10)? – r2evans Aug 29 '23 at 16:13
  • 1
    @Abby, see my edit, providing two options (in the one plot) for highlighting the max cell(s). – r2evans Aug 29 '23 at 16:34