
I have categorical data that I'd like to map the frequency of using a heatmap (geom_tile), much like the example below:

library(ggplot2)
data("mtcars")
freq <- data.frame(xtabs(~cyl + gear, mtcars)) # count 4-, 6- and 8-cyl cars by gear
ggplot(freq, aes(cyl, gear)) +
  geom_tile(aes(fill = Freq)) +
  scale_fill_gradient(low = "white", high = "steelblue")

standard frequency count heatmap

But I'd like to split each tile according to the proportion of significant or non-significant results (0-1 values). In this example, I would generate the same frequency count but differentiate between automatic and manual transmission (am):

freq_am <- data.frame(xtabs(~cyl + gear + am, mtcars))
print(freq_am)
   #cyl gear am Freq
      4    3  0    1
      6    3  0    2
      8    3  0   12
      4    4  0    2
      6    4  0    2
      8    4  0    0
      4    5  0    0
      6    5  0    0
      8    5  0    0
      4    3  1    0
      6    3  1    0
      8    3  1    0
      4    4  1    6
      6    4  1    2
      8    4  1    0
      4    5  1    2
      6    5  1    1
      8    5  1    2

The resulting heatmap would have (for example) blue for values of am==0 and red for am==1. Each tile would be divided (along a diagonal?) according to the proportion of cars of that type that are automatic (am==0) or manual (am==1). The shades of blue and red would be proportionate to the count, just as the gradient already reflects.

For example:

  • the top left tile (4,5) would be completely light red because all of the 4-cyl, 5-gear cars (count = 2) are manual

  • the middle left tile (4,4) would be 1/4 blue and 3/4 red because 25% of the 4-gear, 4-cyl cars are automatic (count = 2) and 75% are manual (count = 6)

  • the bottom left tile (4,3) would be completely lightest blue because all of the 4-cyl, 3-gear cars (count = 1) are automatic

Hilary
  • According to `?mtcars` `am` is defined as _Transmission (0 = automatic, 1 = manual)_. In your question you have defined _automatic (`am==1`) or manual (`am==0`)_ and _blue for values of `am==1` and red for `am==0`_ which is just the other way around. Please, can you [edit] your Q and clarify? - Thank you. – Uwe Jan 22 '17 at 17:24
  • fixed. thanks for offering the clarification and the solution! – Hilary Jan 24 '17 at 18:32

2 Answers


This is a second and hopefully complete attempt to answer the question by manipulating the frequency counts so that they become negative for am==1. The difference from the first attempt is that geom_col(position = "fill") is used instead of geom_tile() for plotting.

Note: I didn't edit the first answer because the OP has already commented on it, and I might eventually delete that first, incomplete answer.

Preparing the data

freq_am <-data.frame(xtabs(~cyl + gear + am, mtcars))
freq_am$Freq_am <- freq_am$Freq * (-1)^as.integer(as.character(freq_am$am))

This creates a new column Freq_am where the Freq counts are multiplied by -1 if am == 1 (manual). Raising -1 to the power of the 0/1 value of am is a trick to avoid ifelse().
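As a minimal sketch of that sign trick, using made-up values rather than the mtcars counts (the names am_values, freq, signed_a and signed_b are illustrative only):

```r
# (-1)^0 is 1 and (-1)^1 is -1, so the exponent acts as a sign switch.
am_values <- c(0L, 1L, 0L, 1L)   # hypothetical 0/1 indicator
freq      <- c(3, 5, 2, 4)       # hypothetical counts

signed_a <- freq * (-1)^am_values                  # exponent trick
signed_b <- ifelse(am_values == 1L, -freq, freq)   # equivalent ifelse() form

signed_a
# [1]  3 -5  2 -4
```

Both forms give the same result; the exponent version is just shorter.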

Plotting

There are two possibilities to achieve the desired heatmap-like appearance.

Variant 1

library(ggplot2)
p <- ggplot(freq_am, aes(x = cyl, y = Freq, fill = Freq_am)) +
  geom_col(position = "fill", width = 1) + 
  scale_fill_gradient2() +
  facet_grid(gear ~ ., as.table = FALSE, switch = "y") + 
  scale_y_continuous(expand = c(0, 0)) + 
  scale_x_discrete(expand = c(0, 0))
p

This creates a stacked bar chart of Freq vs cyl using geom_col() where the bars are stretched vertically (position = "fill") and horizontally (width = 1) to fill the plotting area. In addition, the expand = c(0, 0) parameter to the scale functions tells ggplot to not expand the axes as usual. Note that the x-axis is discrete as xtabs() has coerced cyl to factor.

facet_grid() is used to simulate a y-axis with the gear values in increasing order (as.table = FALSE). switch = "y" moves the panel strips to the left side.

scale_fill_gradient2() uses a convenient diverging colour scheme by default so that the count of cars with automatic transmission appears in blue and the count of cars with manual transmission in red.
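To make the effect of position = "fill" more concrete, here is a rough sketch that computes the per-tile shares by hand. It only illustrates the arithmetic and is not needed for the plot; tile_total and share are illustrative names.

```r
# Sketch of the per-tile shares that position = "fill" draws.
# `tile_total` and `share` are illustrative names, not part of the plot code.
freq_am <- data.frame(xtabs(~ cyl + gear + am, mtcars))

# Total number of cars in each cyl/gear tile, repeated for every row of the tile
tile_total <- ave(freq_am$Freq, freq_am$cyl, freq_am$gear, FUN = sum)

# Share of each transmission type within its tile (NaN where a tile has no cars)
freq_am$share <- freq_am$Freq / tile_total

subset(freq_am, cyl == "4" & gear == "4")
```

For the 4-cyl, 4-gear tile this gives a share of 0.25 for am == 0 and 0.75 for am == 1, which matches the split requested in the question.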


Now, we need to remove all decorations and spaces which aren't needed for a heatmap. Finally, the y-axis label is renamed:

p + theme(panel.grid = element_blank()
          , axis.ticks = element_blank()
          , axis.text.y = element_blank()
          , strip.background = element_blank()
          , panel.spacing.y = unit(0, "pt")
) + 
  ylab("gear")


The downside of this approach is the lack of borders between tiles. This makes it difficult to distinguish the share of counts if adjacent tiles have the same colour, e.g., the 6-cyl tiles for 3 and 4 gears.

Variant 2

This variant adds borders between the tiles. The width of the borders can be flexibly adjusted:

p <- ggplot(freq_am, aes(x = 1, y = Freq, fill = Freq_am)) +
  geom_col(position = "fill") + 
  scale_fill_gradient2() +
  facet_grid(gear ~ cyl, as.table = FALSE, switch = "both") +
  scale_y_continuous(expand = c(0, 0)) + 
  scale_x_continuous(expand = c(0, 0))
p

Here, we use facet_grid() for both directions. For each panel, Freq is plotted vs a dummy variable 1 using geom_col() as above. As the dummy variable 1 is numeric we don't need the width parameter to geom_col(). Both axes are continuous now.


Again, we need to remove some of the decorations and to rename the labels on the x and y-axes:

p + theme(panel.grid = element_blank()
        , axis.ticks = element_blank()
        , axis.text = element_blank()
        , strip.background = element_blank()
        # , panel.spacing = unit(0, "pt")
  ) + 
  xlab("cyl") + ylab("gear")


Now, we do have a heatmap with borders between the tiles. In order to remove the borders or adjust the width, you can uncomment the line with panel.spacing and change the value.

Uwe

This is a first attempt to find an (incomplete) answer to the question by manipulating the frequency counts so that they become negative for am==0.

Note that the question is not fully clear. ?mtcars defines am as

Transmission (0 = automatic, 1 = manual).

while the OP has defined

automatic (am==1) or manual (am==0)

which is just the other way around. In addition, the OP has requested the heatmap to show blue for values of am==1 and red for am==0.

Preparing the data

freq_am <-data.frame(xtabs(~cyl + gear + am, mtcars))
freq_am$Freq_am <- -freq_am$Freq * (-1)^as.integer(as.character(freq_am$am))
freq_am$gear_am <- factor(paste(as.character(freq_am$gear), as.character(freq_am$am), sep = "_"))

freq_am
#   cyl gear am Freq Freq_am gear_am
#1    4    3  0    1      -1     3_0
#2    6    3  0    2      -2     3_0
#3    8    3  0   12     -12     3_0
#4    4    4  0    2      -2     4_0
#5    6    4  0    2      -2     4_0
#6    8    4  0    0       0     4_0
#7    4    5  0    0       0     5_0
#8    6    5  0    0       0     5_0
#9    8    5  0    0       0     5_0
#10   4    3  1    0       0     3_1
#11   6    3  1    0       0     3_1
#12   8    3  1    0       0     3_1
#13   4    4  1    6       6     4_1
#14   6    4  1    2       2     4_1
#15   8    4  1    0       0     4_1
#16   4    5  1    2       2     5_1
#17   6    5  1    1       1     5_1
#18   8    5  1    2       2     5_1

Note that xtabs() has coerced am to factor:

str(freq_am$am)
# Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...

To convert am back to numeric, we have to use as.integer(as.character(freq_am$am)). (You may convert the level numbers directly to the original numeric values by using (as.integer(am) - 1), but that's less safe.)
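A small sketch of why the detour via as.character() is the safer route, using a hypothetical factor f whose labels are not simply 0 and 1:

```r
# Hypothetical factor whose labels are not 0/1, to show why the route matters.
f <- factor(c("3", "5", "3", "4"))

as.integer(f)                 # internal level codes
# [1] 1 3 1 2

as.integer(as.character(f))   # original values recovered from the labels
# [1] 3 5 3 4
```

For am the level codes happen to be the values plus one, which is why (as.integer(am) - 1) works there, but that shortcut depends on the labels being exactly 0 and 1.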

gear_am will be used as new y-axis when plotting the heatmap.

Plotting

library(ggplot2)
ggplot(freq_am, aes(cyl, gear_am, fill = Freq_am)) +
  geom_tile() + 
  scale_fill_gradient2() + 
  theme_minimal() + 
  theme(panel.grid = element_blank())

scale_fill_gradient2() uses a convenient diverging colour scheme by default. The tiles for gear on the y-axis have now been split up into tiles with am==0 and am==1.


"Incomplete" answer

The OP has requested that the now split-up tiles should be completely filled even if there are zero counts. This could be achieved by further manipulating freq_am. However, I find the current chart communicates the result in a clear, unambiguous way.

Uwe
  • This is a great start, but I find it difficult to interpret. The boxes that are now discrete (e.g. `(4,5_1)` vs.`(4,5_0)`) are dichotomous values of one characteristic (`am`). This graphic doesn't guide the viewer to compare those values. I suggested splitting tiles into proportions to make that contrast more clear, but maybe this requires a different graphical approach altogether. – Hilary Jan 24 '17 at 20:20