Other approaches to handle extreme values / distribution when using scale_fill_gradient?

Question

For example, I would like to map the colour to z, with 0 mapped to "white".

> a <- data.frame(x=1:10, y=1, z=c(rnorm(8),-12,12))
> a
    x y           z
1   1 1  -0.4603911
2   2 1  -0.4868471
3   3 1   0.2180346
4   4 1  -0.8807652
5   5 1   1.7379462
6   6 1  -0.1334904
7   7 1  -0.3675578
8   8 1   0.9225425
9   9 1 -12.0000000
10 10 1  12.0000000

ggplot(a,aes(x=x,y=y,fill=z)) + geom_bar(stat="identity") + 
  scale_fill_gradient2(high="green", mid="white", low="red")

As you can see, the colour is not a very useful indicator: instead of conveying a general idea of how the values are distributed, it only shows which values are extreme, leaving the majority of values indistinguishable to the untrained eye.

There is a method described in "Non-linear color distribution over the range of values in a geom_raster", but it seems a bit complicated and I only vaguely understand how it works.


I then thought that rank order might be a good way to rescale, hence:

ggplot(a,aes(x=x,y=y,fill=ecdf(z)(z))) + geom_bar(stat="identity") +
scale_fill_gradient2(high="green", mid="white", low="red", midpoint=ecdf(a$z)(0))

It worked to some extent (here I used ecdf instead of order so that I could find what value 0 is rescaled to). However, the drawback is that I would like the legend labels to show the unscaled values rather than the rescaled ones, so something like labels=function(x) quantile(a$z, x), which I cannot get to work. Also, I find it stupid to repeatedly use ecdf and quantile to rescale forward and backward.

Is there a better or simpler approach for these cases, e.g. one robust enough (it need not be optimal or very accurate) to fill reasonable colours for all kinds of distributions of the mapped values?


score 3 · Answer 1 · edited Sep 10 '19 at 03:26

There is not an easy way that I know of, but you can have full control of the mapping with scale_fill_gradientn. The key is to map colors to values in the 0-1 range where 0 is your min value, and 1 is your max value. Here is an option:

library(ggplot2)
a <- data.frame(x=1:10, y=1, z=c(rnorm(8),-12,12))
get_col <- colorRamp(c("red", "white", "green"))       # function that interpolates colors on 0-1
quantiles <- (0:6) / 6                                 # probabilities of the quantiles to map
quantile.vals <- quantile(a$z, quantiles, names=FALSE) # data value at each quantile
colours <- rgb(get_col(quantiles), maxColorValue=255)  # 7 evenly interpolated colors
val.remap <- (quantile.vals - min(a$z)) /
  diff(range(a$z))                                     # quantile values rescaled to the 0-1 range

ggplot(a, aes(x=x,y=y,fill=z)) + 
  geom_bar(stat="identity") +
  scale_fill_gradientn(
    colours=colours,
    values=val.remap,
    breaks=quantile.vals,# Necessary to get legend values spread appropriately
    guide="legend")      # Necessary to get legend values spread appropriately

Here we chose to assign evenly interpolated colors to values based on the distribution of values. So, if a value range corresponds to a large part of the distribution, even though it actually spans a relatively small portion of the min-max range, it will get more of the color range allocated to it.
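To see the effect with concrete numbers, here is a small base-R sketch. The vector z is made up for illustration (it is not the data from the question) and was chosen so that the seven quantiles fall exactly on the seven data points.

```r
# Made-up data: five values near zero plus two extremes.
z <- c(-12, -1, -0.5, 0, 0.5, 1, 12)

quantiles <- (0:6) / 6                               # 7 evenly spaced probabilities
quantile.vals <- quantile(z, quantiles, names=FALSE) # here: the sorted data itself
val.remap <- (quantile.vals - min(z)) / diff(range(z))

# Position of each color stop on the 0-1 scale that scale_fill_gradientn expects.
print(round(val.remap, 3))
# [1] 0.000 0.458 0.479 0.500 0.521 0.542 1.000
```

Five of the seven color stops land between roughly 0.46 and 0.54, so the narrow band around zero receives most of the color variation, while the two extremes only pin down the ends of the scale.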

If you want to assign a specific color to zero you can do so by editing the vectors corresponding to the colours, values, and breaks arguments. This ranges from trivial if you have the same number of values above and below zero, to annoying if not.

Version w/ 0 set to white:

library(ggplot2)
a <- data.frame(x=1:10, y=1, z=c(rnorm(8), -12, 12))
splits <- 7     # total number of colors; should be an odd number
mid.point <- 0
half.probs <- 0:((splits - 1) / 2) / ((splits - 1) / 2)  # quantile probabilities for each half
pos.vals <- a$z[a$z > mid.point]
neg.vals <- a$z[a$z < mid.point]
pos.quants <- quantile(c(mid.point, pos.vals), half.probs, names=FALSE)
neg.quants <- quantile(c(mid.point, neg.vals), half.probs, names=FALSE)
quants <- c(neg.quants, pos.quants[-1])  # drop the mid-point from pos.quants, otherwise it is counted twice

get_col <- colorRamp(c("red", "white", "green"))  # function that interpolates colors on 0-1
colours <- rgb(get_col(0:(splits - 1)/(splits - 1)), maxColorValue=255)  # `splits` evenly interpolated colors
val.remap <- (quants - min(quants)) /
  diff(range(quants))                             # quantile values rescaled to the 0-1 range

ggplot(a, aes(x=x,y=y,fill=z)) + 
  geom_bar(stat="identity") +
  scale_fill_gradientn(
    colours=colours,
    values=val.remap,
    breaks=quants,
    guide="legend")
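If this is needed for more than one plot, the same steps can be wrapped in a function. The following is only a sketch: quantile_stops and its arguments are hypothetical names, not part of ggplot2, and it assumes the data has values on both sides of mid.point and that the resulting quantiles are distinct.

```r
# Hypothetical helper (illustrative name, not a ggplot2 function).
# Returns the breaks, rescaled values and colours for scale_fill_gradientn,
# with the middle colour pinned to mid.point.
quantile_stops <- function(z, splits = 7, mid.point = 0,
                           cols = c("red", "white", "green")) {
  # Assumption: odd number of splits and data on both sides of mid.point.
  stopifnot(splits %% 2 == 1, any(z < mid.point), any(z > mid.point))
  half <- (splits - 1) / 2
  probs <- (0:half) / half
  neg.quants <- quantile(c(mid.point, z[z < mid.point]), probs, names = FALSE)
  pos.quants <- quantile(c(mid.point, z[z > mid.point]), probs, names = FALSE)
  quants <- c(neg.quants, pos.quants[-1])  # mid.point appears once, in the middle
  get_col <- colorRamp(cols)
  list(
    breaks  = quants,
    values  = (quants - min(quants)) / diff(range(quants)),
    colours = rgb(get_col((0:(splits - 1)) / (splits - 1)), maxColorValue = 255)
  )
}

set.seed(1)  # only to make the example reproducible
z <- c(rnorm(8), -12, 12)
stops <- quantile_stops(z)

# Plot only if ggplot2 is installed; print(p) draws the figure.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  a <- data.frame(x = 1:10, y = 1, z = z)
  p <- ggplot(a, aes(x = x, y = y, fill = z)) +
    geom_bar(stat = "identity") +
    scale_fill_gradientn(
      colours = stops$colours,
      values  = stops$values,
      breaks  = stops$breaks,
      guide   = "legend")
}
```

The stopifnot check makes the helper fail early on one-sided data, where the quantiles on the empty side would all collapse onto mid.point.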
