
I have a matrix with x rows (i.e. the number of draws) and y columns (the number of observations). They represent a distribution of y forecasts.

Now I would like to make a sort of 'heat map' of the draws. That is, I want to plot a 'confidence interval' (not really a confidence interval, just all the values with shading in between), but shaded like a heat map. For instance, if a lot of draws for observation y=y* were around 1 but there was also a draw of 5 for that same observation, then the area of the interval around 1 should be darker (while the whole area between 1 and 5 is still shaded).

To be totally clear: I like for instance the plot in the answer here, but then I would want the grey confidence interval to instead be colored as intensities (i.e. some areas are darker).

Could someone please tell me how I could achieve that?

Thanks in advance.

Edit: As per request: example data. Example of the first 20 values of the first column (i.e. y[1:20,1]):

 [1]  0.032067416 -0.064797792  0.035022338  0.016347263  0.034373065  0.024793101 -0.002514447  0.091411355 -0.064263536 -0.026808208
[11]  0.125831185 -0.039428744  0.017156454 -0.061574540 -0.074207109 -0.029171227  0.018906181  0.092816957  0.028899699 -0.004535961
dreamer
  • Post some example data and maybe one of us will take a crack at it. – Mike Wise May 15 '15 at 18:13
  • @MikeWise I posted some example data in the OP now. Thanks :)! – dreamer May 15 '15 at 18:20
  • I'm having trouble understanding your dimensions... you give a 3d example (heatmap with x, y, and color dimension as z) and a 2d example (x and y, where the y happens to have a confidence interval). If you want to plot every value of a 2-d matrix, the heat map will work. If you want to summarize every column of a 2-d matrix into (say) a mean or median with confidence interval, then a heat map is inappropriate but the second plot you link to is easy. – Gregor Thomas May 15 '15 at 18:30
  • And I would recommend sharing at least two columns of data... would one column generate a single square in a heat map? or a single column of squares in a heat map? or a single point with confidence interval? – Gregor Thomas May 15 '15 at 18:31
  • @Gregor All columns are similar, which is why I posted one. To clarify: all columns actually have 10000 values. What I want is shading between the minimum and the maximum of those 10000 values, but since there are so many values, some areas of this shading should reflect the fact that more of the 10000 values are around that area. Hopefully that makes more sense to you. – dreamer May 15 '15 at 18:33

2 Answers


So, the hard part of this is transforming your data into the right shape, which is why it's nice to share something that really looks like your data, not just a single column.

Let's say your data is a matrix with 10,000 rows and 10 columns. I'll just use a uniform distribution, so it will be a boring plot at the end.

n = 10000
k = 10
mat = matrix(runif(n * k), nrow = n)

Next, we'll calculate quantiles for each column using apply, transpose, and make it a data frame:

dat = as.data.frame(t(apply(mat, MARGIN = 2, FUN = quantile, probs = seq(.1, 0.9, 0.1))))
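To make the shapes concrete, here is a small check of what this step produces. It is only a sketch: the seed and the intermediate name `q` are my own additions, not part of the answer's code.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
n <- 10000
k <- 10
mat <- matrix(runif(n * k), nrow = n)

# apply() over columns returns one column of quantiles per original column
q <- apply(mat, MARGIN = 2, FUN = quantile, probs = seq(0.1, 0.9, 0.1))
dim(q)      # quantiles in rows, original columns in columns

# Transposing gives one row per original column and one column per quantile
dat <- as.data.frame(t(q))
dim(dat)
names(dat)  # quantile() labels its results with percent signs
```

The percent-sign names are why the later reshaping step strips "%" before converting to numbers.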

Add an x variable (since we transposed, each x value corresponds to a column in the original data)

dat$x = 1:nrow(dat)

We now need to get it into a "long" form, grouped by the min and max values for a certain deviation group around the median, and of course get rid of the pesky percent signs introduced by quantile:

library(dplyr)
library(tidyr)
dat_long = gather(dat, "quantile", value = "y", -x) %>%
    mutate(quantile = as.numeric(gsub("%", "", quantile)),
           group = abs(50 - quantile))

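# Lower half of each interval: quantiles below the median become ymin
dat_ribbon = dat_long %>% filter(quantile < 50) %>%
    mutate(ymin = y) %>%
    select(x, ymin, group) %>%
    left_join(
        # Upper half: the mirrored quantile above the median becomes ymax
        dat_long %>% filter(quantile > 50) %>%
        mutate(ymax = y) %>%
        select(x, ymax, group),
        by = c("x", "group")  # explicit join keys (the columns the two halves share)
    )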

dat_median = filter(dat_long, quantile == 50)

And finally we can plot. We'll plot a transparent ribbon for each "group", that is, the 10%-90% interval, the 20%-80% interval, ... the 40%-60% interval, and then a single line at the median (50%). Because the ribbons are transparent, the middle will be darker, as it has more ribbons overlapping on top of it. This doesn't go from the minimum to the maximum, but it will if you set the probs in the quantile call to go from 0 to 1 instead of .1 to .9.

library(ggplot2)
ggplot(dat_ribbon, aes(x = x)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.2) +
    geom_line(aes(y = y), data = dat_median, color = "white")
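For the minimum-to-maximum variant, a self-contained sketch could look like the following. It builds the ribbon data in base R rather than with dplyr/tidyr (which also sidesteps function masking); the normal stand-in data, the seed, and the names `pct`, `ribbons` and `median_line` are assumptions for illustration.

```r
library(ggplot2)

set.seed(42)                           # assumed seed
n <- 10000
k <- 10
mat <- matrix(rnorm(n * k), nrow = n)  # stand-in data, not the asker's

pct <- seq(0, 100, 10)                 # 0% is the minimum, 100% the maximum
q <- apply(mat, 2, quantile, probs = pct / 100)  # rows: quantiles, cols: observations

lower <- which(pct < 50)               # rows for 0%, 10%, ..., 40%
upper <- rev(which(pct > 50))          # rows for 100%, 90%, ..., 60%

# One ribbon per mirrored pair: 0-100, 10-90, ..., 40-60
ribbons <- do.call(rbind, lapply(seq_along(lower), function(i) {
  data.frame(x = seq_len(k),
             ymin = q[lower[i], ],
             ymax = q[upper[i], ],
             group = i)
}))
median_line <- data.frame(x = seq_len(k), y = q[which(pct == 50), ])

p <- ggplot(ribbons, aes(x = x)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.15) +
  geom_line(aes(y = y), data = median_line, colour = "white")
p
```

The outermost ribbon now spans every draw, so a single extreme value still ends up inside the (lightest) shaded area.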


Worth noting that this is not a conventional heatmap. A heatmap usually implies that you have 3 variables, x, y, and z (color), where there is a z-value for every x-y pair. Here you have two variables, x and y, with y depending on x.
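If you do want a literal heatmap, one way to get a z-value for every x-y pair is to bin the draws in each column and use the count per bin as the colour. This is a different technique from the ribbons above, sketched under my own assumptions: stand-in normal data, roughly 40 bins, and a grey-to-black scale.

```r
library(ggplot2)

set.seed(1)                            # assumed seed
n <- 10000
k <- 10
mat <- matrix(rnorm(n * k), nrow = n)  # stand-in data

# Shared bin edges covering the full range of all draws (about 40 bins, arbitrary)
breaks <- pretty(range(mat), n = 40)
bin_width <- diff(breaks)[1]

# Count the draws per bin, separately for each column
heat <- do.call(rbind, lapply(seq_len(k), function(j) {
  h <- hist(mat[, j], breaks = breaks, plot = FALSE)
  data.frame(x = j, y = h$mids, count = h$counts)
}))
heat <- heat[heat$count > 0, ]         # leave bins with no draws unshaded

p <- ggplot(heat, aes(x = x, y = y, fill = count)) +
  geom_tile(height = bin_width) +
  scale_fill_gradient(low = "grey90", high = "black")
p
```

Here x is the column index, y is the bin midpoint, and the fill plays the role of z, so areas with many draws come out darker while any bin with at least one draw is still shaded.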

Gregor Thomas
  • When I run `dat_long = gather(dat, "quantile", value = "y", -x) %>% mutate(quantile = as.numeric(gsub("%", "", quantile)), group = abs(50 - quantile))` I get the error `Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'`. Do you know what could cause this? – dreamer May 15 '15 at 19:43
  • @dreamer Oops, I left in a line from an earlier attempt where I changed the column names. Delete the `names(dat) = ...` line and everything should work. (Edits already made in answer code.) – Gregor Thomas May 15 '15 at 19:53
  • Thank you, however, I am sorry to say, but I still get the same message (when I run the `dat_long...` statement). Do you have any other suggestions for what could be the issue? – dreamer May 15 '15 at 19:57
  • The deleted line modified `dat`, so you'll have to start from the beginning (or from where `dat` is first defined). If you still have problems after that, you might have some function masking going on. I just tested and my code runs fine in a fresh R session. – Gregor Thomas May 15 '15 at 21:52
  • Function masking caused the problem indeed. It works now :)! Thanks so much, you deserve more upvotes for this answer! As a final question: is it possible to experiment with different colors than shades of black easily (I see an argument for the mean line, but not for the rest)? If not, also fine, the plot looks really nice now! I really appreciate that you helped me! – dreamer May 16 '15 at 08:07
  • Sure, it's a "fill" color, set inside `geom_ribbon`. You could make it, e.g., `geom_ribbon(..., alpha = 0.2, fill = "dodgerblue4")`. Thanks for posting the end-product! – Gregor Thomas May 16 '15 at 18:29
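A minimal sketch of the fill suggestion from the last comment, using a tiny hand-made stand-in for `dat_ribbon` (the values are hypothetical; only the column names match the answer):

```r
library(ggplot2)

# Hypothetical ribbon data: two nested intervals over five x positions
dat_ribbon <- data.frame(
  x     = rep(1:5, times = 2),
  ymin  = c(rep(0.1, 5), rep(0.3, 5)),
  ymax  = c(rep(0.9, 5), rep(0.7, 5)),
  group = rep(c(40, 20), each = 5)
)

# fill is set outside aes() because it is a constant, not mapped to data
p <- ggplot(dat_ribbon, aes(x = x)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group),
              alpha = 0.2, fill = "dodgerblue4")
p
```

Any name listed by `colors()` or a hex string such as "#104E8B" should work as the fill.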

That is not a lot to go on, but I would probably start with the hexbin package (its hexbin or hexbinplot functions). Several alternatives are presented in this SO post.

Formatting and manipulating a plot from the R package "hexbin"
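As a rough starting point with hexbin, one could treat the column index as x and every draw as y. This is only a sketch on stand-in data; the seed and `xbins = 20` are arbitrary assumptions, and the hexbin package must be installed.

```r
library(hexbin)

set.seed(1)                            # assumed seed
n <- 10000
k <- 10
mat <- matrix(rnorm(n * k), nrow = n)  # stand-in data

# R stores matrices column by column, so repeating each column index n times
# lines up with as.vector(mat)
x <- rep(seq_len(k), each = n)
y <- as.vector(mat)

hb <- hexbin(x, y, xbins = 20)         # xbins chosen arbitrarily
plot(hb)                               # darker cells hold more draws
```

Because x only takes k distinct values, the hexagons will cluster in vertical bands; it gives density shading but not the smooth interval look of the ribbon approach.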

Mike Wise