I'm trying to make a clustered heatmap, as described in "Cluster data in heat map in R ggplot", and am running into a perplexing bug.

I can make an un-clustered distance heatmap as follows:

library(vegan)
library(tidyverse)
data(varespec)
library(reshape2)
library(viridis)

# Calculate a distance matrix
vare.dist <- vegdist(varespec)

# Cluster the distance matrix.
vare.hc <- hclust(as.dist(vare.dist))

# Process and melt the distance matrix
vare.dist.long <- vare.dist %>% as.matrix %>% melt %>%
mutate(Var1 = as.character(Var1), Var2 = as.character(Var2))

# Plot the heatmap
vare.dist.long %>% # as.matrix %>% .[vare.hc$order, vare.hc$order] %>% melt %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_viridis(direction = 1) +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))

[Image: unclustered heatmap]

To cluster the heatmap, I need to convert vare.dist.long$Var1 and vare.dist.long$Var2 into properly ordered factors. I would have thought I could do that as follows:

# Step 1: works without complaint
vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = Var1[vare.hc$order]))
# Step 2: throws error
vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))

I would then use the result with both columns re-leveled (vare.dist.long3) in place of vare.dist.long in the plotting code.

Strangely, while ordering Var1 (as in the #Step 1 line) seems to work without complaint, when I try to do exactly the same thing to Var2 (as in the #Step 2 line) I get the following error:

Error in mutate_impl(.data, dots): Evaluation error: factor level [2] is duplicated.
Traceback:

1. vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. mutate(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
10. mutate.data.frame(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
11. as.data.frame(mutate(tbl_df(.data), ...))
12. mutate(tbl_df(.data), ...)
13. mutate.tbl_df(tbl_df(.data), ...)
14. mutate_impl(.data, dots)

What am I missing here? Why can't I mutate Var2, which as far as I can tell is pretty much the same as Var1 but in a different order?

ohnoplus
    Does `Var2[vare.hc$order]` have any repeated values? That would cause the error you're getting. If so, `unique(Var2[vare.hc$order])` should resolve it. – eipi10 Jan 30 '18 at 00:43

1 Answer

The vector passed to the levels argument must not contain duplicates. If you type the following into your console, you will see that every level you supplied for Var2 is the same value.

vare.dist.long$Var2[vare.hc$order]
# [1] "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18" "18"
# [19] "18" "18" "18" "18" "18" "18"
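A minimal sketch of the same effect on a toy 4 x 4 distance matrix, using base R only. The data and the names `pts`, `d`, `hc`, and `long` are made up for illustration, and `as.data.frame(as.table(...))` is assumed here as a stand-in for `melt()` because it produces the same row layout.

```r
# Toy data standing in for varespec (hypothetical values)
set.seed(1)
pts <- matrix(rnorm(8), nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))
d <- dist(pts)
hc <- hclust(d)

# Base-R stand-in for melt(): Var1 cycles fastest, Var2 is constant in blocks
long <- as.data.frame(as.table(as.matrix(d)), stringsAsFactors = FALSE)
names(long)[3] <- "value"

n <- length(hc$order)
long$Var1[1:n]  # the n distinct row labels
long$Var2[1:n]  # the first column label, repeated n times

# hc$order is a permutation of 1:n, so it only ever indexes the first n rows
var1_levels <- long$Var1[hc$order]
var2_levels <- long$Var2[hc$order]

anyDuplicated(var1_levels) == 0   # all distinct: usable as factor levels
length(unique(var2_levels)) == 1  # one label repeated: factor() rejects it
```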

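This happens because, in the melted data, Var1 cycles through all 24 site labels within the first 24 rows, while Var2 stays fixed at the first column's label for those rows. vare.hc$order is a permutation of 1:24, so it only ever indexes those first 24 rows. I think the following will work: unique(Var1) and unique(Var2) reduce each column to its 24 distinct labels before indexing, so the levels contain no duplicates.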

vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = unique(Var1)[vare.hc$order]))

vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = unique(Var2)[vare.hc$order]))
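For plotting, both columns need the clustered levels in the same data frame. Below is a minimal sketch of that on a toy distance matrix, in base R only; the data and the names `pts`, `long`, `ordered_labels`, and `clustered` are assumptions for illustration, not part of the original code. It also checks that `hc$labels[hc$order]` gives the same ordering as `unique(Var1)[hc$order]`.

```r
# Toy data standing in for varespec (hypothetical values)
set.seed(1)
pts <- matrix(rnorm(8), nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))
d <- dist(pts)
hc <- hclust(d)

# Base-R stand-in for melt(): columns Var1, Var2, value
long <- as.data.frame(as.table(as.matrix(d)), stringsAsFactors = FALSE)
names(long)[3] <- "value"

# Labels arranged in dendrogram order
ordered_labels <- unique(long$Var1)[hc$order]

# hclust keeps the labels of the distance object, so this is the same vector
same_as_hc <- identical(ordered_labels, hc$labels[hc$order])

# Re-level both axes in one step so a single data frame feeds the plot
clustered <- transform(
  long,
  Var1 = factor(Var1, levels = ordered_labels),
  Var2 = factor(Var2, levels = ordered_labels)
)
```

Because ggplot2 lays out discrete axes in factor-level order, passing a data frame like `clustered` to the same `ggplot()` call should draw the tiles in dendrogram order.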
www
    This absolutely works. I realize now that `unique(Var1)`, `unique(Var2)`, and `vare.hc$labels` are all the same thing. I think the last of the three may be the most intuitive (to me), since we are essentially saying that `vare.hc` tells you the order (`vare.hc$order`) in which its labels (`vare.hc$labels`) should be arranged. – ohnoplus Jan 30 '18 at 01:09