I'm trying to make a clustered heatmap, as described in "Cluster data in heat map in R ggplot", and am running into a perplexing bug.
I can make an un-clustered distance heatmap as follows:
library(vegan)
library(tidyverse)
data(varespec)
library(reshape2)
library(viridis)
# Calculate a distance matrix
vare.dist <- vegdist(varespec)
# Cluster the distance matrix.
vare.hc <- hclust(as.dist(vare.dist))
# Process and melt the distance matrix
vare.dist.long <- vare.dist %>% as.matrix %>% melt %>%
  mutate(Var1 = as.character(Var1), Var2 = as.character(Var2))
# Plot the heatmap
vare.dist.long %>% # as.matrix %>% .[vare.hc$order, vare.hc$order] %>% melt %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_viridis(direction = 1) +
  theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5))
To cluster the heatmap, I need to convert vare.dist.long$Var1 and vare.dist.long$Var2 into properly ordered factors. I would think that I could do that as follows:
# Step 1: works without complaint
vare.dist.long1 <- vare.dist.long %>% mutate(Var1 = factor(Var1, levels = Var1[vare.hc$order]))
# Step 2: throws error
vare.dist.long2 <- vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
I would then replace vare.dist.long with the result (vare.dist.long3, once both columns are converted) in the plotting code.
Strangely, while ordering Var1 (as in the # Step 1 line) seems to work without complaint, when I try to do exactly the same thing to Var2 (as in the # Step 2 line) I get the following error:
Error in mutate_impl(.data, dots): Evaluation error: factor level [2] is duplicated.
Traceback:
1. vare.dist.long %>% mutate(Var2 = factor(Var2, levels = Var2[vare.hc$order]))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. mutate(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
10. mutate.data.frame(., Var2 = factor(Var2, levels = Var2[vare.hc$order]))
11. as.data.frame(mutate(tbl_df(.data), ...))
12. mutate(tbl_df(.data), ...)
13. mutate.tbl_df(tbl_df(.data), ...)
14. mutate_impl(.data, dots)
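The same pair of calls can be tried on a toy example without vegan or dplyr. This is only a sketch under assumptions: the 3x3 matrix m and the ordering ord are made up here as stand-ins for vare.dist and vare.hc$order, and as.data.frame(as.table(m)) is used as a base-R substitute for melt (it also produces Var1 and Var2 columns, with Var1 cycling fastest). Printing the two candidate levels vectors shows what each call actually hands to factor().

```r
# Hypothetical stand-ins: a small distance-like matrix and a cluster ordering
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
ord <- c(3, 1, 2)

# Long format in base R; Var1 cycles through the row names,
# Var2 holds each column name for a whole block of rows
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)

# What the Step 1 and Step 2 calls would pass as `levels`
lev1 <- long$Var1[ord]
lev2 <- long$Var2[ord]
print(lev1)
print(lev2)
print(anyDuplicated(lev1) > 0)
print(anyDuplicated(lev2) > 0)

# Same call shape as Step 2, wrapped in tryCatch so the script keeps running;
# `res` holds the error message if factor() fails
res <- tryCatch(factor(long$Var2, levels = lev2),
                error = function(e) conditionMessage(e))
print(res)
```

Running the equivalent two print() calls on vare.dist.long$Var1[vare.hc$order] and vare.dist.long$Var2[vare.hc$order] should show whether the real data behaves the same way as this toy case.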
What am I missing here? Why can't I mutate Var2, which as far as I can tell is pretty much the same as Var1 but in a different order?