Plot a re-leveled pairwise distance matrix in ggplot2

Question

Loading libraries and creating a reproducible example

#Load libraries
set.seed(123)
library(tidyr)
library(ggplot2)

#Creating a fake pairwise matrix
locs <- 5
tmp <- matrix(runif(n = locs*locs),nrow = locs,ncol = locs)
tmp[upper.tri(tmp,diag = T)] <- NA
colnames(tmp) <- LETTERS[1:locs]
rownames(tmp) <-  LETTERS[1:locs]
tmp

#Converting into a data frame
tmp1 <- as.data.frame(cbind(rownames(tmp),as.data.frame(tmp)))
names(tmp1)[1] <- "locA"
rownames(tmp1) <- NULL
head(tmp1)

#Changing it to long form and getting rid of NAs
tmp1 <- gather(tmp1, key = "locB",value = "value",-locA)
tmp1 <- tmp1[!is.na(tmp1$value),]
tmp1

#Making a tiled plot based on default levels
ggplot(tmp1, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")

However, a different ordering makes more biological sense, so I want to change the order in which the locations are displayed:

#biological order
my.order <- c("A","C","D","B","E")
my.order

#re-leveling
tmp1$locA <- factor(tmp1$locA, levels = my.order,ordered = T)
tmp1$locB <- factor(tmp1$locB, levels = my.order,ordered = T)
tmp1

#the releveled plot
ggplot(tmp1, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")

I am trying to find a way to get the "B-C" & "B-D" comparisons to be represented in the lower diagonal.

I tried to find a solution using a full matrix and lower.tri(), but have failed so far:

#here is the full matrix
x <- tmp
x[is.na(x)] <- 0
y <- t(tmp)
y[is.na(y)] <- 0
full.matrix <- x+y
full.matrix

#the function lower.tri might be useful in this context
lower.tri(full.matrix)

I believe you need to switch `locA` and `locB` for those two pairs in your underlying data. — Mike H., Feb 07 '18 at 14:36
Yes, you're right. For this simple example I could hard code the fix in there, but for my actual dataset, there are 26 loc's, and therefore the hardcode fix becomes much more cumbersome. I am looking for a generic solution that would resolve this issue in any pairwise distance matrix. — Sardimus, Feb 07 '18 at 15:35
Can you change the underlying data more upstream? In your example, when you assign `colnames(tmp)...` and `rownames(tmp)...` if you assigned the proper order(`c("A","C","D","B","E")`), it should work out after you reassign the order. — Mike H., Feb 07 '18 at 15:39
For the fake example, yes, it would work. But R functions that calculate pairwise distance matrices usually return a matrix whose rows and columns are in alphanumeric order, so changing colnames() and rownames() upstream of the creation of tmp would only resolve the issue here. tmp, in its final form in this example, is what my dataset looks like. This type of formatted data is common in community ecology and population genetics. — Sardimus, Feb 07 '18 at 15:54

score 1 · Answer 1 · answered Feb 07 '18 at 16:18

Once tmp and full.matrix have been created, run:

reordered_mat <- full.matrix[match(my.order, rownames(full.matrix)), 
                             match(my.order, colnames(full.matrix))]

lt_reordered_mat <- replace(reordered_mat, !lower.tri(reordered_mat), NA)

tmp1 <- as.data.frame(cbind(rownames(lt_reordered_mat),as.data.frame(lt_reordered_mat)))

Then rerun the rest of your tmp1 creation code and the re-leveling, and you should get your desired result.

Full reproducible code:

#Load libraries
set.seed(123)
library(tidyr)
library(ggplot2)

#Creating a fake pairwise matrix
locs <- 5
tmp <- matrix(runif(n = locs*locs),nrow = locs,ncol = locs)
tmp[upper.tri(tmp,diag = T)] <- NA
colnames(tmp) <- LETTERS[1:locs]
rownames(tmp) <-  LETTERS[1:locs]

x <- tmp
x[is.na(x)] <- 0
y <- t(tmp)
y[is.na(y)] <- 0
full.matrix <- x+y


my.order <- c("A","C","D","B","E")

reordered_mat <- full.matrix[match(my.order, rownames(full.matrix)), 
                             match(my.order, colnames(full.matrix))]
lt_reordered_mat <- replace(reordered_mat, !lower.tri(reordered_mat), NA)
tmp1 <- as.data.frame(cbind(rownames(lt_reordered_mat),as.data.frame(lt_reordered_mat)))
names(tmp1)[1] <- "locA"
rownames(tmp1) <- NULL

#Changing it to long form and getting rid of NAs
tmp1 <- gather(tmp1, key = "locB",value = "value",-locA)
tmp1 <- tmp1[!is.na(tmp1$value),]


#re-leveling
tmp1$locA <- factor(tmp1$locA, levels = my.order,ordered = T)
tmp1$locB <- factor(tmp1$locB, levels = my.order,ordered = T)

#the releveled plot
ggplot(tmp1, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")
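
Since the question asks for something generic, the steps above could be wrapped in a small function. This is only a sketch under stated assumptions: `lower_tri_long` is a hypothetical name that does not appear in the original answer, the input is assumed to be a full symmetric matrix with identical row and column names, and base R's `as.data.frame(as.table())` is swapped in for `gather()` so the example has no package dependencies.

```r
# Hypothetical helper (not part of the original answer): reorder a full
# symmetric matrix `m` by `ord`, keep the strict lower triangle, and
# return it in long form with both location columns leveled by `ord`.
lower_tri_long <- function(m, ord) {
  stopifnot(all(ord %in% rownames(m)), all(ord %in% colnames(m)))
  m <- m[ord, ord]                 # reorder rows and columns by name
  m[!lower.tri(m)] <- NA           # blank the diagonal and upper triangle
  long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
  names(long) <- c("locA", "locB", "value")
  long <- long[!is.na(long$value), ]
  long$locA <- factor(long$locA, levels = ord)
  long$locB <- factor(long$locB, levels = ord)
  rownames(long) <- NULL
  long
}

# Toy symmetric matrix, made up for illustration only
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0),
            nrow = 3, dimnames = list(LETTERS[1:3], LETTERS[1:3]))

res <- lower_tri_long(m, c("A", "C", "B"))
res
```

The returned data frame can be passed to the same ggplot() call used above. For a `dist` object, `as.matrix()` gives the full symmetric matrix that this sketch expects.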

Thanks, that's what I was looking for! — Sardimus, Feb 07 '18 at 16:30

Sardimus · Answer 2 · 2018-02-08T13:14:05.933

As Mike H. was providing his answer, I created a slightly different solution. I think his answer is better because it's more succinct and doesn't use a for loop.

#Load libraries
set.seed(123)
library(tidyr)
library(ggplot2)

#Creating a fake pairwise matrix
locs <- 5
tmp <- matrix(runif(n = locs*locs),nrow = locs,ncol = locs)
tmp[upper.tri(tmp,diag = T)] <- NA
colnames(tmp) <- LETTERS[1:locs]
rownames(tmp) <-  LETTERS[1:locs]
tmp

#Converting into a data frame
tmp1 <- as.data.frame(cbind(rownames(tmp),as.data.frame(tmp)))
names(tmp1)[1] <- "locA"
rownames(tmp1) <- NULL
head(tmp1)

#Changing it to long form and getting rid of NAs
tmp1 <- gather(tmp1, key = "locB",value = "value",-locA)
tmp1 <- tmp1[!is.na(tmp1$value),]
tmp1

#Making a tiled plot based on default levels
ggplot(tmp1, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")

#biological order
my.order <- c("A","C","D","B","E")
my.order

#re-leveling
tmp1$locA <- factor(tmp1$locA, levels = my.order,ordered = T)
tmp1$locB <- factor(tmp1$locB, levels = my.order,ordered = T)
tmp1

#the releveled plot
ggplot(tmp1, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")

#reordering tmp by my.order and replacing NAs with zero
x <- tmp
x<- x[my.order,my.order]
x[is.na(x)] <- 0
x

#identifying which values switch from the lower matrix to the upper matrix
y <- x
y[y !=0] <- 1

#figuring out which side of the matrix needs to be switched (i.e. where locA and locB get swapped)
#if/else is used so that only one triangle is zeroed; with three separate if() statements,
#zeroing the lower triangle would make the last test true and zero the upper triangle as well
if(sum(y[lower.tri(y)]) >= sum(y[upper.tri(y)])){
  y[lower.tri(y)] <- 0
} else {
  y[upper.tri(y)] <- 0
}

#Converting y into a long form data frame
fm <- as.data.frame(cbind(rownames(y),as.data.frame(y)))
names(fm)[1] <- "locA"
rownames(fm) <- NULL
fm <- gather(fm, key = "locB",value = "value",-locA)

#identifying which pairs need to be switched and creating an identifier to merge on
fm$action <- ifelse(fm$value == 1,"switch","keep")
fm$both <- paste0(fm$locA,fm$locB)
fm

#creating the same identifier in tmp1
tmp1$both <- paste0(tmp1$locA,tmp1$locB)
head(tmp1)

#merging the fm and tmp1 together
tmp2 <- merge(x = fm[,4:5],y = tmp1,by = "both")
tmp2

#using a for loop to make the necessary switches
i <- NULL
for(i in 1:nrow(tmp2)){
  if(tmp2$action[i] == "switch"){
    A <- as.character(tmp2$locA[i])
    B <- as.character(tmp2$locB[i])
    tmp2$locA[i] <- B
    tmp2$locB[i] <- A
  }
}
tmp2

#re-leveling to my order
tmp2$locA <- factor(tmp2$locA, levels = my.order,ordered = T)
tmp2$locB <- factor(tmp2$locB, levels = my.order,ordered = T)
tmp2

#now the graphic
ggplot(tmp2, aes(x = locA, y=locB, fill=value, label=round(value,3)))+
  geom_tile(color="black")+
  geom_text(size=5,color="white")
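
The for loop above could also be replaced by a vectorized swap. The sketch below is a different technique from the one in this answer: instead of deriving the "switch" flags from the reordered matrix, it compares each location's position in the desired order directly. The data frame `pairs` and the vector `ord` are made-up illustrations, and it assumes the goal is for locA to come later in the order than locB, as in the lower-triangle plots above.

```r
# Desired (biological) order and a toy long-form table; values are made up
ord <- c("A", "C", "D", "B", "E")
pairs <- data.frame(locA  = c("B", "C", "D"),
                    locB  = c("A", "B", "B"),
                    value = c(0.1, 0.2, 0.3),
                    stringsAsFactors = FALSE)

# A pair needs swapping when locA comes earlier in `ord` than locB
swap <- match(pairs$locA, ord) < match(pairs$locB, ord)

# Swap both columns at once, using a copy of the original locA
old_A <- pairs$locA
pairs$locA[swap] <- pairs$locB[swap]
pairs$locB[swap] <- old_A[swap]

# Re-level afterwards, as in the code above
pairs$locA <- factor(pairs$locA, levels = ord)
pairs$locB <- factor(pairs$locB, levels = ord)
pairs
```

Here the C-B and D-B rows become B-C and B-D, while the B-A row is left alone. The swap is done on character columns before converting to factors, which avoids assigning into factor levels inside a loop.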
