
I have a matrix of pairwise comparisons in which the upper triangle and the diagonal were set to NA.

df <- data.frame(a=c(NA,1,2), b=c(NA,NA,3), c=c(NA,NA,NA))
row.names(df) <- names(df)

I want to transform the matrix to long format, for which the standard procedure is to use reshape2's melt, followed by na.omit, so my desired output would be:

Var1 Var2 Value
a     b   1
a     c   2
b     c   3

However, df$c is all NA and therefore logical, so melt treats it as an id (non-measured) variable. The output of melt(df) is therefore not what I am looking for.

library(reshape2)
melt(df)
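
A quick way to confirm the cause described above is to inspect the column classes. This is only a minimal base R sketch on the example data; it shows that the all-NA column is logical and does not call melt itself.

```r
# Rebuild the example data from the question
df <- data.frame(a = c(NA, 1, 2), b = c(NA, NA, 3), c = c(NA, NA, NA))
row.names(df) <- names(df)

# A column that contains only NA defaults to type logical
classes <- sapply(df, class)
classes
#         a         b         c 
# "numeric" "numeric" "logical"
```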

How can I prevent melt from using df$c as an id variable?

NelsonGon
nouse

3 Answers


The trick is to convert the row names to a column and then convert to long format. One way to do it in the tidyverse would be:

library(tidyverse)

df %>% 
  rownames_to_column() %>% 
  gather(var, val, -1) %>% 
  filter(!is.na(val))


#  rowname var val
#1       b   a   1
#2       c   a   2
#3       c   b   3

As @Humpelstielzche mentions in comments, there is a na.rm argument in gather so we can omit the last filtering, i.e.

df %>% 
  rownames_to_column() %>% 
  gather(var, val, -1, na.rm = TRUE)
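
In current tidyr releases, gather is superseded by pivot_longer, so the same idea can be sketched with that function. This assumes tibble and tidyr are installed; the intermediate names with_names and long are only illustrative, and the output column names var and val simply mirror the code above.

```r
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_longer()

df <- data.frame(a = c(NA, 1, 2), b = c(NA, NA, 3), c = c(NA, NA, NA))
row.names(df) <- names(df)

# Move the row names into a regular column called "rowname"
with_names <- rownames_to_column(df)

# Pivot every other column to long format and drop the NA cells
long <- pivot_longer(with_names, cols = -rowname,
                     names_to = "var", values_to = "val",
                     values_drop_na = TRUE)
long
```

Here values_drop_na = TRUE plays the role that na.rm = TRUE plays in gather.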
Sotos

In base R, we can use row and col to get the row and column index of every cell, use those indices to look up the row names and column names, and then filter out the NA values.

df1 <- data.frame(col = colnames(df)[col(df)], row = rownames(df)[row(df)], 
                  value = unlist(df), row.names = NULL)

df1[!is.na(df1$value), ]

#  col row value
#2   a   b     1
#3   a   c     2
#6   b   c     3
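
A variation on the same base R idea is to ask for the positions of the non-NA cells directly with which(..., arr.ind = TRUE), so that no separate filtering step is needed. This is a sketch under the assumption that converting to a matrix first is acceptable; the names m, idx and long are illustrative only.

```r
df <- data.frame(a = c(NA, 1, 2), b = c(NA, NA, 3), c = c(NA, NA, NA))
row.names(df) <- names(df)

# as.matrix() gives a numeric matrix (the logical NA column is coerced)
m <- as.matrix(df)

# Two-column matrix of (row, col) positions of the non-NA cells
idx <- which(!is.na(m), arr.ind = TRUE)

long <- data.frame(row   = rownames(m)[idx[, "row"]],
                   col   = colnames(m)[idx[, "col"]],
                   value = m[idx],   # index the matrix by (row, col) pairs
                   row.names = NULL, stringsAsFactors = FALSE)
long
```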
Ronak Shah

While you have other answers already, this can be achieved with reshape2 and melt, if the appropriate function is called. In this case you don't want reshape2:::melt.data.frame but rather reshape2:::melt.matrix to be applied. So, try:

melt(as.matrix(df), na.rm=TRUE)
#  Var1 Var2 value
#2    b    a     1
#3    c    a     2
#6    c    b     3

If you then take a look at ?reshape2:::melt.matrix you will see the statement:

This code is conceptually similar to ‘as.data.frame.table’

which means you could also use the somewhat more convoluted:

na.omit(as.data.frame.table(as.matrix(df), responseName="value"))
#  Var1 Var2 value
#2    b    a     1
#3    c    a     2
#6    c    b     3
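
One reason the matrix route avoids the problem from the question is that a matrix holds a single type, so the all-NA logical column is coerced along with the numeric ones. The following base R sketch only illustrates that coercion on the example data; it does not call reshape2.

```r
df <- data.frame(a = c(NA, 1, 2), b = c(NA, NA, 3), c = c(NA, NA, NA))
row.names(df) <- names(df)

is.logical(df$c)   # TRUE: the column is logical inside the data frame

m <- as.matrix(df)
matrix_type <- typeof(m)
matrix_type        # "double": the whole matrix is numeric, including column c
dimnames(m)        # row and column names survive and become Var1 / Var2
```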
thelatemail