I have a matrix, like the below one

       ID_1 ID_2 ID_3 ID_4 ID_8 ID_5 ID_7 ID_100
ID_1      0    1    1    1  Inf    2    2    Inf
ID_2      1    0    2    1  Inf    1    2    Inf
ID_3      1    2    0    2  Inf    3    1    Inf
ID_4      1    1    2    0  Inf    2    1    Inf
ID_8    Inf  Inf  Inf  Inf    0  Inf  Inf      1
ID_5      2    1    3    2  Inf    0    3    Inf
ID_7      2    2    1    1  Inf    3    0    Inf
ID_100  Inf  Inf  Inf  Inf    1  Inf  Inf      0

I used as.data.frame.table and filter because I only want the entries with a value of 3. The output looks like this:

  nodeA nodeB score
1  ID_5  ID_3     3
2  ID_3  ID_5     3
3  ID_7  ID_5     3
4  ID_5  ID_7     3

The code I wrote:

library(dplyr)

# Reshape the matrix to long format (one row per cell) and keep score 3
mat_pi_lon <- as.data.frame.table(mat, responseName = "score") %>%
  filter(score == 3) %>%
  rename(nodeA = Var1, nodeB = Var2)

However, my expected output is the one below, because ID_3 ID_5 3 and ID_5 ID_3 3 are the same pair (conceptually). So I want only ID_3 ID_5 3, not ID_5 ID_3 3.

  nodeA nodeB score
1  ID_3  ID_5     3
2  ID_5  ID_7     3

Is it possible to remove these duplicate pairs from the output?

Reproducible data:

mat <- structure(c(0, 1, 1, 1, Inf, 2, 2, Inf, 1, 0, 2, 1, Inf, 1, 2, 
Inf, 1, 2, 0, 2, Inf, 3, 1, Inf, 1, 1, 2, 0, Inf, 2, 1, Inf, 
Inf, Inf, Inf, Inf, 0, Inf, Inf, 1, 2, 1, 3, 2, Inf, 0, 3, Inf, 
2, 2, 1, 1, Inf, 3, 0, Inf, Inf, Inf, Inf, Inf, 1, Inf, Inf, 
0), .Dim = c(8L, 8L), .Dimnames = list(c("ID_1", "ID_2", "ID_3", 
"ID_4", "ID_8", "ID_5", "ID_7", "ID_100"), c("ID_1", "ID_2", 
"ID_3", "ID_4", "ID_8", "ID_5", "ID_7", "ID_100")))
jay.sf
0Knowledge

3 Answers

You can replace the upper triangle (upper.tri) with NA first and thereby save a few steps. Each pair is then kept once, in its lower-triangle orientation.

library(dplyr)
mat %>%
  replace(., upper.tri(.), NA) %>%               # blank out the upper triangle
  as.data.frame.table(responseName="score") %>%
  filter(score %in% 3)                            # NA %in% 3 is FALSE, so blanked cells drop out
#   Var1 Var2 score
# 1 ID_5 ID_3     3
# 2 ID_7 ID_5     3

Using base R:

mat[upper.tri(mat)] <- NA   # note: this overwrites mat itself
d <- as.data.frame.table(mat, responseName="score")
d[d$score %in% 3, ]
#    Var1 Var2 score
# 22 ID_5 ID_3     3
# 47 ID_7 ID_5     3
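If you also want each pair in the orientation of the expected output (ID_3 ID_5 rather than ID_5 ID_3), a small variant of the base R code above, not part of the original answer, is to blank out the lower triangle instead. This assumes, as in the example data, that mat is symmetric, so both triangles hold the same pairs. The names m, d and res are hypothetical and only for illustration.

```r
# Example data from the question
mat <- structure(c(0, 1, 1, 1, Inf, 2, 2, Inf, 1, 0, 2, 1, Inf, 1, 2,
  Inf, 1, 2, 0, 2, Inf, 3, 1, Inf, 1, 1, 2, 0, Inf, 2, 1, Inf,
  Inf, Inf, Inf, Inf, 0, Inf, Inf, 1, 2, 1, 3, 2, Inf, 0, 3, Inf,
  2, 2, 1, 1, Inf, 3, 0, Inf, Inf, Inf, Inf, Inf, 1, Inf, Inf,
  0), .Dim = c(8L, 8L), .Dimnames = list(c("ID_1", "ID_2", "ID_3",
  "ID_4", "ID_8", "ID_5", "ID_7", "ID_100"), c("ID_1", "ID_2",
  "ID_3", "ID_4", "ID_8", "ID_5", "ID_7", "ID_100")))

# Work on a copy so mat itself is left unchanged
m <- mat
# Blank out the lower triangle: each remaining cell has row <= column,
# so Var1 (row name) comes first in the upper-triangle orientation
m[lower.tri(m)] <- NA

d <- as.data.frame.table(m, responseName = "score")
res <- d[d$score %in% 3, ]           # %in% drops the NA cells as well
names(res)[1:2] <- c("nodeA", "nodeB")
rownames(res) <- NULL
res
#   nodeA nodeB score
# 1  ID_3  ID_5     3
# 2  ID_5  ID_7     3
```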
jay.sf

You can compare character strings with <=, but not factors (for factors, R warns that '<=' is not meaningful and returns NA). So coerce Var1 and Var2 to character before filtering out all rows with Var1 > Var2.

library(dplyr)

mat %>% as.data.frame.table(responseName = "score") %>%
  mutate(Var1 = as.character(Var1),       # factors -> character so <= works
         Var2 = as.character(Var2)) %>%
  filter(score == 3 & Var1 <= Var2) %>%    # keep one orientation of each pair
  rename(nodeA = Var1, nodeB = Var2)
#  nodeA nodeB score
#1  ID_3  ID_5     3
#2  ID_5  ID_7     3
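String comparison depends on the locale's collation order. A base R variant, offered as an assumption rather than as part of this answer, compares the factors' integer codes instead: as.data.frame.table builds Var1 and Var2 as factors whose levels follow the order of the dimnames, so as.integer() gives the row and column positions, and keeping codes with Var1 <= Var2 keeps the upper triangle. The names d, keep and res are hypothetical.

```r
# Example data from the question
mat <- structure(c(0, 1, 1, 1, Inf, 2, 2, Inf, 1, 0, 2, 1, Inf, 1, 2,
  Inf, 1, 2, 0, 2, Inf, 3, 1, Inf, 1, 1, 2, 0, Inf, 2, 1, Inf,
  Inf, Inf, Inf, Inf, 0, Inf, Inf, 1, 2, 1, 3, 2, Inf, 0, 3, Inf,
  2, 2, 1, 1, Inf, 3, 0, Inf, Inf, Inf, Inf, Inf, 1, Inf, Inf,
  0), .Dim = c(8L, 8L), .Dimnames = list(c("ID_1", "ID_2", "ID_3",
  "ID_4", "ID_8", "ID_5", "ID_7", "ID_100"), c("ID_1", "ID_2",
  "ID_3", "ID_4", "ID_8", "ID_5", "ID_7", "ID_100")))

d <- as.data.frame.table(mat, responseName = "score")

# Comparing the factors directly does not work: R warns that '<=' is
# not meaningful for factors and returns NA
is.na(suppressWarnings(d$Var1[1] <= d$Var2[1]))  # TRUE

# The integer codes follow the dimnames order (row/column positions),
# so this keeps each pair once, in its upper-triangle orientation
keep <- d$score %in% 3 & as.integer(d$Var1) <= as.integer(d$Var2)
res <- d[keep, ]
names(res)[1:2] <- c("nodeA", "nodeB")
rownames(res) <- NULL
res
#   nodeA nodeB score
# 1  ID_3  ID_5     3
# 2  ID_5  ID_7     3
```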
Rui Barradas

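We can use pmin/pmax to put the two names in each row into a fixed (sorted) order, and then use duplicated to drop the rows whose sorted pair has already appeared. This works on mat_pi_lon from the question.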

library(dplyr)
# mat_pi_lon is the data frame created in the question
mat_pi_lon %>% 
   filter(!duplicated(cbind(pmin(as.character(nodeA), as.character(nodeB)),    # smaller name
                            pmax(as.character(nodeA), as.character(nodeB)))))  # larger name
#  nodeA nodeB score
#1  ID_5  ID_3     3
#2  ID_7  ID_5     3
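If you want the pairs in the expected orientation (ID_3 ID_5 and ID_5 ID_7) rather than the first occurrence of each, one option, sketched here in base R and not part of the original answer, is to replace the two columns with their pmin/pmax values and then call unique(). The mat_pi_lon below is rebuilt by hand from the output shown in the question; canon and res are hypothetical names.

```r
# mat_pi_lon rebuilt from the question's output (character columns)
mat_pi_lon <- data.frame(
  nodeA = c("ID_5", "ID_3", "ID_7", "ID_5"),
  nodeB = c("ID_3", "ID_5", "ID_5", "ID_7"),
  score = c(3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Put the smaller name in nodeA and the larger one in nodeB
canon <- data.frame(
  nodeA = pmin(mat_pi_lon$nodeA, mat_pi_lon$nodeB),
  nodeB = pmax(mat_pi_lon$nodeA, mat_pi_lon$nodeB),
  score = mat_pi_lon$score,
  stringsAsFactors = FALSE
)

# Both orientations of a pair now give identical rows, so unique() drops one
res <- unique(canon)
rownames(res) <- NULL
res
#   nodeA nodeB score
# 1  ID_3  ID_5     3
# 2  ID_5  ID_7     3
```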

Or use a base R one-liner that sorts the values within each row:

subset(mat_pi_lon, !duplicated(t(apply(mat_pi_lon, 1, sort))))
#   nodeA nodeB score
#1  ID_5  ID_3     3
#3  ID_7  ID_5     3
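apply() first converts the data frame to a character matrix, so the score (as the string "3") becomes part of each sorted key too. That is harmless here, because both orientations of a pair carry the same score in a symmetric matrix. To build the key from the two node columns only, a small variation (an assumption, not the answer's code) is sketched below; mat_pi_lon is rebuilt by hand from the question's output, and key and res are hypothetical names.

```r
# mat_pi_lon rebuilt from the question's output (character columns)
mat_pi_lon <- data.frame(
  nodeA = c("ID_5", "ID_3", "ID_7", "ID_5"),
  nodeB = c("ID_3", "ID_5", "ID_5", "ID_7"),
  score = c(3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Sort the two names within each row; apply() returns a 2 x n matrix,
# so t() turns it back into one row per pair
key <- t(apply(mat_pi_lon[, c("nodeA", "nodeB")], 1, sort))

# duplicated() on a matrix compares whole rows, keeping the first occurrence
res <- mat_pi_lon[!duplicated(key), ]
res
#   nodeA nodeB score
# 1  ID_5  ID_3     3
# 3  ID_7  ID_5     3
```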
akrun