The dataframe I am working on is coded in dyadic format, where each observation (i.e., row) contains a source node (from) and a target node (to), along with some other dyadic covariates (such as the dyadic correlation, corr).

For simplicity's sake, I want to treat each dyad as unordered and generate a unique identifier for each dyad, like the one (i.e., df1) below:

# original data
df <- data.frame(
from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
to = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5))

   from to corr
1     A  B 0.50
2     A  C 0.70
3     A  D 0.20
4     B  C 0.15
5     C  B 0.15
6     A  B 0.50
7     D  A 0.20
8     E  A 0.45
9     F  A 0.54
10    B  A 0.50

# desired format
df1 <- data.frame(
from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
to = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5),
dyad = c(1, 2, 3, 4, 4, 1, 3, 5, 6, 1))

   from to corr dyad
1     A  B 0.50    1
2     A  C 0.70    2
3     A  D 0.20    3
4     B  C 0.15    4
5     C  B 0.15    4
6     A  B 0.50    1
7     D  A 0.20    3
8     E  A 0.45    5
9     F  A 0.54    6
10    B  A 0.50    1

Here the dyads A-B/B-A and A-D/D-A are treated as identical pairs and are assigned the same dyad identifier. While it's easy to extract a list of unordered pairs from the original data, it's hard to map them back onto the original dataframe to generate unordered dyad identifiers. Could anyone offer some insights on this?

Chris T.

2 Answers

One dplyr option could be:

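library(dplyr)
# Order-independent key per dyad; note that passing expressions to group_indices() is deprecated in dplyr >= 1.0.0
df %>%
 mutate(dyad = group_indices(., paste0(pmax(from, to), pmin(from, to))))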

   from to corr dyad
1     A  B 0.50    1
2     A  C 0.70    2
3     A  D 0.20    4
4     B  C 0.15    3
5     C  B 0.15    3
6     A  B 0.50    1
7     D  A 0.20    4
8     E  A 0.45    5
9     F  A 0.54    6
10    B  A 0.50    1

Or:

# dense_rank() numbers the keys in sorted order, so it gives the same identifiers as above
df %>%
 mutate(dyad = dense_rank(paste0(pmax(from, to), pmin(from, to))))

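Both options number the dyads by the sorted order of the pasted key, which is why A-D gets 4 here rather than 3. If you need the identifiers assigned in a specific order, such as order of first appearance as in your desired output (meaning that the identifiers hold some information on their own), then the solution from @Ronak Shah could be better for you.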
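For completeness, here is a minimal sketch of how first-appearance numbering might be done while staying in dplyr. It is not part of the original answer: the helper column `key` and the result name `df_first` are made up for illustration, and `stringsAsFactors = FALSE` is set on the assumption that `from` and `to` should be plain character vectors. A separator is used in `paste()` so that multi-character node names cannot collide.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE keeps from/to as characters
df <- data.frame(
  from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
  to   = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
  corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5),
  stringsAsFactors = FALSE
)

df_first <- df %>%
  mutate(
    # order-independent key: smaller node name first, larger second
    key  = paste(pmin(from, to), pmax(from, to), sep = "_"),
    # position of each key among the unique keys, i.e. order of first appearance
    dyad = match(key, unique(key))
  ) %>%
  select(-key)

df_first
```

With the example data this should reproduce the dyad column of df1 from the question.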

tmfmnk
  • Thank you so much for clarifying this, it really helps. – Chris T. Aug 12 '19 at 10:39
  • Here is a simple function that builds on the answer that uses dense_rank.
    ```
    create_dyad_id <- function(.data, id_a, id_b) {
      id_a <- ensym(id_a)
      id_b <- ensym(id_b)
      .data %>%
        mutate(dyad_id = dense_rank(paste0(
          pmax(!!id_a, !!id_b),
          pmin(!!id_a, !!id_b)
        )))
    }
    ```
    – greg_s Jan 27 '23 at 19:59
One way using apply could be to sort and paste the values of the two columns row by row, convert the result to a factor whose levels follow the order of first appearance, and then convert that to an integer to get a unique number for each combination.

df$temp <- apply(df[1:2], 1, function(x) paste(sort(x), collapse = "_"))
df$dyad <- as.integer(factor(df$temp, levels = unique(df$temp)))
df$temp <- NULL
df

#   from to corr dyad
#1     A  B 0.50    1
#2     A  C 0.70    2
#3     A  D 0.20    3
#4     B  C 0.15    4
#5     C  B 0.15    4
#6     A  B 0.50    1
#7     D  A 0.20    3
#8     E  A 0.45    5
#9     F  A 0.54    6
#10    B  A 0.50    1
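
A possible variation on the same idea, shown only as a sketch and not as part of the original answer: the row-wise apply call can be replaced with the vectorised pmin and pmax, which may be faster on large data. The variable name `key` is made up for illustration, and `stringsAsFactors = FALSE` is an assumption so that pmin and pmax operate on character vectors.

```r
# Example data from the question; stringsAsFactors = FALSE keeps from/to as characters
df <- data.frame(
  from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
  to   = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
  corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5),
  stringsAsFactors = FALSE
)

# Vectorised equivalent of sort() + paste() per row: smaller name first, larger second
key <- paste(pmin(df$from, df$to), pmax(df$from, df$to), sep = "_")

# Factor levels in order of first appearance, then integer codes as identifiers
df$dyad <- as.integer(factor(key, levels = unique(key)))
df
```
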
Ronak Shah