
This seems like such a basic question that I'm almost sure it must be covered somewhere around here, but I've been searching for quite some time now and can't find the right answer.

My data looks like this:

data <- data.frame(col1 = c("A","A","B","B"), col2 = c("B","C","A","C"), value = c(1,2,3,4))

  col1 col2 value
1    A    B     1
2    A    C     2
3    B    A     3
4    B    C     4

I want to merge col1 and col2 into a single variable that identifies the unique dyads. It should not matter whether "A" and "B" appear in col1 or in col2: every row that combines "A" and "B" across col1 and col2 should get the same value of the new variable. I tried to use tidyr for this.

library(tidyr)
unite(data, col1, col2, col = "dyad", sep = "_")

returns

  dyad value
1  A_B     1
2  A_C     2
3  B_A     3
4  B_C     4

Basically, I need dyad to contain the same value for A_B and B_A, because these pairs are equivalent for me. This is what it should look like, for example:

  dyad value
1  A_B     1
2  A_C     2
3  A_B     3
4  B_C     4

Is there an easy way to do this? Thanks a lot!

d4ynn

2 Answers


There may be more elegant solutions, but perhaps this helps:

data <- data.frame(col1 = c("A","A","B","B"), col2 = c("B","C","A","C"), value = c(1,2,3,4),
                   stringsAsFactors = FALSE)
# For each row (MARGIN = 1), sort the two labels and join them with "_"
data$dyad <- apply(data[, c("col1", "col2")], 1, function(x) paste(sort(x), collapse = "_"))

So the apply function (with 1 as its second argument) ensures that the function is applied to each row of the data frame. The function first sorts the two values and then pastes them together, so "A","B" and "B","A" both end up as A_B.
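
To make the sort-then-paste step easier to see, here is a minimal sketch that pulls it out into a named function. The name make_dyad is hypothetical and only used for illustration; the logic is the same as in the one-liner above.

```r
# Hypothetical helper: builds an order-independent key from one pair of labels.
make_dyad <- function(x) paste(sort(x), collapse = "_")

# Both orderings of the same pair give the same key.
make_dyad(c("A", "B"))
make_dyad(c("B", "A"))

# Applied row by row to the example data (1 = iterate over rows).
data <- data.frame(col1 = c("A", "A", "B", "B"),
                   col2 = c("B", "C", "A", "C"),
                   value = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)
data$dyad <- apply(data[, c("col1", "col2")], 1, make_dyad)
data
```

Because the labels are sorted before pasting, the key no longer depends on which column a label came from.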

EDIT: I copied stringsAsFactors = FALSE from the other answer, as I used it as well but forgot to include it in my post :)

user3640617

A solution using dplyr. Notice that I added stringsAsFactors = FALSE when creating the data frame because it is better to work on character columns in this case.

data <- data.frame(col1 = c("A","A","B","B"), col2 = c("B","C","A","C"), value = c(1,2,3,4),
                   stringsAsFactors = FALSE) 

library(dplyr)

data2 <- data %>%
  rowwise() %>%
  mutate(dyad = paste(sort(c(col1, col2)), collapse = "_")) %>%
  select(dyad, value) %>%
  ungroup()
data2
# # A tibble: 4 x 2
#    dyad value
#   <chr> <dbl>
# 1   A_B     1
# 2   A_C     2
# 3   A_B     3
# 4   B_C     4
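
For larger data, a row-wise approach can be slow. The following is a sketch of a vectorized base R alternative, not taken from either answer: it uses pmin() and pmax() to pick the alphabetically smaller and larger label of each pair. It assumes col1 and col2 are character columns (not factors).

```r
data <- data.frame(col1 = c("A", "A", "B", "B"),
                   col2 = c("B", "C", "A", "C"),
                   value = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)

# pmin()/pmax() compare the two columns element by element, so the
# alphabetically smaller label always ends up first in the key.
data$dyad <- paste(pmin(data$col1, data$col2),
                   pmax(data$col1, data$col2),
                   sep = "_")
data[, c("dyad", "value")]
```

This should give the same dyad values as the sort-based versions for two-column pairs, but it does not generalize as directly to three or more columns.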
www