How to get common values between two columns in R?

Question

Example data frame:

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 MyAim = c("A","D","",""))

      x    y MyAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A      
4 C,E,G <NA>

I want to put the values that columns x and y have in common into a new column, as shown in MyAim. Thanks in advance.

score 2 · Accepted Answer · edited Mar 13 '20 at 07:31


We can use mapply to compare, row by row, the pieces of x with the value of y:

df$Z <- mapply(function(x, y) {
  temp <- intersect(x, y)          # values present in both x and y
  if (length(temp)) temp else ""   # "" when the row has nothing in common
}, strsplit(df$x, ","), df$y)

df
#      x    y Z
#1 A,B,C    A A
#2   A,D    D D
#3 B,C,E    A  
#4 C,E,G <NA>
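A possible variant of the same row-wise idea, sketched under the assumption that each row should yield exactly one string: vapply() with character(1) enforces that, so if a row ever produced two matches it would stop with an error, whereas mapply() would quietly return a list. The names x_parts and common are only illustrative.

```r
# Example data from the question; stringsAsFactors = FALSE keeps x and y as
# character vectors in R versions before 4.0.0 as well
df <- data.frame(x = c("A,B,C", "A,D", "B,C,E", "C,E,G"),
                 y = c("A", "D", "A", NA),
                 stringsAsFactors = FALSE)

# Split each x value into its comma-separated pieces (one list element per row)
x_parts <- strsplit(df$x, ",")

# vapply() requires every row to return exactly one string, so an
# unexpected multi-value match stops with an error instead of
# silently turning the result into a list
df$Z <- vapply(seq_len(nrow(df)), function(i) {
  common <- intersect(x_parts[[i]], df$y[i])
  if (length(common)) common else ""
}, character(1))

df$Z
# c("A", "D", "", "")
```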

If y can also hold several comma-separated values, we can split y as well and collapse the common values into one comma-separated string with toString().

df$Z <- mapply(function(x, y) {
  temp <- intersect(x, y)
  if (length(temp)) toString(temp) else ""   # several matches become e.g. "A, B"
}, strsplit(df$x, ","), strsplit(df$y, ","))

data

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 stringsAsFactors = FALSE)
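To check the updated version against the case raised in the comments, here is a small run in which y[1] is changed to "A,B" (a hypothetical value chosen only for illustration):

```r
# Question data with y[1] changed to "A,B" (hypothetical, to mimic the
# case asked about in the comments)
df <- data.frame(x = c("A,B,C", "A,D", "B,C,E", "C,E,G"),
                 y = c("A,B", "D", "A", NA),
                 stringsAsFactors = FALSE)

# Split both columns and keep every shared piece, collapsed with toString()
df$Z <- mapply(function(x, y) {
  temp <- intersect(x, y)
  if (length(temp)) toString(temp) else ""
}, strsplit(df$x, ","), strsplit(df$y, ","))

df$Z
# c("A, B", "D", "", "")
```

toString() separates the values with ", "; if you would rather keep the same format as x, paste(temp, collapse = ",") returns "A,B" instead.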

answered Mar 13 '20 at 06:50 by Ronak Shah · edited Mar 13 '20 at 07:31 by youraz

What should we do if the Z column has multiple common values, e.g. if df$y[1] is A,B? @Ronak – youraz Mar 13 '20 at 07:22

@nerdakgul Split `y` as well and return a comma-separated value from the function. I updated the answer. – Ronak Shah Mar 13 '20 at 07:24

score 1 · Answer 2 · answered Mar 13 '20 at 06:39


strsplit can be used inside apply, which first converts the data frame to a character matrix, so every row arrives as a character vector. Try:

df <- transform(df, MyAim = apply(df, 1, function(x) {
  s <- el(strsplit(x[1], ","))  # pieces of the x column for this row
  s[match(x[2], s)]             # the y value if it is one of them, otherwise NA
}))
df
#       x    y MyAim
# 1 A,B,C    A     A
# 2   A,D    D     D
# 3 B,C,E    A  <NA>
# 4 C,E,G <NA>  <NA>
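This answer returns NA where the question's MyAim has an empty string. A minimal sketch, assuming "" is really wanted, computes the vector first and replaces the NAs before storing it. methods::el is the same el() used above, written with its package prefix; the names aim and row are illustrative.

```r
df <- data.frame(x = c("A,B,C", "A,D", "B,C,E", "C,E,G"),
                 y = c("A", "D", "A", NA),
                 stringsAsFactors = FALSE)

# apply() hands each row over as a character vector: row[1] is x, row[2] is y
aim <- apply(df, 1, function(row) {
  s <- methods::el(strsplit(row[1], ","))  # pieces of x; el() takes the first list element
  s[match(row[2], s)]                      # y if it is one of the pieces, otherwise NA
})

# Replace the NAs with "" so the column matches MyAim from the question
aim[is.na(aim)] <- ""
df$MyAim <- aim
df$MyAim
# c("A", "D", "", "")
```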

answered Mar 13 '20 at 06:39 by jay.sf


`?el` - new function of the day for me. – zx8754 Mar 13 '20 at 07:53

Answer 3 · answered Mar 13 '20 at 07:02 by Edward

If x is character, then the following is one of many ways to do this:

intersect(unlist(strsplit(df$x, split=",")), df$y)

If x is not character (for example a factor, which data.frame() created by default for string columns before R 4.0.0), strsplit stops with an error, so convert it first:

intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)
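A short illustration of the failure and the fix, assuming x was stored as a factor (the names f, res and parts are illustrative):

```r
# x stored as a factor, which is what data.frame() produced by default for
# string columns before R 4.0.0
f <- factor(c("A,B,C", "A,D", "B,C,E", "C,E,G"))

# strsplit() refuses factors and signals an error; capture its message
res <- tryCatch(strsplit(f, ","), error = function(e) conditionMessage(e))
res   # the error message, e.g. "non-character argument"

# as.character() recovers the labels, after which splitting works
parts <- strsplit(as.character(f), ",")
parts[[2]]
# c("A", "D")
```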

To add the result to the data frame, pad it with NA up to the number of rows:

myAim <- intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)
df$myAim <- c(myAim, rep(NA, nrow(df)-length(myAim)))
df
      x    y myAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A  <NA>
4 C,E,G <NA>  <NA>

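Note: intersect() here works on all rows pooled together, not row by row, so the result lines up with the rows of this example only because the matches happen to occur in row order. Also, if y contained several values per cell like x does, myAim could be longer than the number of rows of the data frame, and then it cannot be added to the data frame as a column.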
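A small made-up data frame (toy, hypothetical) shows how the pooled intersect() and a row-wise mapply() like the one in the accepted answer can disagree:

```r
# Hypothetical data: no row shares a value between x and y
toy <- data.frame(x = c("A,B", "C,D"),
                  y = c("D", "A"),
                  stringsAsFactors = FALSE)

# Pooled: every piece of x is compared with every value of y at once
pooled <- intersect(unlist(strsplit(toy$x, split = ",")), toy$y)

# Row-wise: each row's pieces are compared only with that row's y
rowwise <- mapply(function(x, y) {
  common <- intersect(x, y)
  if (length(common)) toString(common) else ""
}, strsplit(toy$x, ","), toy$y)

pooled    # c("A", "D")
rowwise   # c("", "")
```

Padding pooled with NA and storing it as a column would put "A" in row 1 and "D" in row 2, although in neither row do x and y share a value.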
