1

Dataframe as an example:

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 MyAim = c("A","D","",""))

      x    y MyAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A      
4 C,E,G <NA>      

I want to get common values between x and y columns in a new one. Thanks in advance.

youraz
  • 463
  • 4
  • 14

3 Answers3

2

We can use mapply :

df$Z <- mapply(function(x, y) {
            temp <- intersect(x, y)
            if(length(temp)) temp else ""
        }, strsplit(df$x, ","), df$y)

df
#      x    y Z
#1 A,B,C    A A
#2   A,D    D D
#3 B,C,E    A  
#4 C,E,G <NA>  

If there are multiple values in y, we can split the string in y and return a comma-separated value.

df$Z <- mapply(function(x, y) {
     temp <- intersect(x, y)
     if(length(temp)) toString(temp) else ""
     }, strsplit(df$x, ","), strsplit(df$y, ","))

data

df <- data.frame(x = c("A,B,C","A,D","B,C,E","C,E,G"),
                 y = c("A","D","A",NA),
                 stringsAsFactors = FALSE)
youraz
  • 463
  • 4
  • 14
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

strsplit could easily be used in an apply which coerces to character, try

df <- transform(df, MyAim=apply(df, 1, function(x) {
  s <- el(strsplit(x[1], ","))
  s[match(x[2], s)]
  }))
df
# x    y MyAim
# 1 A,B,C    A     A
# 2   A,D    D     D
# 3 B,C,E    A  <NA>
# 4 C,E,G <NA>  <NA>
jay.sf
  • 60,139
  • 8
  • 53
  • 110
0

If x is character, then the following is one of many ways to do this:

intersect(unlist(strsplit(df$x, split=",")), df$y)

If x is not a character, then strsplit will crash, so the following is required:

intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)

And to add this to the data frame,

myAim <- intersect(unlist(strsplit(as.character(df$x), split=",")), df$y)
df$myAim <- c(myAim, rep(NA, nrow(df)-length(myAim)))
df
      x    y myAim
1 A,B,C    A     A
2   A,D    D     D
3 B,C,E    A  <NA>
4 C,E,G <NA>  <NA>

Note: If y contained values like in x, then the length of myAim may be higher than the number of rows of the data frame. In that situation, adding the result to the data frame may not seem appropriate.

Edward
  • 10,360
  • 2
  • 11
  • 26