I have a sample data frame (its structure is given by the dput output below). I am trying to find the intersection between the two columns coauthors and nacoauthors using the following code:

interscout = 
  sample_test %>% 
  mutate( commonauth = intersect( coauthors, nacoauthors) )

The output is not what I expect, and I am not sure why I cannot get the common set of authors for each row using intersect in mutate.

Ideally, the last row should be empty and the second row should have only JAMES M ANDERSON in the intersection.

Here is the code for the structure.

> dput(sample_test)
structure(list(fname = c("JACK", "JACK", "JACK"), lname = c("SMITH", 
"SMITH", "SMITH"), cname = c("JACK  SMITH", "JACK A SMITH", "JACK B SMITH"
), coauthors = list(c("AMEY S BAILEY", "JAMES M ANDERSON"), "JAMES M ANDERSON", 
    "JOHN MURRAY"), nacoauthors = list(c("AMEY S BAILEY", "JAMES M ANDERSON"
), c("AMEY S BAILEY", "JAMES M ANDERSON"), c("AMEY S BAILEY", 
"JAMES M ANDERSON"))), row.names = c(NA, -3L), vars = list(fname, 
    lname), drop = TRUE, indices = list(0:2), group_sizes = 3L, biggest_group_size = 3L, labels = structure(list(
    fname = "JACK", lname = "SMITH"), class = "data.frame", row.names = c(NA, 
-1L), vars = list(fname, lname), drop = TRUE, .Names = c("fname", 
"lname")), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), .Names = c("fname", "lname", "cname", "coauthors", "nacoauthors"
))
Dinesh
  • It's going to throw an error because `mutate` is looking for something of equivalent output length to the full dataset. You could use `intersect` outside of dplyr, `intersect(sample_test$coauthors, sample_test$nacoauthors)`, and it should work. – B Williams May 09 '17 at 00:48

1 Answer

If you add rowwise() and make your mutated column a list it'll work:

interscout <- sample_test %>%
    ungroup() %>%
    rowwise() %>%
    mutate( commonauth = list( intersect(coauthors, nacoauthors) ) )

FWIW, if I don't include rowwise() I get Error: not compatible with STRSXP.
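For anyone who wants to try this without the original object, here is a minimal sketch that rebuilds the question's data by hand and applies the same rowwise() approach. It assumes a reasonably recent dplyr (the `dput` output in the question carries grouping attributes from an older dplyr version, so the data is recreated as a plain tibble instead); the helper name `both` is only for illustration.

```r
library(dplyr)

# Hypothetical helper: the two-author vector that appears repeatedly in the data
both <- c("AMEY S BAILEY", "JAMES M ANDERSON")

# Rebuild the question's data as an ungrouped tibble with two list-columns
sample_test <- tibble(
  fname = "JACK",
  lname = "SMITH",
  cname = c("JACK  SMITH", "JACK A SMITH", "JACK B SMITH"),
  coauthors = list(both, "JAMES M ANDERSON", "JOHN MURRAY"),
  nacoauthors = list(both, both, both)
)

interscout <- sample_test %>%
  rowwise() %>%                                        # evaluate mutate() one row at a time
  mutate(commonauth = list(intersect(coauthors, nacoauthors))) %>%
  ungroup()                                            # drop the rowwise grouping afterwards

# Number of common authors per row
lengths(interscout$commonauth)
# → 2 1 0
```

The second row keeps only JAMES M ANDERSON and the third row is an empty character vector, which matches what the question asks for.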

Nate
  • Thanks! What's the role of ungroup and rowwise here? – Dinesh May 09 '17 at 01:47
  • `ungroup()` might not be necessary, but it's a precaution since your `dput` output describes a "grouped_df". If a grouping were in effect for your data frame, `mutate()` would work within that grouping instead of comparing each row separately, which is likely not what you want here. `rowwise()` tells `mutate()` to consider each row by itself, which is what you want for in-row comparisons like this. – Nate May 09 '17 at 10:59
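
The "each row by itself" idea can also be sketched in base R, without dplyr: `Map()` walks the two list-columns in parallel and calls `intersect()` on one pair of elements at a time. This is only an illustration of the pairing, not part of the answer above; the two lists are copied from the question's data as standalone objects.

```r
# The two list-columns from the question, as standalone lists
coauthors <- list(
  c("AMEY S BAILEY", "JAMES M ANDERSON"),
  "JAMES M ANDERSON",
  "JOHN MURRAY"
)
nacoauthors <- rep(list(c("AMEY S BAILEY", "JAMES M ANDERSON")), 3)

# Pair up element i of each list and intersect them, one row at a time
pairwise <- Map(intersect, coauthors, nacoauthors)

lengths(pairwise)
# → 2 1 0
```

The result is a list with one element per row, so it could be assigned back as a list-column, for example `sample_test$commonauth <- pairwise`.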