I want to dummy-code whether some string is contained in another (which is structured). For example:

player <- c("Michael Jordan", "Steve Kerr", "Michael Jordan", "Toni Kukoc")

bulls <- c("Jordan, Michael Jeffrey", "Pippen, Scottie; Harper, Ron",
           "Rodman, Dennis", "Kerr, Steve; Longley, Luc; Kukoc, Toni")

and create a new variable (say, included) that is TRUE if the words Michael and Jordan are both present in bulls[1], Steve and Kerr in bulls[2], etc. The above should produce TRUE FALSE FALSE TRUE. In bulls, surnames and given names are separated by commas, whereas a semicolon separates multiple people in a single entry. Given that bulls can feature longer versions of a name ("Jeffrey" in this case) but not the other way around, I suspect the solution might require some sort of is.element check. I want to iterate this over a long list; what is the best approach?

P.S. I tried several stringr verbs (str_view, str_extract, etc.), but no luck so far.

1 Answer

Try this:

library(stringr)
# Split each string into words, then check that every word of player[i] occurs in bulls[i]
mapply(function(x, y) all(x %in% y),
       str_extract_all(player, "\\w+"), str_extract_all(bulls, "\\w+"))
#[1]  TRUE FALSE FALSE  TRUE
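
One caveat with the word-level check: it treats each bulls entry as a single bag of words, so a name mixed from two different people in the same entry (say, "Steve Kukoc" against bulls[4]) would also match. If that matters for your data, a possible variant is to split each entry on the semicolon first and require all words to match within one person. The following is only a sketch in base R; the helper names words and in_entry are made up for illustration and are not part of stringr or the code above.

```r
player <- c("Michael Jordan", "Steve Kerr", "Michael Jordan", "Toni Kukoc")
bulls <- c("Jordan, Michael Jeffrey", "Pippen, Scottie; Harper, Ron",
           "Rodman, Dennis", "Kerr, Steve; Longley, Luc; Kukoc, Toni")

# Hypothetical helper: split one string into its words (runs of word characters)
words <- function(s) regmatches(s, gregexpr("\\w+", s))[[1]]

# Hypothetical helper: TRUE if every word of `p` occurs within a single
# semicolon-separated person of `entry`
in_entry <- function(p, entry) {
  people <- strsplit(entry, ";", fixed = TRUE)[[1]]
  any(vapply(people,
             function(person) all(words(p) %in% words(person)),
             logical(1)))
}

# Compare player[i] with bulls[i]; unname() drops the names mapply adds
included <- unname(mapply(in_entry, player, bulls))
included
```

On the example data this should agree with the result above, while in_entry("Steve Kukoc", bulls[4]) comes out FALSE because no single person in that entry carries both words.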
nicola