I have a two-variable data frame, one column of which is a character vector. Each row in "MyVector" contains a string with exactly one name (e.g. "Pete"). The position of the name within the string varies. I want to write code that matches the names in a list against each string and extracts the matching name into a new variable in the data frame. If the name were always in the same position in "MyVector", I would create the new variable as a substring of MyVector, pulling the name out into a new column. I tried various versions of str_detect from stringr to no avail.

Challenge: How do I detect or extract the name into a new variable and place it into MyDF when the name can appear at different positions in the string?

#Create the data frame
var.1 <- rep(c(1, 5, 3), 2)

MyVector <- c("I know Pete", "Jerry has a new job", "Victor is an employee", "How to work with Pete", "Too Many Students", "Bob is mean")
#data.frame() keeps var.1 numeric; as.data.frame(cbind(...)) would coerce it to character
MyDF <- data.frame(var.1, MyVector, stringsAsFactors = FALSE)

#Create a vector of the names I want to extract into a new column in the data frame.
Extract <- c("Jerry", "Pete", "Bob", "Victor")

#match() would be perfect if it worked on substrings, but it only compares whole strings, so this returns NA for every row
MyDF$newvar <- match(MyDF$MyVector, Extract)

My final data.frame should look something like the output below.

 var.1                     MyVector NEWVAR
1     1               Don knows Pete   Pete
2     5          Jerry has a new job  Jerry
3     3 Victor and Bob are employees Victor
4     1        How to work with Pete   Pete
5     5            Too Many Students     NA
6     3                  Bob is mean    Bob
RareAir

1 Answer

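We can use str_extract after pasting the elements of 'Extract' together with "|" as the separator, which builds a single regex that matches any one of the names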

library(stringr)
MyDF$NEWVAR <- str_extract(MyDF$MyVector, paste(Extract, collapse="|"))
MyDF$NEWVAR
#[1] "Pete"   "Jerry"  "Victor" "Pete"   NA       "Bob"   
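
One possible refinement, not part of the answer as posted: a plain alternation also matches a name inside a longer word, so "Pete" would be found in "Peter". The sketch below assumes names should only match as whole words and wraps the alternation in `\\b` word boundaries. The input vector `x` is hypothetical and chosen only to show the difference.

```r
library(stringr)

# Hypothetical input, not from the question: "Peter" contains "Pete"
x <- c("I know Pete", "Peter is new", "Bob is mean")
Extract <- c("Jerry", "Pete", "Bob", "Victor")

# Plain alternation: also matches "Pete" inside "Peter"
plain <- paste(Extract, collapse = "|")
plain_result <- str_extract(x, plain)

# Alternation wrapped in word boundaries: whole-word matches only
whole_word <- paste0("\\b(?:", plain, ")\\b")
whole_word_result <- str_extract(x, whole_word)
```

With this input, `plain_result` picks up "Pete" for the second string, while `whole_word_result` gives `NA` there. Whether that stricter behaviour is wanted depends on the data.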
akrun
  • How would you go about extracting two words from a string? For example, row 3 in the OP's example contains Victor and Bob but only Victor is returned in your answer. Thanks. – Seanosapien Oct 13 '17 at 21:31
  • @Seanosapien `str_extract` extracts only the first match. For multiple extraction, use `str_extract_all` – akrun May 05 '20 at 17:39
  • Thanks @akrun. How is the first match determined? Why is Victor chosen over Bob? – Seanosapien May 07 '20 at 09:15
  • It comes before Bob in the sentence – akrun May 07 '20 at 18:26
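
The two points from the comments can be sketched as follows, using the "Victor and Bob" string from the expected output in the question. The variable names `first`, `all_names` and `joined` are illustrative, and collapsing the matches into one comma-separated string is just one assumed way to fit them into a single column.

```r
library(stringr)

Extract <- c("Jerry", "Pete", "Bob", "Victor")
pattern <- paste(Extract, collapse = "|")
x <- "Victor and Bob are employees"

# First match only: the leftmost match in the string wins,
# even though "Bob" is listed before "Victor" in the pattern
first <- str_extract(x, pattern)

# All matches: a list with one character vector per input string
all_names <- str_extract_all(x, pattern)

# Collapse each list element into one string so it fits a data frame column
joined <- vapply(all_names, paste, character(1), collapse = ", ")
```

Note that for a string with no match, `str_extract_all` returns an empty character vector, so the collapsed value is an empty string rather than `NA`.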