I am using the code below to partially match names against patterns and assign one category per row, but I have a follow-up question: suppose we had an additional criterion for fish, and we wanted "dog fish" to be categorized as both fish and canine. Is this possible?
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)
# Define the regexes at the beginning of the code
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"))
# Create an empty character vector, the same length as the data frame
output_vector <- character(nrow(d))
# For each regex...
for (i in seq_along(regexes)) {
  # Grep through d$name and, where a name matches, insert the relevant tag
  # into the output vector
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}
# Insert the now-filled output vector into the data frame
d$species <- output_vector
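In the loop above, each assignment replaces whatever tag a row already holds, so a name that matches several patterns ends up with only the last matching tag. A minimal sketch of one way around this is to append each new tag to any existing one with `paste()` instead of overwriting it. It assumes the fish criterion is added as a fourth entry, `c("(fish)", "fish")`; that pattern is a hypothetical addition, not part of the original code.

```r
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)

# Same regexes as before, plus a hypothetical fish entry (assumed pattern)
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))

output_vector <- character(nrow(d))
for (i in seq_along(regexes)) {
  hits <- grepl(pattern = regexes[[i]][1], x = d$name)
  tag <- regexes[[i]][2]
  # Empty slots receive the tag; slots that already hold a tag get ", tag" appended
  output_vector[hits] <- ifelse(output_vector[hits] == "", tag,
                                paste(output_vector[hits], tag, sep = ", "))
}
d$species <- output_vector
d
```

Because tags are appended in the order they appear in `regexes`, "dog fish" comes out as `canine, fish`; reordering the list changes the order inside that string.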
Desired output:
#                 name label      species
#1           brown cat     1       feline
#2            blue cat     2       feline
#3            big lion     3       feline
#4          tall tiger     4       feline
#5       black panther     5       feline
#6           short cat     6       feline
#7            red bird     7        avian
#8  short bird stuffed     8        avian
#9           big eagle     9        avian
#10        bad sparrow    10        avian
#11           dog fish    11 canine, fish
#12           head dog    12       canine
#13       brown yorkie    13       canine
#14  lab short bulldog    14       canine
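A loop-free way to get the desired output is to build a logical match matrix (one row per name, one column per regex) and then, for each row, paste together the tags whose patterns matched. This is a sketch using only base R's `vapply()`, `grepl()` and `apply()`; the `c("(fish)", "fish")` entry is an assumed pattern for the new fish criterion, not something taken from the original code.

```r
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)

regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))  # hypothetical fish criterion

patterns <- vapply(regexes, `[`, character(1), 1)  # first element: the regex
tags     <- vapply(regexes, `[`, character(1), 2)  # second element: the tag

# Logical matrix: rows are the names in d, columns are the regexes
hits <- vapply(patterns, grepl, logical(nrow(d)), x = d$name)

# For each row, keep the tags whose pattern matched and join them with ", "
d$species <- apply(hits, 1, function(row) paste(tags[row], collapse = ", "))
d
```

Rows that match no pattern get an empty string, as in the original loop. If you later need to filter by a single category, keeping `hits` around (or storing the tags as a list column) may be more convenient than splitting the pasted string again.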
The original Stack Overflow question is "partial string matching r".