I am using the code below to do partial string matching where each name gets a single category, but I have a follow-up question: suppose we had an additional criterion for fish, and we wanted "dog fish" to be categorized as both fish and canine. Is this possible?

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                 "black panther", "short cat", "red bird",
                 "short bird stuffed", "big eagle", "bad sparrow",
                 "dog fish", "head dog", "brown yorkie",
                 "lab short bulldog"), label=1:14)

Define the regexes at the beginning of the code

regexes <- list(c("(cat|lion|tiger|panther)","feline"),
            c("(bird|eagle|sparrow)","avian"),
            c("(dog|yorkie|bulldog)","canine"))

Create a vector, the same length as the df

output_vector <- character(nrow(d))

For each regex:

for (i in seq_along(regexes)) {
  # Grep through d$name, and when you find matches, insert the relevant 'tag'
  # into the output vector
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}

Insert that now-filled output vector into the dataframe

d$species <- output_vector

Desired Output

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine, fish
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine

The original Stack Overflow question is here: partial string matching r

user3743201

1 Answer

I'd do this through a cross join.

library(dplyr)
library(stringi)

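# Lookup table mapping each partial string to its category
# (tibble() replaces the deprecated data_frame())
key = tibble(partial = c("cat", "lion", "tiger", "panther",
                         "bird", "eagle", "sparrow",
                         "dog", "yorkie", "bulldog"),
             category = c("feline", "feline", "feline", "feline",
                          "avian", "avian", "avian",
                          "canine", "canine", "canine"))

# merge() with no shared column names yields a cross join (every name paired
# with every partial); keep only the pairs where the name contains the partial
d %>%
  merge(key) %>%
  filter(stri_detect_fixed(name, partial))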
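
To get the "canine, fish" label from the question, one option is to add a fish entry to the key and then collapse the matched categories per name. The following is only a sketch under a few assumptions: the "fish" row in the key, the `tagged` name, and the `species` column are added here for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(stringi)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# Same key as above, plus an assumed extra row for "fish"
key <- tibble(partial = c("cat", "lion", "tiger", "panther",
                          "bird", "eagle", "sparrow",
                          "dog", "yorkie", "bulldog", "fish"),
              category = c("feline", "feline", "feline", "feline",
                           "avian", "avian", "avian",
                           "canine", "canine", "canine", "fish"))

tagged <- d %>%
  merge(key) %>%                              # cross join: no shared columns
  filter(stri_detect_fixed(name, partial)) %>%
  group_by(name, label) %>%
  # unique() avoids "canine, canine" when two partials of one category match
  # (e.g. "dog" and "bulldog"); sort() makes the label order deterministic
  summarise(species = paste(sort(unique(category)), collapse = ", "),
            .groups = "drop") %>%
  arrange(label)                              # restore the original row order

print(tagged)
```

Names that match no partial are dropped by the `filter()` step; if they need to be kept, joining `tagged` back onto `d` with `left_join()` would leave their `species` as `NA`.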
bramtayl