I am trying to use the agrep command for fuzzy matching. I have one data frame with a column of audience responses, and another data frame that lists segments and subsegments. Each audience response contains words that name a subsegment. For example:

pattern$audience
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"          
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"      
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"        
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"
[5] "(Old) AddThis - UK » Food » Social"                           
[6] "(Old) AddThis - UK » Health » Social » Health Influencers" 

Similarly, I have another data frame called x that contains the segment and subsegment:

x$segment               x$subsegment
Shopping                Financial shoppers
Travel                  Travel Europe
Shopping                Christmas shopping

I want to write a function that does fuzzy matching between pattern$audience and x$subsegment and returns the matching subsegment for each audience response in a new column, pattern$subseg.

The resulting data set I need should be like this:

pattern$audience                                                  x$segment   x$subsegment
[1] "(Deleted) Semasio » DE: Intent » Christmas C"                Shopping    Christmas shopping
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers" Shopping    Financial shoppers
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"

Here's the code that I tried to write, but it does not return the desired output:

x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg"))
names(x)
y <- as.data.frame(x$subseg)
y <- rename(y, c("x$subseg" = "subseg"))


n.match <- function(pattern, x, ...) {
  for (i in 1:nrow(pattern)) {
        x <- (agrep(y,pattern$audience[i],
                 ignore.case=TRUE, value = TRUE))
              x <- paste0(x,"")
              pattern$subseg[i] <- x
  }
  head(pattern)
    }

Can someone please help me correct my mistake? I would really appreciate your answer. Many thanks.

Shaz

1 Answer

We could try this:

pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",          
         "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
         "(Old) AddThis - UK » Auto » General » Auto Intenders",        
         "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
         "(Old) AddThis - UK » Food » Social",
         "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
         "(Old) AddThis - UK » Health » Social » Health Influencers")
pattern <- data.frame(audiance=pattern)
x <- read.csv(text='segment,   subsegment    
                       Shopping,   Financial shoppers
                       Travel,     Travel Europe
                       Enthusiasts, Auto Enthusiasts  
                       Shopping,   Christmas shopping', stringsAsFactors=FALSE)

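# Vectorize agrep over its 'pattern' argument so that each subsegment is matched
# against every audience string; SIMPLIFY = FALSE always returns a list.
vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE)
pattern$subsegment <- ''
# matches[[i]] holds the row indices of pattern that fuzzily match x$subsegment[i]
matches <- vagrep(x$subsegment, pattern$audiance)
for (i in seq_along(matches)) {
  if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <- x$subsegment[i]
}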

pattern
#                                                         audiance            subsegment
#1                  (Deleted) Semasio » DE: Intent » Christmas C                      
#2       (Old) AddThis - UK » Auto » General » Auto Enthusiasts    Auto Enthusiasts  
#3         (Old) AddThis - UK » Auto » General » Auto Intenders                      
#4 (Old) AddThis - UK » Financial » Social » Financial Shoppers    Financial shoppers
#5                            (Old) AddThis - UK » Food » Social                      
#6 (Old) AddThis - UK » Financial » Social » Financial Shoppers    Financial shoppers
#7    (Old) AddThis - UK » Health » Social » Health Influencers                      
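
If you also want the segment next to each match, and matching that ignores case and the padding that read.csv keeps around the values, something along these lines might work. This is only a sketch: match_subsegment is a hypothetical helper name, not an existing function, and it assumes each audience string should take the first subsegment that agrep accepts. The » character is written as \u00bb so the script does not depend on the file encoding.

```r
# Hypothetical helper (not from the original post): for each audience string,
# return the index of the first subsegment that fuzzily matches it, or NA.
match_subsegment <- function(audience, subsegment, max.distance = 0.1) {
  subsegment <- trimws(subsegment)  # drop padding left by read.csv
  vapply(audience, function(a) {
    # TRUE for every subsegment that agrep finds inside this audience string
    hit <- vapply(subsegment, function(s) {
      length(agrep(s, a, ignore.case = TRUE, max.distance = max.distance)) > 0
    }, logical(1), USE.NAMES = FALSE)
    if (any(hit)) which(hit)[1] else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

# Small illustrative data; "\u00bb" is the » separator
audience <- c("(Old) AddThis - UK \u00bb Auto \u00bb General \u00bb Auto Intenders",
              "(Old) AddThis - UK \u00bb Financial \u00bb Social \u00bb Financial Shoppers",
              "(Deleted) Semasio \u00bb DE: Intent \u00bb Christmas Shopping")
x <- data.frame(segment    = c("Shopping", "Travel", "Shopping"),
                subsegment = c("Financial shoppers", "Travel Europe", "Christmas shopping"),
                stringsAsFactors = FALSE)

# Look up segment and subsegment by the matched row index of x
idx <- match_subsegment(audience, x$subsegment)
result <- data.frame(audience   = audience,
                     segment    = x$segment[idx],
                     subsegment = x$subsegment[idx],
                     stringsAsFactors = FALSE)
result
```

Rows with no match come back as NA rather than an empty string. Raising max.distance loosens the match, at the cost of more false positives.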
Sandipan Dey