
I have two character vectors, a and b, of different lengths. I need to compare each element of a with every element of b and record the element whenever there is a close match. For matching I'm using the agrepl function.

Here is the sample data:

a <- c("US","Canada","United States","United States of America")
b <- c("United States","U.S","United States","Canada", "America", "Spain")

This is the code I'm using to match. How can I avoid the for loops? My real data has more than 900 and 5000 records, respectively.

for (i in seq_along(a)) {
  for (j in seq_along(b)) {
    # TRUE when a[i] approximately matches b[j]
    bFlag <- agrepl(a[i], b[j], max.distance = 0.1, ignore.case = TRUE)

    if (bFlag) {
      # Custom logic
    } else {
      # Custom logic
    }
  }
}
Naveen

1 Answer

You don't need a double loop, since agrepl's second argument accepts vectors of length >= 1. So you could do something like:

lapply(a, function(x) agrepl(x, b, max.distance = 0.1, ignore.case = TRUE))
# [[1]]
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE
# 
# [[2]]
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE
# 
# [[3]]
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE
# 
# [[4]]
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

You can add some custom logic inside the lapply call if needed, but that's not specified in the question so I'll just leave the output as a list of logicals.
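As a rough sketch of what such custom logic might look like, the function passed to lapply can branch on the logical vector itself. The rule used here (return the distinct matching elements of b, or NA when nothing matches) and the name `matches` are assumptions for illustration only, since the question does not say what the custom logic should do.

```r
a <- c("US", "Canada", "United States", "United States of America")
b <- c("United States", "U.S", "United States", "Canada", "America", "Spain")

# Hypothetical custom logic: for each element of a, keep the distinct
# elements of b that match approximately, or NA if there is no match.
matches <- lapply(a, function(x) {
  hit <- agrepl(x, b, max.distance = 0.1, ignore.case = TRUE)
  if (any(hit)) unique(b[hit]) else NA_character_
})

# Label each result with the element of a it belongs to
names(matches) <- a
```

Whatever sat in the if/else branches of the original loop would go where the `if (any(hit))` expression is, operating on the whole vector `hit` at once rather than on one pair at a time.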

If you want indices (of TRUEs) instead of logicals, you can use agrep instead of agrepl:

lapply(a, function(x) agrep(x, b, max.distance = 0.1, ignore.case = TRUE))

# [[1]]
# [1] 1 2 3 6
# 
# [[2]]
# [1] 4
# 
# [[3]]
# [1] 1 3
# 
# [[4]]
# integer(0)

If you only want the first TRUE index, you can use:

sapply(a, function(x) agrep(x, b, max.distance = 0.1, ignore.case = TRUE)[1])
#  US                   Canada            United States United States of America 
#   1                        4                        1                       NA 
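If the first index is going to be used further, a type-stable variant may be handy. The sketch below swaps sapply for vapply, which guarantees an integer vector, and then looks the indices up in b. The names `first_match` and `result`, and the data frame layout, are illustrative assumptions rather than part of the original answer.

```r
a <- c("US", "Canada", "United States", "United States of America")
b <- c("United States", "U.S", "United States", "Canada", "America", "Spain")

# First matching index in b for each element of a.
# Indexing an empty result with [1] yields NA, so "no match" becomes NA.
first_match <- vapply(a, function(x) {
  agrep(x, b, max.distance = 0.1, ignore.case = TRUE)[1]
}, integer(1))

# Pair each element of a with the index and value of its first match in b
result <- data.frame(
  a       = a,
  b_index = unname(first_match),
  b_match = b[first_match],
  stringsAsFactors = FALSE
)
```

This still calls agrep once per element of a, so it runs about 900 vectorised calls on the real data instead of 900 x 5000 scalar ones.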
talat
  • Thank you! I'm expecting the corresponding index of the element in vector b if it's TRUE. The first TRUE index is sufficient. – Naveen Jul 06 '16 at 13:54
  • @Naveen, if you want indices, just use `agrep` instead of `agrepl` in the example – talat Jul 06 '16 at 14:09