How to find matches between two lists of names

Question

I have two long vectors of names (list.1, list.2). I want to run a loop to check whether any name in list.2 matches any name in list.1. If it does, I want to append the position of the matching name in list.1 to a vector called result.

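 result <- integer(0)
 nameComment.corresponding <- integer(0)
 for (i in seq_along(list.2)) {    # loop over indices, not the names themselves
   for (j in seq_along(list.1)) {
     # grep() returns a non-empty vector when list.2[i] matches list.1[j]
     if (length(grep(list.2[i], list.1[j], ignore.case = TRUE)) > 0) {
       result <- append(result, j)   # append() returns a new vector, so assign it
       break
     } else nameComment.corresponding <- append(nameComment.corresponding, 0)
   }
 }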

The above code is really brute-force, and since my vectors are 5,000 and 60,000 names long, it could run for up to 300,000,000 iterations. How could I improve it?

Was the suggestion below useful? If an answer does solve your problem you may want to *consider* upvoting and/or marking it as accepted to show the question has been answered, by ticking the little green check mark next to the suitable answer. You are **not** obliged to do this, but it helps keep the site clean of unanswered questions and rewards those who take the time to solve your problem. — Simon O'Hanlon, Aug 13 '13 at 08:46
This is totally what the set-operation `intersect` is for... and in your case wrap it with `match(intersect(list.1, list.2), list.1)`. Don't ever write O(N1*N2) loops... — smci, May 04 '18 at 05:43

Simon O'Hanlon · Accepted Answer · 2013-08-10T09:46:21.690

3

`which` and `%in%` would probably be good for this task, or `match`, depending on what you are going for. Note that `match` returns, for each element of its first argument, the index of the first match in its second argument (that is, if a value occurs more than once in the lookup table, only the position of its first occurrence is returned):

set.seed(123)
#  I am assuming these are the values you want to check if they are in the lookup 'table'
list2 <- sample( letters[1:10] , 10 , replace = TRUE )
[1] "c" "h" "e" "i" "j" "a" "f" "i" "f" "e"

#  I am assuming this is the lookup table
list1 <- letters[1:3]
[1] "a" "b" "c"

#  Find which position in the lookup table each value is, NA if no match
match(list2 , list1 )
[1]  3 NA NA NA NA  1 NA NA NA NA
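
The `which` / `%in%` route mentioned above is not shown in the example, so here is a minimal sketch of it next to `match`. The values of `list2` are hard-coded for illustration rather than sampled, the result names (`pos_in_table`, `pos_per_value`, `pos_nocase`) are made up here, and the `tolower()` step assumes that case-insensitive exact matching is wanted, which is not the same as the partial matching `grep` performs.

```r
list1 <- c("a", "b", "c")        # lookup table
list2 <- c("c", "h", "A", "c")   # values to look up (hard-coded for illustration)

# Positions in list1 whose value occurs anywhere in list2
pos_in_table <- which(list1 %in% list2)
pos_in_table
# [1] 3

# Position in list1 of each element of list2 (NA if absent)
pos_per_value <- match(list2, list1)
pos_per_value
# [1]  3 NA NA  3

# Case-insensitive variant: normalise both sides before matching
pos_nocase <- match(tolower(list2), tolower(list1))
pos_nocase
# [1]  3 NA  1  3
```

Both calls are vectorised, so neither needs the nested loop from the question.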

edited Aug 10 '13 at 09:46

answered Aug 10 '13 at 07:31


+1. For completeness, there is also `match(y, x)` (which works a bit differently) and `which(is.element(x, y))` (where `is.element` is identical to `%in%` but *might* be more... intuitive?... to a new user because the function name is a little more descriptive than `%in%`). – A5C1D2H2I1M1N2O1R2T1 Aug 10 '13 at 08:14
@AnandaMahto thanks. This is actually a terrible example (IMHO!) but at the time my little boy was pestering me! I'll re-jig it slightly – Simon O'Hanlon Aug 10 '13 at 09:41

score 1 · Answer 2 · answered May 04 '18 at 05:32

This is totally what the set-operations intersect/union/setdiff() are for:

list.1 = c('Alan','Bill','Ted','Alice','Carol')
list.2 = c('Carol','Ted')
intersect(list.1, list.2)
[1] "Ted"   "Carol"

...or if you really want the indices into list.1:

match(intersect(list.1, list.2), list.1)
[1] 3 5
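
For the other two set operations named above, a rough sketch along the same lines. Here "Zed" is a made-up extra name added to `list.2` so that `setdiff` has something to return, and `unmatched` and `all_names` are illustrative variable names.

```r
list.1 <- c("Alan", "Bill", "Ted", "Alice", "Carol")
list.2 <- c("Carol", "Ted", "Zed")   # "Zed" is hypothetical, absent from list.1

# Names in list.2 that have no match in list.1
unmatched <- setdiff(list.2, list.1)
unmatched
# [1] "Zed"

# Every distinct name from both vectors, list.1 first
all_names <- union(list.1, list.2)
all_names
# [1] "Alan"  "Bill"  "Ted"   "Alice" "Carol" "Zed"
```

Note that these functions treat their inputs as sets, so duplicated names are collapsed in the result.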
