
I am trying to understand the difference between match and intersect in R. Both seem to return the same information in different formats. Are there any functional differences between the two?

match(names(set1), names(set2))
#  [1] NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11

intersect(names(set1), names(set2))
# [1] "Year"     "ID"
Zheyuan Li
goutam
  • http://www.rexamples.com/12/Match() There are more variations in match, I'd say, considering the use of nomatch and incomparables. http://www.endmemo.com/program/R/match.php – Hack-R Nov 24 '16 at 19:58

1 Answer


match(a, b) returns an integer vector of length(a), whose i-th element is the first position j such that a[i] == b[j]. Where a[i] has no match in b, NA is returned by default (you can change this with the nomatch argument).
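As a small illustration of that description, here is a sketch with made-up vectors (x and tbl are hypothetical names, not taken from the question). It shows the "first position" behaviour and the effect of nomatch:

```r
# Hypothetical inputs for illustration only
x   <- c("ID", "Age", "Year")
tbl <- c("Year", "ID", "ID")

# Default: unmatched elements give NA; "ID" maps to its first position in tbl
pos <- match(x, tbl)
print(pos)

# With nomatch = 0L, unmatched elements give 0 instead of NA
pos0 <- match(x, tbl, nomatch = 0L)
print(pos0)
```

The result always has one entry per element of x, which is why it looks so different from what intersect returns.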

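If you want to get the same elements as intersect(a, b) (assuming neither vector contains duplicates), use either of the following. The first returns them in the order of a, as intersect does; the second returns them in the order of b: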

b[na.omit(match(a, b))]
a[na.omit(match(b, a))]

Example

a <- 1:5
b <- 2:6

b[na.omit(match(a, b))]
# [1] 2 3 4 5

a[na.omit(match(b, a))]
# [1] 2 3 4 5
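In the example above both forms agree because a and b are sorted the same way. A quick sketch with hypothetical unsorted vectors (p and q are my own names) suggests how the two forms can differ in ordering:

```r
# Hypothetical inputs for illustration only; no duplicates in either vector
p <- c("x", "y", "z")
q <- c("z", "x")

# Common elements, in the order they appear in p
in_p_order <- q[na.omit(match(p, q))]
print(in_p_order)

# Common elements, in the order they appear in q
in_q_order <- p[na.omit(match(q, p))]
print(in_q_order)

# intersect(p, q) follows the order of its first argument
print(intersect(p, q))
```

So the two one-liners return the same set of values, but only the first matches the ordering of intersect(p, q).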

I just wanted to know if there are any other differences between the two. I was able to understand the results myself.

Then let's read the source code:

intersect
#function (x, y) 
#{
#    y <- as.vector(y)
#    unique(y[match(as.vector(x), y, 0L)])
#}

It turns out that intersect is written in terms of match!

Haha, looks like I forgot the outer unique. Also, by setting nomatch = 0L we can get rid of na.omit, because indexing with 0 selects nothing. Well, R core is more efficient than my guess.
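To see why the outer unique matters, here is a minimal sketch with hypothetical vectors that contain duplicates (a_dup and b_dup are my own names, not from the question):

```r
# Hypothetical inputs for illustration only; both contain a repeated 3
a_dup <- c(3, 1, 3, 2)
b_dup <- c(2, 3, 3, 5)

# nomatch = 0L: unmatched elements index nothing, so na.omit is not needed
no_unique <- b_dup[match(a_dup, b_dup, nomatch = 0L)]
print(no_unique)    # the repeated 3 from a_dup survives

# Wrapping in unique drops the repeat, as intersect does
with_unique <- unique(no_unique)
print(with_unique)
print(intersect(a_dup, b_dup))
```

Without duplicates in the inputs, the two versions give the same answer, which is why the earlier 1:5 and 2:6 example did not expose the difference.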


Follow-up

We could also use

a[a %in% b]  ## need a `unique`, too
b[b %in% a]  ## need a `unique`, too

However, have a read of ?match. In the "Details" section we can see how "%in%" is defined:

"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0

So, yes, everything is written using match.
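As a quick check of that claim, the sketch below (using hypothetical vectors u and v) compares the %in% route with the explicit match expression from its definition:

```r
# Hypothetical inputs for illustration only
u <- c(3, 1, 3, 2)
v <- c(2, 3, 3, 5)

# Using %in%, with the extra unique noted above
via_in <- unique(u[u %in% v])

# Spelling out the definition of %in% with match
via_match <- unique(u[match(u, v, nomatch = 0) > 0])

print(via_in)
print(via_match)
print(intersect(u, v))
```

All three should print the same values, since each one ultimately depends on the positions computed by match.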

Zheyuan Li
  • I just wanted to know if there are any other differences between the two. I was able to understand the results myself. Thank you! – goutam Nov 24 '16 at 20:07