
I have checked the questions but I couldn't find any explanation.

I have two vectors, and I want to select only the elements that one has and the other doesn't.

I defined two hypothetical vectors like this:

hypo1=c("a01","a02","a03","a04","b01","b02","b03","b04","c01","c02","c03","c04")
hypo2=c("a03","a04","b01","b02","c02","c03")

Then I wanted to see the elements that the two vectors have in common.

intersect(hypo1,hypo2)
[1] "a03" "a04" "b01" "b02" "c02" "c03"

which seems to work fine.

However, when I wanted to see the unique elements which hypo1 has and hypo2 hasn't, I got back all the elements of the first vector:

unique(hypo1,hypo2)
 [1] "a01" "a02" "a03" "a04" "b01" "b02" "b03" "b04" "c01" "c02" "c03" "c04"

Then I swapped the order of the two vectors, and it gave the same result as the intersect call:

unique(hypo2,hypo1)
[1] "a03" "a04" "b01" "b02" "c02" "c03"

I did some digging on the web but couldn't find what I'm missing. I need to find the elements that one vector has and the other doesn't.

pogibas
DSA

2 Answers


You want setdiff(hypo1, hypo2), which returns the elements of hypo1 that are not in hypo2. unique(hypo1, hypo2) means something completely different: it returns the unique entries in hypo1, but allows values to stay duplicated if they are listed in hypo2. This is explained on the help page ?unique.

For example,

unique(c(1,2,2,3,3,4,4,4), c(3,4))

gives

[1] 1 2 3 3 4 4 4

because 3 and 4 have been declared to be "incomparables". On the other hand,

setdiff(c(1,2,2,3,3,4,4,4), c(3,4))

gives

[1] 1 2

which is what I think you were looking for.
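
Applied to the vectors from the question, this could look like the minimal sketch below. The variable names only_in_hypo1, only_in_hypo2 and in_exactly_one are made up for illustration. Note that the order of the arguments matters.

```r
hypo1 <- c("a01","a02","a03","a04","b01","b02","b03","b04","c01","c02","c03","c04")
hypo2 <- c("a03","a04","b01","b02","c02","c03")

# Elements of hypo1 that do not occur in hypo2
only_in_hypo1 <- setdiff(hypo1, hypo2)
print(only_in_hypo1)
# [1] "a01" "a02" "b03" "b04" "c01" "c04"

# Order matters: every element of hypo2 is also in hypo1, so nothing is left
only_in_hypo2 <- setdiff(hypo2, hypo1)
print(only_in_hypo2)
# character(0)

# Elements that occur in exactly one of the two vectors (symmetric difference)
in_exactly_one <- union(only_in_hypo1, only_in_hypo2)
```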

user2554330
  • That's the thing I had completely forgotten about, since there are no repeated values in either my original data or the data I created for this question. Thanks a lot. – DSA Oct 05 '17 at 10:13

unique accepts only one vector, as its argument x. A second vector is taken as the argument incomparables. From ?unique we learn that these values

will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for a very large set.

One method to extract overlapping / differing values is:

hypo1[!hypo1 %in% hypo2]
# [1] "a01" "a02" "b03" "b04" "c01" "c04"
hypo1[hypo1 %in% hypo2]
# [1] "a03" "a04" "b01" "b02" "c02" "c03"

setdiff gives the same result as the first line here (it additionally drops duplicated values, which these vectors don't contain), so benchmarks on an appropriately sized data set would be needed to show any performance difference.
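
A rough way to compare the two approaches is sketched below using base R's system.time. The test vectors big1, big2 and dup are hypothetical and their sizes are arbitrary; timings will vary by machine and R version, so no numbers are shown.

```r
# Hypothetical test data without duplicates; sizes chosen arbitrarily
set.seed(1)
big1 <- sample(sprintf("id%06d", 1:200000))
big2 <- sample(big1, 50000)

by_index   <- big1[!big1 %in% big2]
by_setdiff <- setdiff(big1, big2)

# Without duplicates in big1, both approaches return the same vector
identical(by_index, by_setdiff)
# [1] TRUE

# Rough timings of each approach
system.time(big1[!big1 %in% big2])
system.time(setdiff(big1, big2))

# With duplicates the results differ: setdiff() keeps each value only once
dup <- c("a", "a", "b", "c")
dup[!dup %in% "c"]
# [1] "a" "a" "b"
setdiff(dup, "c")
# [1] "a" "b"
```

If duplicated values in the first vector need to be preserved, the %in% form is the one to use, whatever the timings show.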

loki