What is the difference between `%in%` and `==`?

Question

df <- structure(list(x = 1:10, time = c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 
1, 2, 3)), .Names = c("x", "time"), row.names = c(NA, -10L), class = "data.frame")


df[df$time %in% c(0.5, 3), ]
##     x time
## 1   1  0.5
## 2   2  0.5
## 5   5  3.0
## 6   6  0.5
## 7   7  0.5
## 10 10  3.0

df[df$time == c(0.5, 3), ]
##     x time
## 1   1  0.5
## 7   7  0.5
## 10 10  3.0

What is the difference between %in% and == here?

You might be interested in [video number #033](http://www.twotorials.com/) – Anthony Damico, Mar 12 '13 at 13:24

score 32 · Accepted Answer · edited Mar 01 '21 at 15:08

The problem is vector recycling.

Your first line does exactly what you'd expect. It checks which elements of df$time are in c(0.5, 3) and returns the rows for which that is true.

Your second line is trickier. It's actually equivalent to

df[df$time == rep(c(0.5,3), length.out=nrow(df)),]

To see this, let's look at what happens if we use the vector rep(0.5, 10):

rep(0.5, 10) == c(0.5, 3)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

Notice that the result is TRUE at every odd position. Essentially, it's comparing 0.5 against the vector c(0.5, 3, 0.5, 3, 0.5, ...).

You can construct a vector that produces no matches this way. Take the vector rep(c(3, 0.5), 5):

rep(c(3, 0.5), 5) == c(0.5, 3)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

They're all FALSE; you are matching every 0.5 with 3 and vice versa.
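
As a quick check of the equivalence claimed above, here is a minimal sketch using the data from the question. The names `recycled`, `implicit` and `explicit` are only illustrative and do not appear in the original post.

```r
# Rebuild the data frame from the question.
df <- data.frame(x = 1:10,
                 time = c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3))

# What == actually compares against: c(0.5, 3) recycled to 10 elements.
recycled <- rep(c(0.5, 3), length.out = nrow(df))

implicit <- df$time == c(0.5, 3)  # recycling done silently by R
explicit <- df$time == recycled   # recycling written out by hand

# Both comparisons give the same logical vector.
identical(implicit, explicit)
## [1] TRUE

# Positions where the element-wise comparison happens to be TRUE.
which(implicit)
## [1]  1  7 10
```

Rows 1, 7 and 10 are exactly the rows returned by `df[df$time == c(0.5, 3), ]` in the question.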

answered Mar 12 '13 at 10:01 by sebastian-c · edited Mar 01 '21 at 15:08 by Mus

Got it: always use `%in%` unless I am comparing against a single value or actually intend to use recycling. Very clear, thanks. – user1320502 Mar 12 '13 at 10:13

@user1320502 Actually, there are some advantages to using %in% even when you have one value. Try `x <- c(1:5, rep(NA, 3)); x[x==3]` and compare that to `x[x%in%3]`. – sebastian-c Mar 13 '13 at 05:20
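
For readers who want to see the result of the comparison suggested in the comment above, here is a small sketch. The variable names `with_eq`, `with_in` and `with_which` are illustrative, and the `which()` variant is an addition that is not part of the original comment.

```r
x <- c(1:5, rep(NA, 3))

# x == 3 is NA wherever x is NA, and indexing with NA yields NA elements.
with_eq <- x[x == 3]
with_eq
## [1]  3 NA NA NA

# %in% returns FALSE for the NA positions, so only the real match is kept.
with_in <- x[x %in% 3]
with_in
## [1] 3

# which() drops the NA positions, so == can still be used safely.
with_which <- x[which(x == 3)]
with_which
## [1] 3
```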

score 15 · Answer 2 · answered Mar 12 '13 at 10:01

In

df$time == c(0.5,3)

the c(0.5,3) is first recycled to the length of df$time, i.e. c(0.5,3,0.5,3,0.5,3,0.5,3,0.5,3). Then the two vectors are compared element by element.

On the other hand,

df$time %in% c(0.5,3)

checks whether each element of df$time belongs to the set {0.5, 3}.
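
The set-membership reading can be made concrete through `match()`, which base R documents as the basis of `%in%`. The sketch below is only an illustration; the names `time`, `set`, `pos`, `via_match` and `via_in` are chosen here and are not from the original answer.

```r
time <- c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3)
set  <- c(0.5, 3)

# match() gives, for each element of `time`, its position in `set`,
# or 0 (because nomatch = 0L) when the element is not in `set`.
pos <- match(time, set, nomatch = 0L)
pos
## [1] 1 1 0 0 2 1 1 0 0 2

# %in% is the test "was a position found?".
via_match <- pos > 0L
via_in    <- time %in% set
identical(via_match, via_in)
## [1] TRUE

# Unlike ==, the order of the values on the right-hand side is irrelevant.
identical(time %in% c(3, 0.5), via_in)
## [1] TRUE
```

Because every element of the left-hand vector is looked up in the whole right-hand vector, no recycling is involved, and the result always has the length of the left-hand side.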

score 8 · Answer 3 · answered Mar 12 '19 at 16:37

This is an old thread, but I haven't seen this answer anywhere and it might be relevant for some people.

Another difference between the two is their handling of NAs (missing values): == returns NA when either side is NA, while %in% treats NA as a value that can be matched.

NA == NA
[1] NA
NA %in% c(NA)
[1] TRUE
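
The same behaviour carries over to whole vectors. A minimal sketch follows, with `x`, `eq` and `inn` as illustrative names that are not part of the original answer.

```r
x <- c(1, NA, 3)

# == propagates missingness: every comparison against NA is NA.
eq <- x == NA
eq
## [1] NA NA NA

# %in% looks NA up like any other value, so the result is never NA.
inn <- x %in% NA
inn
## [1] FALSE  TRUE FALSE

# For this particular lookup, %in% agrees with is.na().
identical(inn, is.na(x))
## [1] TRUE
```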

answered Mar 12 '19 at 16:37 by mochi
