
Say I have 5 vectors:

a <- c(1,2,3)
b <- c(2,3,4)
c <- c(1,2,5,8)
d <- c(2,3,4,6)
e <- c(2,7,8,9)

I know I can calculate the intersection between all of them by using Reduce() together with intersect(), like this:

Reduce(intersect, list(a, b, c, d, e))
[1] 2

But how can I find elements that are common to, say, at least 2 vectors? i.e.:

[1] 1 2 3 4 8
enricoferrero

7 Answers


It is much simpler than a lot of people are making it look. This should be very efficient.

  1. Put everything into a vector:

    x <- unlist(list(a, b, c, d, e))
    
  2. Look for duplicates

    unique(x[duplicated(x)])
    # [1] 2 3 1 4 8
    

and sort if needed.
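With the example data, sorting gives exactly the output asked for in the question:

sort(unique(x[duplicated(x)]))
# [1] 1 2 3 4 8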

Note: In case there can be duplicates within a list element (which your example does not seem to indicate), then replace x with x <- unlist(lapply(list(a, b, c, d, e), unique))
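To see why that matters, here is a small sketch using a hypothetical vector a2 that repeats a value internally; without the per-element unique(), the repeat is miscounted as a cross-vector duplicate:

a2 <- c(1, 1, 3)                   # hypothetical: 1 repeats within a single vector
x_bad <- unlist(list(a2, b))
unique(x_bad[duplicated(x_bad)])
# [1] 1 3                          # 1 is wrongly reported; it occurs in only one vector
x_good <- unlist(lapply(list(a2, b), unique))
unique(x_good[duplicated(x_good)])
# [1] 3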


Edit: as the OP has expressed interest in a more general solution where n >= 2, I would do:

which(tabulate(x) >= n)

if the data is only made of natural integers (1, 2, etc.) as in the example. Note that x here should be the per-vector deduplicated vector from the note above, so that each vector contributes at most one count per value. If not:

f <- table(x)
names(f)[f >= n]
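As a quick check with the example data and n <- 3 (values present in at least three of the five vectors), both variants agree:

x <- unlist(lapply(list(a, b, c, d, e), unique))
n <- 3
which(tabulate(x) >= n)
# [1] 2 3
f <- table(x)
names(f)[f >= n]
# [1] "2" "3"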

This is now not too far from James's solution, but it avoids the costly-ish sort. And it is miles faster than computing all possible combinations.

flodel

You could try all possible combinations, for example:

## create a list
l <- list(a, b, c, d, e)

## get combinations
cbn <- combn(1:length(l), 2)

## Intersect them 
unique(unlist(apply(cbn, 2, function(x) intersect(l[[x[1]]], l[[x[2]]]))))
## 2 3 1 4 8
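The same idea generalises to "common to at least n vectors" by taking n-way combinations and intersecting each one with Reduce(); a sketch for n = 3 with the five example vectors:

cbn3 <- combn(seq_along(l), 3)
unique(unlist(apply(cbn3, 2, function(i) Reduce(intersect, l[i]))))
## 2 3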
johannes

Here's another option:

# For each vector, get a vector of values without duplicates
deduplicated_vectors <- lapply(list(a,b,c,d,e), unique)

# Flatten the lists, then sort and use rle to determine how many
# lists each value appears in
rl <- rle(sort(unlist(deduplicated_vectors)))

# Get the values that appear in two or more lists
rl$values[rl$lengths >= 2]
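# [1] 1 2 3 4 8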
James Trimble

This is an approach that counts the number of vectors each unique value occurs in.

unique_vals <- unique(c(a, b, c, d, e))

setNames(rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)),
                 na.rm = TRUE), unique_vals)
# 1 2 3 4 5 8 6 7 9 
# 2 5 3 2 1 2 1 1 1 
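To turn those counts into the actual answer, one can store the result and index the values at the chosen threshold (n_common is a name introduced here just for illustration):

n_common <- setNames(rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)),
                             na.rm = TRUE), unique_vals)
unique_vals[n_common >= 2]
# [1] 1 2 3 4 8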
Sven Hohenstein

Yet another approach, applying a vectorised function with outer:

L <- list(a, b, c, d, e)
f <- function(x, y) intersect(x, y)
fv <- Vectorize(f, list("x","y"))
o <- outer(L, L, fv)
table(unlist(o[upper.tri(o)]))

#  1  2  3  4  8 
#  1 10  3  1  1 

The output above gives the number of pairs of vectors that share each of the duplicated elements 1, 2, 3, 4, and 8.
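To recover just the shared values rather than the pair counts, one small follow-up (a sketch, using the table's names) is:

tab <- table(unlist(o[upper.tri(o)]))
as.integer(names(tab))   # as.integer works here because the example values are integers
# [1] 1 2 3 4 8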

jbaums

A variation of @rengis's method would be:

unique(unlist(Map(`intersect`, cbn[1,], cbn[2,])))
#[1] 2 3 1 4 8

where,

l <- mget(letters[1:5])
cbn <- combn(l,2)
akrun

When the vectors are huge, base solutions like duplicated() or tabulate() can become slow or memory-hungry (tabulate() in particular allocates one counting bin per integer up to the maximum value). In that case, dplyr comes in handy with the following code:

library(dplyr)
combination_of_vectors <- c(a, b, c, d, e)

# For more than 1 (as_tibble() on a vector yields a column named "value")
combination_of_vectors %>% as_tibble() %>% group_by(value) %>% filter(n() > 1)
# For more than 2
combination_of_vectors %>% as_tibble() %>% group_by(value) %>% filter(n() > 2)
# For more than 3
combination_of_vectors %>% as_tibble() %>% group_by(value) %>% filter(n() > 3)
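If only the unique values are needed (as in the question's expected output), a sketch that collapses the filtered rows:

combination_of_vectors %>%
  as_tibble() %>%
  group_by(value) %>%
  filter(n() > 1) %>%
  distinct() %>%
  pull(value)
# [1] 1 2 3 4 8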

Hope it helps somebody

Ivan