2

I have a data frame called animals containing different sized columns that have some common and uncommon elements among each other as shown below:

Dog     Cat      Lion     Dog
Cat     Lion     Dog      Shark
Lion    Dog      Shark    Cat
Shark   Shark    Cat      Lion
        Whale    Seal     Moose
        Seal              Whale
                          Deer

What I want to do is identify all the common elements within each column, exclude the uncommon elements and combine the common elements into one column like this:

Dog
Cat
Lion
Shark

So far I've tried identifying the duplicated elements using duplicated(animals) and then extract the duplicated elements using animals[duplicated(animals)] but this gives no results. Does anyone have a better method?

Dswede43
  • 351
  • 1
  • 8

2 Answers2

3

We can use intersect

Reduce(intersect, animals)
#[1] "Dog"   "Cat"   "Lion"  "Shark"

Or can use tidyverse

library(dplyr)
library(tidyr)
pivot_longer(animals, cols = everything(), values_drop_na = TRUE) %>% 
     group_by(value) %>% 
     filter(n_distinct(name) == ncol(animals)) %>% 
     ungroup %>% 
     distinct(value)
# A tibble: 4 x 1
#  value
#  <chr>
#1 Dog  
#2 Cat  
#3 Lion 
#4 Shark

data

animals <- structure(list(v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, 
NA), v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA
), v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA), v4 = c("Dog", 
"Shark", "Cat", "Lion", "Moose", "Whale", "Deer")), 
    class = "data.frame", row.names = c(NA, 
-7L))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Another base R option using stack + table + rowSums

> names(which(rowSums(table(na.omit(stack(animals)))) == ncol(animals)))
[1] "Cat"   "Dog"   "Lion"  "Shark"

Below we break down the code into steps

> stack(animals)
   values ind
1     Dog  v1
2     Cat  v1
3    Lion  v1
4   Shark  v1
5    <NA>  v1
6    <NA>  v1
7    <NA>  v1
8     Cat  v2
9    Lion  v2
10    Dog  v2
11  Shark  v2
12  Whale  v2
13   Seal  v2
14   <NA>  v2
15   Lion  v3
16    Dog  v3
17  Shark  v3
18    Cat  v3
19   Seal  v3
20   <NA>  v3
21   <NA>  v3
22    Dog  v4
23  Shark  v4
24    Cat  v4
25   Lion  v4
26  Moose  v4
27  Whale  v4
28   Deer  v4

> na.omit(stack(animals))
   values ind
1     Dog  v1
2     Cat  v1
3    Lion  v1
4   Shark  v1
8     Cat  v2
9    Lion  v2
10    Dog  v2
11  Shark  v2
12  Whale  v2
13   Seal  v2
15   Lion  v3
16    Dog  v3
17  Shark  v3
18    Cat  v3
19   Seal  v3
22    Dog  v4
23  Shark  v4
24    Cat  v4
25   Lion  v4
26  Moose  v4
27  Whale  v4
28   Deer  v4

> table(na.omit(stack(animals)))
       ind
values  v1 v2 v3 v4
  Cat    1  1  1  1
  Deer   0  0  0  1
  Dog    1  1  1  1
  Lion   1  1  1  1
  Moose  0  0  0  1
  Seal   0  1  1  0
  Shark  1  1  1  1
  Whale  0  1  0  1

> rowSums(table(na.omit(stack(animals))))
  Cat  Deer   Dog  Lion Moose  Seal Shark Whale
    4     1     4     4     1     2     4     2
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81