View all rows where there is a duplicate in one of the columns in R

Asked Nov 05 '20 at 10:46

Active Nov 05 '20 at 10:54


I want to view all columns of all rows which contain a duplicate in one of the variables.

Col_1 = c(1,1,1,2,3,4,4,4,4,5)
Col_2 = c("a","b","c","a","b","c","a","b","c", "a")
df = data.frame(Col_1, Col_2)

I have identified the values of Col_1 that are duplicated.

library(dplyr)

# Count rows per Col_1 value and keep the values that occur more than once
dup = df %>%
  count(Col_1) %>%
  filter(n > 1)

I then extracted these values as a numeric vector and used it to subset df inside a View() call:

dup_id = dup[['Col_1']]

View(df[df$Col_1 == dup_id,])

I'd expect the output to contain all 7 rows where Col_1 is 1 or 4, but instead I'm shown only 4 rows:

(df[df$Col_1 == dup_id,])
#>   Col_1 Col_2
#> 1     1     a
#> 3     1     c
#> 6     4     c
#> 8     4     b

Created on 2020-11-05 by the reprex package (v0.3.0)

Why is this code not showing me all relevant rows?

edited Nov 05 '20 at 10:54


Mark Davies


`df[df$Col_1 %in% dup_id,]` should do the job! – holzben Nov 05 '20 at 10:57
I was just suggesting what @holzben suggested; it works! – Fabio Marroni Nov 05 '20 at 10:58
see also: https://stackoverflow.com/questions/42637099/difference-between-the-and-in-operators-in-r – holzben Nov 05 '20 at 11:08
Thank you. I was just looking at https://stackoverflow.com/questions/42637099/difference-between-the-and-in-operators-in-r , trying to understand! – Mark Davies Nov 05 '20 at 11:11
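
To spell out what the comments point at, here is a minimal sketch of why `==` drops rows while `%in%` does not. It rebuilds the data from the question and derives `dup_id` with base R's `duplicated()` rather than the dplyr pipeline, which is a substitution made only to keep the example self-contained. The names `recycled` and `all_dups` are illustrative and not from the original post.

```r
# Rebuild the example data from the question
Col_1 <- c(1, 1, 1, 2, 3, 4, 4, 4, 4, 5)
Col_2 <- c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a")
df <- data.frame(Col_1, Col_2)

# Values of Col_1 that occur more than once (base R stand-in for the dplyr step)
dup_id <- unique(df$Col_1[duplicated(df$Col_1)])

# `==` compares element by element and recycles the shorter vector,
# so Col_1 is effectively compared against 1, 4, 1, 4, ... by position
recycled <- rep(dup_id, length.out = nrow(df))
same_as_recycled <- identical(df$Col_1 == dup_id, df$Col_1 == recycled)

# `%in%` asks, for each element of Col_1, whether it occurs anywhere in dup_id
all_dups <- df[df$Col_1 %in% dup_id, ]
all_dups
```

Because of the recycling, a row only survives the `==` filter when its value happens to line up with the same value in the repeated `1, 4, 1, 4, ...` pattern, which is why only rows 1, 3, 6 and 8 were shown. If dplyr is already loaded, `df %>% group_by(Col_1) %>% filter(n() > 1) %>% ungroup()` should give the same rows in one step without building `dup_id` at all.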
