How do I select all rows where one column has the same value but another column has different values?

Question

I am trying to extract rows from my R dataframe where the ID column has the same value and the pt column has different values. For example, if my data frame looks like this:

ID    pt
600   DC90
600   DC90
612   DC18
612   DC02
612   DC02
630   DC30
645   DC16
645   DC16
645   DC16

my desired output would look like this:

ID    pt
612   DC18
612   DC02
612   DC02

because ID 612 is the only ID with more than one distinct pt value.

score 2 · Answer 1 · answered Jan 04 '21 at 23:32

We could group by ID and keep only the groups where the number of distinct values in 'pt' is greater than 1:

library(dplyr)
df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) > 1)

Output

# A tibble: 3 x 2
# Groups:   ID [1]
#     ID pt   
#  <int> <chr>
#1   612 DC18 
#2   612 DC02 
#3   612 DC02
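
For readers without dplyr, the same "count distinct pt per ID, then keep the IDs with more than one" logic can be sketched in base R. This is only an illustrative alternative, not part of the original answer; the names n_pt, keep_ids and result are made up for the example.

```r
# Sample data copied from the question
df1 <- data.frame(
  ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 645L, 645L),
  pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30", "DC16", "DC16", "DC16")
)

# Number of distinct pt values per ID; the names of the result are the IDs
n_pt <- tapply(df1$pt, df1$ID, function(x) length(unique(x)))

# IDs (as character strings) with more than one distinct pt value
keep_ids <- names(n_pt)[n_pt > 1]

# Keep every row that belongs to one of those IDs
result <- df1[as.character(df1$ID) %in% keep_ids, ]
result
```

Note that the original row names are preserved in result, so they will not start at 1 as they do in the tibble output.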

If instead every 'pt' value within an ID has to be different:

df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) == n())
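
One thing to keep in mind with this variant: a group with a single row trivially has all values distinct, so on the sample data it keeps ID 630. If such single-row IDs should be dropped, one possible tweak is to also require n() > 1. This is an assumption about the intent rather than something stated in the question, and the names all_distinct and all_distinct_multi are made up for the sketch.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 645L, 645L),
  pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30", "DC16", "DC16", "DC16")
)

# Groups in which every pt value is distinct; single-row groups pass trivially
all_distinct <- df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) == n()) %>%
    ungroup()

# Same check, but additionally requiring at least two rows per ID
all_distinct_multi <- df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) == n(), n() > 1) %>%
    ungroup()
```

With the sample data, all_distinct contains only the ID 630 row and all_distinct_multi is empty.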

data

df1 <- structure(list(ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 
645L, 645L), pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30", 
"DC16", "DC16", "DC16")), class = "data.frame", row.names = c(NA, 
-9L))

score 2 · Answer 2 · answered Jan 05 '21 at 00:00

A data.table option using uniqueN, grouped by ID

> library(data.table)
> setDT(df)[, .SD[uniqueN(pt) > 1], by = ID]
    ID   pt
1: 612 DC18
2: 612 DC02
3: 612 DC02
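
A common variation on the same idea, shown here only as a sketch, returns .SD from an if so that groups failing the condition yield NULL and are dropped. It uses as.data.table() on a copy instead of setDT(), so the original data frame is not converted by reference; dt and result_dt are names invented for the example.

```r
library(data.table)

# Sample data copied from the question
df <- data.frame(
  ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 645L, 645L),
  pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30", "DC16", "DC16", "DC16")
)

# Convert a copy, leaving df itself a plain data.frame
dt <- as.data.table(df)

# Return the group's rows (.SD) only when it has more than one distinct pt;
# groups for which the `if` yields NULL do not appear in the result
result_dt <- dt[, if (uniqueN(pt) > 1) .SD, by = ID]
result_dt
```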

Data

> dput(df)
structure(list(ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L,
645L, 645L), pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30",
"DC16", "DC16", "DC16")), class = "data.frame", row.names = c(NA,
-9L))
