I am trying to extract the rows of my R data frame for which the same ID appears with more than one distinct pt value. For example, if my data frame looks like this:

ID    pt
600   DC90
600   DC90
612   DC18
612   DC02
612   DC02
630   DC30
645   DC16
645   DC16
645   DC16

my desired output would look like this:

ID    pt
612   DC18
612   DC02
612   DC02

because ID 612 is the only ID that has two different pt values.

2 Answers

We could group by 'ID' and keep only the groups where the number of distinct elements in 'pt' is greater than 1

library(dplyr)
df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) > 1)

-output

# A tibble: 3 x 2
# Groups:   ID [1]
#     ID pt   
#  <int> <chr>
#1   612 DC18 
#2   612 DC02 
#3   612 DC02 
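
For readers without dplyr, the same grouping logic can be sketched in base R with ave(). This is only an illustrative alternative, not part of the original answer; it assumes the sample data from the question, and the names n_pt and result are made up for the example.

```r
# Sample data copied from the question
df1 <- data.frame(
  ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 645L, 645L),
  pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30",
         "DC16", "DC16", "DC16")
)

# For every row, count the distinct 'pt' values within that row's ID group.
# ave() is applied to row indices so that the result stays an integer vector.
n_pt <- ave(seq_along(df1$pt), df1$ID,
            FUN = function(i) length(unique(df1$pt[i])))

# Keep the rows whose ID group has more than one distinct 'pt'
result <- df1[n_pt > 1, ]
print(result)
#    ID   pt
# 3 612 DC18
# 4 612 DC02
# 5 612 DC02
```

Unlike the dplyr version, this keeps the original row names (3, 4, 5) and returns a plain data.frame rather than a grouped tibble.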

If instead the check is that all elements of 'pt' within an 'ID' should be different from each other

df1 %>%
    group_by(ID) %>%
    filter(n_distinct(pt) == n())
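
To see what this stricter condition selects on the sample data, here is a small base R sketch of the same test (an illustration under the assumption that df1 is the data from the question; all_distinct and result_strict are hypothetical names). Note that a group with a single row trivially has all values different, so ID 630 passes, while ID 612 fails because DC02 is repeated.

```r
# Sample data copied from the question
df1 <- data.frame(
  ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 645L, 645L),
  pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30",
         "DC16", "DC16", "DC16")
)

# 1 for rows whose ID group contains no repeated 'pt' value, 0 otherwise.
# anyDuplicated() returns 0 when there are no duplicates in the group.
all_distinct <- ave(seq_along(df1$pt), df1$ID,
                    FUN = function(i) as.integer(anyDuplicated(df1$pt[i]) == 0L))

# Keep only the rows from groups where every 'pt' is unique
result_strict <- df1[all_distinct == 1L, ]
print(result_strict)
#    ID   pt
# 6 630 DC30
```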

data

df1 <- structure(list(ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L, 
645L, 645L), pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30", 
"DC16", "DC16", "DC16")), class = "data.frame", row.names = c(NA, 
-9L))

A data.table option using uniqueN, grouped by ID

> setDT(df)[, .SD[uniqueN(pt) > 1], ID]
    ID   pt
1: 612 DC18
2: 612 DC02
3: 612 DC02

Data

> dput(df)
structure(list(ID = c(600L, 600L, 612L, 612L, 612L, 630L, 645L,
645L, 645L), pt = c("DC90", "DC90", "DC18", "DC02", "DC02", "DC30",
"DC16", "DC16", "DC16")), class = "data.frame", row.names = c(NA,
-9L))