R group repeating values

Question

Suppose I am working with a dataset like this:

  Id     Index    Value
  1233   i1       Blue
  1233   i2       Blue
  1233   i3       Blue
  6545   i1       Red
  6545   i2       NA
  6545   i3       Black
  4177   i1       NA
  4177   i2       NA 
  4177   i2       NA

How do I create a new dataset that keeps only one row for each repeated Id/Value combination (as with Ids 1233 and 4177), like this:

  Id     Index    Value
  1233   i        Blue
  6545   i1       Red
  6545   i2       NA
  6545   i3       Black
  4177   i        NA

score 2 · Accepted Answer · akrun · 2020-03-20T19:33:04.280

We can use `distinct` from dplyr, which keeps the first row for each `Id`/`Value` combination:

library(dplyr)
distinct(df1, Id, Value, .keep_all = TRUE)
#    Id Index Value
#1 1233    i1  Blue
#2 6545    i1   Red
#3 6545    i2  <NA>
#4 6545    i3 Black
#5 4177    i1  <NA>
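`distinct` keeps `i1` for the collapsed groups rather than the `i` label shown in the question's expected output. If that label matters, here is a minimal base R sketch. It assumes the rule is "a row that stands in for several merged rows gets Index `i`", which is my reading of the expected output. `collapse_repeats` is a hypothetical helper name, not an existing function.

```r
# Question's data
df1 <- data.frame(
  Id    = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per (Id, Value) combination and relabel
# Index as "i" when several rows were merged (assumed relabelling rule)
collapse_repeats <- function(df) {
  key <- df[c("Id", "Value")]
  # TRUE for the first row of each (Id, Value) combination
  first <- !duplicated(key)
  # TRUE when the same combination appears again further down
  repeated <- duplicated(key, fromLast = TRUE)
  out <- df[first, ]
  # A first row whose combination repeats later stands in for a merged group
  out$Index[repeated[first]] <- "i"
  rownames(out) <- NULL
  out
}

res <- collapse_repeats(df1)
print(res)
```

Checking `duplicated(..., fromLast = TRUE)` on the kept rows avoids building a separate group-count table.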

Or, using base R, drop every row whose `Id`/`Value` pair has already appeared:

df1[!duplicated(df1[c('Id', 'Value')]),]
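To see why the base R line works, it can help to look at the logical vector that `duplicated` returns on the two key columns. The small data frame below is a made-up subset of the question's data, used only for illustration. `duplicated` treats `NA` as equal to `NA`, so repeated `NA` values within the same `Id` are collapsed too.

```r
# Made-up subset of the question's data, for illustration only
df1 <- data.frame(
  Id    = c(1233L, 1233L, 6545L, 6545L, 4177L, 4177L),
  Index = c("i1", "i2", "i1", "i2", "i1", "i2"),
  Value = c("Blue", "Blue", "Red", NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE marks a row whose (Id, Value) pair already appeared in an earlier row;
# 6545/NA and 4177/NA are different pairs, but 4177's second NA is flagged
dup <- duplicated(df1[c("Id", "Value")])
print(dup)

# Negating the mask keeps the first row of each combination
kept <- df1[!dup, ]
print(kept)
```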

data

df1 <- structure(list(Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 
4177L, 4177L, 4177L), Index = c("i1", "i2", "i3", "i1", "i2", 
"i3", "i1", "i2", "i2"), Value = c("Blue", "Blue", "Blue", "Red", 
NA, "Black", NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-9L))

edited Mar 20 '20 at 19:33

answered Mar 20 '20 at 19:22

akrun

nice with `duplicated`! +1 – ThomasIsCoding Mar 20 '20 at 19:32
@ThomasIsCoding thanks, yours is a nice option using the row index, +1 – akrun Mar 20 '20 at 19:38

score 1 · Answer 2 · answered Mar 20 '20 at 19:31

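Maybe `unique` + `rownames` can help you. `unique(df[-2])` drops the `Index` column and keeps the first row of each `Id`/`Value` combination, and its row names are then used to select the full rows from `df`: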

df[as.numeric(rownames(unique(df[-2]))),]

such that

    Id Index Value
1 1233    i1  Blue
4 6545    i1   Red
5 6545    i2  <NA>
6 6545    i3 Black
7 4177    i1  <NA>

DATA

df <- structure(list(Id = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 
4177L, 4177L, 4177L), Index = c("i1", "i2", "i3", "i1", "i2", 
"i3", "i1", "i2", "i2"), Value = c("Blue", "Blue", "Blue", "Red", 
NA, "Black", NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-9L))
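One caveat, as far as I can tell: `as.numeric(rownames(...))` converts row names into positions, which only lines up when the row names are the default `1..n`. On a subset (the `sub` below is a hypothetical example), indexing by the row-name strings themselves still picks the intended rows, while the numeric version points at the wrong ones.

```r
df <- data.frame(
  Id    = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical subset: its row names are now "4", ..., "9", not 1..6
sub <- df[4:9, ]

# Indexing by the row-name strings selects the first row of each combination
by_name <- sub[rownames(unique(sub[-2])), ]

# as.numeric() turns "4".."7" into positions 4..7: three wrong rows plus
# an all-NA row, because the subset only has 6 rows
by_pos <- sub[as.numeric(rownames(unique(sub[-2]))), ]

print(by_name)
print(by_pos)
```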

score 0 · Answer 3 · answered Mar 20 '20 at 19:54

You could use the data.table package and the `by` argument of its `unique` method:

library(data.table)
unique(setDT(df), by = c("Id", "Value"))
#       Id  Index  Value
# 1:  1233     i1   Blue
# 2:  6545     i1    Red
# 3:  6545     i2   <NA>
# 4:  6545     i3  Black
# 5:  4177     i1   <NA>
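All of these approaches keep the first row of each `Id`/`Value` combination. If the last row should be kept instead, here is a base R sketch (an addition, not something the answers above asked for). It passes `fromLast = TRUE` to `duplicated`, which scans from the bottom so that the last occurrence is the one left unflagged. The original row order is preserved.

```r
df <- data.frame(
  Id    = c(1233L, 1233L, 1233L, 6545L, 6545L, 6545L, 4177L, 4177L, 4177L),
  Index = c("i1", "i2", "i3", "i1", "i2", "i3", "i1", "i2", "i2"),
  Value = c("Blue", "Blue", "Blue", "Red", NA, "Black", NA, NA, NA),
  stringsAsFactors = FALSE
)

# fromLast = TRUE flags every occurrence except the last one of each pair,
# so negating it keeps the last row per (Id, Value) combination
last_rows <- df[!duplicated(df[c("Id", "Value")], fromLast = TRUE), ]
print(last_rows)
```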
