Select first row per run by group

Question

I have data with a grouping variable (ID) and some values (type):

ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")

dat <- data.frame(ID,type)

Within each ID, I want to remove any value that is the same as the value immediately before it, keeping only the first row of each run of identical values. I have annotated some examples:

#     ID type
#  1   1    1
#  2   1    3 # first value in a run of 3s within ID 1: keep 
#  3   1    3 # 2nd value: remove  
#  4   1    2
#  5   2    3
#  6   2    3
#  7   2    1
#  8   2    1
#  9   3    1
# 10   3    2 # first value in a run of 2s within ID 3: keep
# 11   3    2 # 2nd value: remove
# 12   3    1

For example, ID 3 has the sequence of values 1, 2, 2, 1. The third value is the same as the second, so it should be deleted, leaving 1, 2, 1.

Thus, the desired output is:

data.frame(ID = c("1", "1", "1", "2", "2", "3", "3", "3"),
           type = c("1", "3", "2", "3", "1", "1", "2", "1"))

  ID type
1  1    1
2  1    3
3  1    2
4  2    3
5  2    1
6  3    1
7  3    2
8  3    1

I've tried

dat[!duplicated(dat), ]

However, what I got was

ID <- c("1", "1", "1", "2", "2", "3", "3")
type <- c("1", "3", "2", "3", "1", "1", "2")

I know that duplicated() only keeps the first occurrence of each ID/type combination, even when the repeats are not consecutive. How can I get the values I want?

Thanks for the help in advance!

score 2 · Answer 1 · answered May 22 '21 at 13:48


Does this work:

library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate(flag = case_when(type == lag(type) ~ TRUE, TRUE ~ FALSE)) %>%
  filter(!flag) %>%
  select(-flag)
# A tibble: 8 x 2
# Groups:   ID [3]
  ID    type 
  <chr> <chr>
1 1     1    
2 1     3    
3 1     2    
4 2     3    
5 2     1    
6 3     1    
7 3     2    
8 3     1
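
To unpack the idea: within each ID, lag(type) shifts type down by one row, so type == lag(type) is TRUE exactly when a row repeats the value of the row before it. The first row of each group is compared against NA, which case_when sends to the FALSE branch, so it is always kept. Below is a minimal sketch of the same logic with the intermediate column left visible; the names prev, repeated, steps and result are introduced here for illustration only, and stringsAsFactors = FALSE is an assumption so that type stays character.

```r
library(dplyr)

ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat  <- data.frame(ID, type, stringsAsFactors = FALSE)

# prev holds the previous row's type within the same ID (NA on the first row).
# repeated is TRUE only when a previous row exists and has the same type.
steps <- dat %>%
  group_by(ID) %>%
  mutate(prev = lag(type),
         repeated = !is.na(prev) & type == prev) %>%
  ungroup()

# Keep the rows that start a run and drop the helper columns.
result <- steps %>%
  filter(!repeated) %>%
  select(ID, type)
```

Printing steps before the filter shows which rows are flagged, which may make the comparison easier to follow than the one-step pipeline.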

answered May 22 '21 at 13:48

Karthik S


Yes, it works :) I'm trying to learn the way you think about it. – Astor May 22 '21 at 14:33

Ronak Shah · Accepted Answer · 2021-05-22T13:34:25.973


Using data.table's rleid together with duplicated:

library(data.table)
setDT(dat)[!duplicated(rleid(ID, type))]

#   ID type
#1:  1    1
#2:  1    3
#3:  1    2
#4:  2    3
#5:  2    1
#6:  3    1
#7:  3    2
#8:  3    1

Improved answer including suggestion from @Henrik.
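
To see why this works, it may help to look at the run ids on their own. rleid(ID, type) gives every run of consecutive identical ID/type pairs its own integer, and !duplicated() then keeps the first row of each run. The sketch below uses as.data.table() instead of setDT() so that dat is not modified by reference; the names dt, run and result are introduced here for illustration only.

```r
library(data.table)

ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat  <- data.frame(ID, type)

dt <- as.data.table(dat)

# A new run id starts whenever ID or type differs from the previous row.
dt[, run := rleid(ID, type)]
dt$run
# 1 2 2 3 4 4 5 5 6 7 7 8

# Keep the first row of each run and drop the helper column.
result <- dt[!duplicated(run), .(ID, type)]
```

Because ID is part of the run id, a value that carries over from the last row of one ID to the first row of the next still starts a new run.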

edited May 22 '21 at 13:34

answered May 22 '21 at 13:20


score 1 · Answer 3 · answered May 22 '21 at 13:29

A base R way, if you want to eliminate only consecutive duplicate rows (8 rows of output):

ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type<- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")

dat <- data.frame(ID,type)

subset(dat, !duplicated(with(rle(paste(dat$ID, dat$type)), rep(seq_len(length(lengths)), lengths))))
#>    ID type
#> 1   1    1
#> 2   1    3
#> 4   1    2
#> 5   2    3
#> 7   2    1
#> 9   3    1
#> 10  3    2
#> 12  3    1
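
The one-liner can be read in steps. The following is a sketch of the same logic spelled out, with key, runs, run_id and result as illustrative names. It assumes that pasting ID and type with a space gives a distinct key for every ID/type pair, which holds for this data but could fail if the values themselves contained spaces.

```r
ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat  <- data.frame(ID, type)

# One key per row, so a change in either ID or type ends the current run.
key <- paste(dat$ID, dat$type)

# rle() returns the length of each run of identical consecutive keys.
runs <- rle(key)

# Expand the run lengths back into one run id per row.
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Keep the first row of each run.
result <- dat[!duplicated(run_id), ]
```

The original row names (1, 2, 4, 5, 7, 9, 10, 12) are kept, which shows which rows survived.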

Created on 2021-05-22 by the reprex package (v2.0.0)
