
I have data with a grouping variable (ID) and some values (type):

ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")

dat <- data.frame(ID,type)

Within each ID, I want to delete a row only when its type is the same as the type in the row directly before it. I don't want to drop every repeated value, only consecutive repeats. I have annotated some examples:

#     ID type
#  1   1    1
#  2   1    3 # first value in a run of 3s within ID 1: keep 
#  3   1    3 # 2nd value: remove  
#  4   1    2
#  5   2    3
#  6   2    3
#  7   2    1
#  8   2    1
#  9   3    1
# 10   3    2 # first value in a run of 2s within ID 3: keep
# 11   3    2 # 2nd value: remove
# 12   3    1

For example, ID 3 has the sequence of values 1, 2, 2, 1. The third value is the same as the second, so it should be deleted, leaving 1, 2, 1.
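The same collapse can be illustrated with base R's rle(), which encodes a vector as runs of equal consecutive values. Keeping each run's value once gives the deduplicated sequence. This is a minimal sketch on the ID 3 values only (the variable names are just for illustration):

```r
# type values for ID 3, in row order
id3 <- c("1", "2", "2", "1")

# rle() stores each run of equal consecutive values once, together with its length
runs <- rle(id3)
runs$values   # "1" "2" "1"
runs$lengths  # 1 2 1
```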

Thus, the desired output is:

data.frame(ID = c("1", "1", "1", "2", "2", "3", "3", "3"),
           type = c("1", "3", "2", "3", "1", "1", "2", "1"))

  ID type
1  1    1
2  1    3
3  1    2
4  2    3
5  2    1
6  3    1
7  3    2
8  3    1

I've tried

 dat[!duplicated(dat), ]

However, this returns only 7 rows:

ID <- c("1", "1", "1", "2", "2", "3", "3")
type <- c("1", "3", "2", "3", "1", "1", "2")

I understand that duplicated() keeps only the first occurrence of each distinct row, no matter where the repeats occur. How can I remove only the consecutive repeats within each ID?
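The difference is easy to see on the ID 3 values alone. duplicated() flags any value that already appeared earlier in the vector, so the final "1" is flagged too. Comparing each element with the one immediately before it flags only the consecutive repeat. A small base R sketch (variable names are illustrative):

```r
x <- c("1", "2", "2", "1")  # type values for ID 3

# duplicated(): TRUE for any value already seen earlier, consecutive or not
any_repeat <- duplicated(x)
any_repeat          # FALSE FALSE  TRUE  TRUE

# Compare each element with its predecessor: TRUE only for consecutive repeats
consecutive_repeat <- c(FALSE, x[-1] == x[-length(x)])
consecutive_repeat  # FALSE FALSE  TRUE FALSE
```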

Thanks for the help in advance!

Henrik
Astor

3 Answers


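Does this work?

library(dplyr)
dat %>% group_by(ID) %>% 
   # Flag rows whose type equals the previous type within the same ID.
   # lag() is NA for the first row of each group, so case_when() falls
   # through to the TRUE ~ FALSE branch and that row is never flagged.
   mutate(flag = case_when(type == lag(type) ~ TRUE, TRUE ~ FALSE)) %>% 
     filter(!flag) %>% select(-flag)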
# A tibble: 8 x 2
# Groups:   ID [3]
  ID    type 
  <chr> <chr>
1 1     1    
2 1     3    
3 1     2    
4 2     3    
5 2     1    
6 3     1    
7 3     2    
8 3     1   
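If you would rather not load dplyr, the same lag() comparison can be written in base R. Compare each row with the row above it, and treat a change of ID as the start of a new group. This is a minimal sketch that assumes each ID's rows are contiguous, as in the question's data:

```r
ID   <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat  <- data.frame(ID, type)

n <- nrow(dat)
# Keep a row if it is the first row, starts a new ID,
# or has a different type from the row directly above it
keep <- c(TRUE, dat$ID[-1] != dat$ID[-n] | dat$type[-1] != dat$type[-n])
result <- dat[keep, ]
result
```

The ID comparison plays the role of group_by(ID). The first row of ID 3 has type 1, which matches the last row of ID 2. Without the ID check, that row would be wrongly dropped.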
Karthik S

Using data.table's rleid() together with duplicated():

library(data.table)
setDT(dat)[!duplicated(rleid(ID, type))]

#   ID type
#1:  1    1
#2:  1    3
#3:  1    2
#4:  2    3
#5:  2    1
#6:  3    1
#7:  3    2
#8:  3    1

Improved answer including suggestion from @Henrik.

Ronak Shah

A base R way, if you want to remove only consecutive duplicate rows (8 rows in the output):

ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")

dat <- data.frame(ID, type)

# Run id for each block of consecutive identical (ID, type) pairs
run_id <- with(rle(paste(dat$ID, dat$type)), rep(seq_along(lengths), lengths))
# Keep only the first row of each run
subset(dat, !duplicated(run_id))
#>    ID type
#> 1   1    1
#> 2   1    3
#> 4   1    2
#> 5   2    3
#> 7   2    1
#> 9   3    1
#> 10  3    2
#> 12  3    1

Created on 2021-05-22 by the reprex package (v2.0.0)

AnilGoyal