0

My data looks like this

id<-c(100,100,100,105,105,105,105,105,112,112,112,112,112,112)
age<-c(17,21,25,21,22,27,32,39,20,27,28,30,31,35)
status<-c('clear','warning','offend','clear','warning','warning','warning','offend','clear','warning','offend','clear','warning','offend')

mydata<-data.frame(id,age,status)

I want to filter the rows of those individuals who FIRST offended after only ONE warning, which should be only id 100 at age 25, and id 112 at age 28.

My attempt is the following

mydata %>% 
  group_by(id) %>% 
  filter(!(duplicated(status) & status=='warning')) %>% 
  slice(match('offend',status))

which gives me id 105 at age 39 too, which is incorrect, because s/he has been warned more than once before.

I tried to create a dummy eg group_by(id,status) %>% mutate(dummy=1:n()) to take into account the number of times someone has been warned, and then filter if they've only been warned once, but that doesn't work, because that would not give me id 112, who has been warned, offended (this row should be captured) then warned and offended again which gives this person a value of 2 under dummy.

kna123
  • 115
  • 8

1 Answers1

1

Does this work:

mydata %>% group_by(id) %>% 
  mutate(flag = cumsum(status == 'warning')) %>% 
    filter(status == 'offend' & lag(status) == 'warning' & flag == 1) %>% select(-flag)
# A tibble: 2 x 3
# Groups:   id [2]
     id   age status
  <dbl> <dbl> <chr> 
1   100    25 offend
2   112    28 offend
Karthik S
  • 11,348
  • 2
  • 11
  • 25