1

Lets say I have the following data:

library(tidyverse)
cf <- data.frame(x = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c", "d", "d", "e", "e"), 
            y = c("free", "with", "sus", "sus", "free", "with", "free", "sus", "sus", "with", "free", "sus", "free", "free", "with", "sus"))
> cf
   x    y
1  a free
2  a with
3  a  sus
4  a  sus
5  b free
6  b with
7  b free
8  b  sus
9  c  sus
10 c with
11 c free
12 c  sus
13 d free
14 d free
15 e with
16 e  sus

I want to obtain an indicator variable equal to 1 if the last value of y by group is equal to sus and the last value before sus was with. There can be a multiple sus before with occurs as in group a. If free occurs last first before with then its zero.

Desired output:

   x    y sus_with
1  a free        0
2  a with        0
3  a  sus        0
4  a  sus        1
5  b free        0
6  b with        0
7  b free        0
8  b  sus        0
9  c  sus        0
10 c with        0
11 c free        0
12 c  sus        0
13 d free        0
14 d free        0
15 e with        0
16 e  sus        1 

I was thinking something like this:

cf %>%
  group_by(x) %>%
  mutate(sus_with = if_else(row_number() == n() & y == "sus" & last(y == "with" & y != "free"), 1, 0))

refs:

How to find the date after the last occurrence of a certain observation in R?

How to find the last occurrence of a certain observation in grouped data in R?

any suggestions please?

user63230
  • 4,095
  • 21
  • 43

1 Answers1

3

We can group_by x and check if last value of y is "sus" and the last value which is not "sus" is "with" and assign value 1 for the last row of the group.

library(dplyr)

cf %>%
  group_by(x) %>%
  mutate(sus_with = as.integer(last(y) == 'sus' & last(y[y != "sus"]) == 'with' 
                     & row_number() == n()))

#    x     y     sus_with
#   <fct> <fct>    <int>
# 1 a     free         0
# 2 a     with         0
# 3 a     sus          0
# 4 a     sus          1
# 5 b     free         0
# 6 b     with         0
# 7 b     free         0
# 8 b     sus          0
# 9 c     sus          0
#10 c     with         0
#11 c     free         0
#12 c     sus          0
#13 d     free         0
#14 d     free         0
#15 e     with         0
#16 e     sus          1
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • thanks. I run into problems if one group only has one row and it has `sus` in it. For example if you rerun the code with the first 3 rows removed `cf <- cf[-c(1:3), ]` I get: `Error: Column sus_with must be length 1 (the group size), not 0`. Would you know a quick workaround please? – user63230 Sep 26 '19 at 11:24
  • 1
    @user63230 yes, maybe check for a condition with `if`, `cf %>% group_by(x) %>% mutate(sus_with = if (any(y != 'sus') & n() > 1) as.integer(last(y) == 'sus' & last(y[y != "sus"]) == 'with' & row_number() == n()) else 0) ` – Ronak Shah Sep 26 '19 at 11:28