This is similar to a question I asked recently, but not the same. Let's say I have the following data:

    library(tidyverse)

    df <- structure(list(
      x = c("a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c",
            "d", "d", "e", "e", "f", "g", "g", "g", "g", "g", "g", "g", "g"),
      y = c(" free", " with", "  sus", "  sus", "  sus", " free", " free",
            "  sus", " free", " with", " sus", " free", "  sus", " free",
            " free", " with", "  sus", "  sus", " free", " sus", " sus",
            " sus", " sus", " free", " sus", " free"),
      indicator = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1,
                    0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0)
    ), row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame"))
    df
       x     y       indicator
       <chr> <chr>       <dbl>
     1 a     " free"         0
     2 a     " with"         0
     3 a     "  sus"         1
     4 a     "  sus"         0
     5 b     "  sus"         1
     6 b     " free"         0
     7 b     " free"         0
     8 b     "  sus"         1
     9 b     " free"         0
    10 c     " with"         0
    11 c     " sus"          1
    12 c     " free"         0
    13 c     "  sus"         1
    14 d     " free"         0
    15 d     " free"         0
    16 e     " with"         0
    17 e     "  sus"         1
    18 f     "  sus"         1
    19 g     " free"         0
    20 g     " sus"          0
    21 g     " sus"          1
    22 g     " sus"          0
    23 g     " sus"          0
    24 g     " free"         0
    25 g     " sus"          1
    26 g     " free"         0

I want to create a variable that works as follows: for each row where indicator==1, I search the previous and subsequent rows within the same group x, and set the new variable to 1 if both the nearest preceding and the nearest following value of y that isn't sus are free. So, if there's a with before the previous free or before the next free, the row doesn't get a value of 1. If a row with indicator==1 is on the first or last row of its group, I assume there is no with on the missing side (e.g. groups b, c, e). My desired output is:

   x     y       indicator newvariable
   <chr> <chr>       <dbl>       <dbl>
 1 a     " free"         0           0
 2 a     " with"         0           0
 3 a     "  sus"         1           0
 4 a     "  sus"         0           0
 5 b     "  sus"         1           1
 6 b     " free"         0           0
 7 b     " free"         0           0
 8 b     "  sus"         1           1
 9 b     " free"         0           0
10 c     " with"         0           0
11 c     " sus"          1           0
12 c     " free"         0           0
13 c     "  sus"         1           1
14 d     " free"         0           0
15 d     " free"         0           0
16 e     " with"         0           0
17 e     "  sus"         1           0
18 f     "  sus"         1           1
19 g     " free"         0           0
20 g     " sus"          0           0
21 g     " sus"          1           1
22 g     " sus"          0           0
23 g     " sus"          0           0
24 g     " free"         0           0
25 g     " sus"          1           1
26 g     " free"         0           0

I want something flexible that can scan across many rows (there could be many sus rows before a free, and there could be several rows with indicator==1 per group, as in group g). Something like the following is what I had in mind, but I want the lag and lead to look across many previous and subsequent rows rather than just the adjacent one:

df %>%
  group_by(x) %>%
  mutate(newvariable = as.integer(indicator == 1 & lag(y[y != "sus"]) == 'free' & lead(y[y != "sus"]) == 'free'))
  # idea taken from my previous question:
  # mutate(newvariable = as.integer(last(y) == 'sus' & last(y[y != "sus"]) == 'with')

I don't think the last() approach from my previous question carries over here, but I'm looking for something similar. Does anyone have any ideas? Maybe pmap?

user63230

1 Answer

This seems to do the job. The trick I used is to build a temporary data frame in which each run of consecutive identical y values is collapsed into a single record, so that the previous and next distinct values can be read off with a plain lag and lead.
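
To show the collapsing step in isolation, here is a minimal sketch on a made-up vector. The vector `y` and the names `run_id`, `runs`, `prev_y` and `next_y` are hypothetical and only serve the illustration; they are not the question's data.

```r
library(dplyr)

# Hypothetical toy vector, not the question's data
y <- c("free", "sus", "sus", "with", "sus")

# TRUE wherever a value differs from the one before it;
# cumsum() turns those change points into a run id
run_id <- cumsum(y != lag(y, default = ""))
run_id  # 1 2 2 3 4

# One record per run, together with the value of the neighbouring runs.
# At the edges the missing neighbour is assumed to be "free".
runs <- tibble(run_id, y) %>%
  distinct(run_id, y) %>%
  mutate(
    prev_y = lag(y, default = "free"),
    next_y = lead(y, default = "free")
  )
runs
```

Joining `runs` back onto the rows by run id gives every row the value of the run before and after its own, which is what the code for the full data does per group.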

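df = df %>% 
    group_by(x) %>% 
    do({
        # subgroup is a run id: constant over consecutive records with the same x and y
        df_x = mutate(., i=row_number(), subgroup=cumsum(y!=lag(y, default="")))

        # one record per run, with the y value of the previous and next run;
        # at the edges of a group the missing neighbour is assumed not to be " with"
        df_subgroups = df_x %>% 
            distinct(subgroup, y) %>% 
            mutate(
                prev_distinct_y = lag(y, default="free"),
                next_distinct_y = lead(y, default="free")
            )
        # explicit join keys avoid the "Joining, by = ..." message
        df_x = df_x %>% left_join(df_subgroups, by=c("subgroup", "y"))
        # 0 + <logical> converts TRUE/FALSE to 1/0
        df_x %>% mutate(fixed_indicator = 0 + (indicator==1 & prev_distinct_y!=" with" & next_distinct_y!=" with") )
    })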
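
If `do` turns out to be slow on a large dataset, one possible alternative is to blank out the sus rows and carry the remaining values forward and backward within each group using `tidyr::fill`. This is only a sketch and has not been benchmarked. It swaps the run-collapsing technique for a fill, trims the leading spaces in y, and assumes that indicator==1 only occurs on sus rows. The `toy` data (groups a, b, c and f from the question) and the helper columns `y_clean`, `non_sus`, `prev_y` and `next_y` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Groups a, b, c and f copied from the question's data
toy <- tibble(
  x = c("a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "f"),
  y = c(" free", " with", "  sus", "  sus",
        "  sus", " free", " free", "  sus", " free",
        " with", " sus", " free", "  sus",
        "  sus"),
  indicator = c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1)
)

out <- toy %>%
  group_by(x) %>%
  mutate(
    y_clean = trimws(y),                                  # " sus" and "  sus" become "sus"
    non_sus = if_else(y_clean == "sus", NA_character_, y_clean),
    prev_y  = non_sus,
    next_y  = non_sus
  ) %>%
  fill(prev_y, .direction = "down") %>%                   # nearest earlier non-sus value
  fill(next_y, .direction = "up") %>%                     # nearest later non-sus value
  mutate(
    # nothing found before/after within the group: assume "free"
    prev_y = coalesce(prev_y, "free"),
    next_y = coalesce(next_y, "free"),
    newvariable = as.integer(indicator == 1 & prev_y == "free" & next_y == "free")
  ) %>%
  ungroup() %>%
  select(x, y, indicator, newvariable)

out
```

On these four groups the result matches the desired output in the question: only the flagged rows in group b, the last row of group c and the single row of group f get a 1.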
Pierre Gramme
  • Nice! I had to remove the `do` command and merge separately, as it was too slow on my large dataset. Can you explain `0 + (indicator==1 & prev_distinct_y!=" with" & next_distinct_y!=" with")`? I haven't seen that before. It seems to work like an `ifelse` statement: `ifelse(indicator == 1 & prev_distinct_y!=" with" & next_distinct_y!=" with", 1, 0)` – user63230 Oct 14 '19 at 12:48
  • Indeed, it is a lazy way of writing this `if_else`. It would probably be a bit cleaner to use `as.numeric()` instead – Pierre Gramme Oct 14 '19 at 12:55
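
The `0 + (...)` idiom from the comments works because R coerces a logical vector to numeric in arithmetic, with TRUE becoming 1 and FALSE becoming 0. A small sketch comparing the spellings mentioned above, using a made-up vector `flag`:

```r
# Hypothetical logical vector for illustration
flag <- c(TRUE, FALSE, NA)

a <- 0 + flag            # arithmetic coerces logical to double
b <- as.numeric(flag)    # explicit conversion to double
c_int <- as.integer(flag)  # explicit conversion to integer
d <- ifelse(flag, 1, 0)  # element-wise conditional

identical(a, b)          # TRUE
identical(d, b)          # TRUE
```

All four keep NA as NA. The only difference is the storage type: `as.integer()` returns an integer vector, the others return doubles.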