
I have a data frame/tibble containing yearly observations for several countries. In years in which a specific event happens, the variable event gets the value 1.

I am now trying to create a new column event.10yrs that gets the value 1 for the 9 years following the end of an event (the end being the last year of the event if it lasts several years). In years in which a new event occurs and which are not the last year of that new event, event.10yrs gets the value 0.

Below is the data for a single country. Column event.10yrs is the desired output.

df <- structure(list(year = c(1970, 1971, 1972, 1973, 1974, 1975, 1976, 
1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 
1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 
2010, 2011, 2012, 2013, 2014, 2015), ccode = c(516, 516, 516, 
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 
516, 516, 516, 516), event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA), event.last.y = c(0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, NA, 
NA, NA, NA, NA), event.10yrs = c(NA, 0, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, NA, NA, NA)), row.names = c(NA, 
-46L), vars = "ccode", drop = TRUE, class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), indices = list(0:45), group_sizes = 46L, biggest_group_size = 46L, labels = structure(list(
    ccode = 516), row.names = c(NA, -1L), vars = "ccode", drop = TRUE, class = "data.frame", .Names = "ccode"), .Names = c("year", 
"ccode", "event", "event.last.y", "event.10yrs"))

My attempt so far using the dplyr package:

df <- df %>%
  mutate(event.10yrs=case_when(event!=1 & year-9 < year[event.last.y==1] ~ 1,
                               TRUE ~ 0))

This, however, produces the following warning:

Warning message:
In year < year[rs.war.last.y == 1] :
  longer object length is not a multiple of shorter object length

Grateful for any hint.

zoowalk

1 Answer

Maybe just a nested ifelse (or dplyr::if_else):

library(dplyr)
# The last argument of the outer if_else() is `missing`: rows where event is NA get 0
df %>% mutate(ev_10 = if_else(event == 0, 1,
                              if_else(event.last.y == 1, 1, 0),
                              missing = 0))

Edit

This post helped me here: Find the index position of the first non-NA value in an R vector?

But we want to replace more than just the first occurrence of 'x', so I made a little workaround with a helper column:

index_1 <- unlist(lapply(which(df$event.last.y == 1),
                         function(x) seq(x, length.out = 9)))
# index_1 holds, for every row with event.last.y == 1, the positions of
# that row and the 8 rows after it (9 positions in total)
df$last_code <- df$event.last.y # just to duplicate your column
df$last_code[index_1] <- 1      # set the indexed positions to 1
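A caveat about the indexing step above, sketched on made-up data: seq(x, length.out = n) starts at x itself, so the window includes the end year, and for an event ending close to the last row the generated positions can run past the end of the data. Assigning to such positions would lengthen a plain vector. The names last_y, window and flag below are hypothetical and only illustrate one way to clip the indices.

```r
# Toy flag vector (hypothetical data): events end at positions 2 and 9
last_y <- c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0)
window <- 3  # end year plus the 2 following rows; kept small for readability

# Positions covered by each window, starting at the end year itself
idx <- unlist(lapply(which(last_y == 1),
                     function(x) seq(x, length.out = window)))

# Drop positions beyond the last row so the vector keeps its length
idx <- unique(idx[idx <= length(last_y)])

flag <- last_y
flag[idx] <- 1
flag
```

Without the clipping line, idx would contain position 11 and flag would end up with 11 elements instead of 10.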

Now we can use the simple nested conditional statement as before

# Same nested condition as before, now also requiring last_code == 1
df <- df %>% mutate(ev_10 = if_else(event == 0 & last_code == 1, 1,
                                    if_else(event.last.y == 1, 1, 0),
                                    missing = 0))

head(df[c(2:13, 31:40),], 20) #printing only example rows here
# A tibble: 20 x 7
# Groups:   ccode [1]
    year ccode event event.last.y event.10yrs last_code ev_10
   <dbl> <dbl> <dbl>        <dbl>       <dbl>     <dbl> <dbl>
 1  1971   516  0            0           0         0     0   
 2  1972   516  1.00         1.00        1.00      1.00  1.00
 3  1973   516  0            0           1.00      1.00  1.00
 4  1974   516  0            0           1.00      1.00  1.00
 5  1975   516  0            0           1.00      1.00  1.00
 6  1976   516  0            0           1.00      1.00  1.00
 7  1977   516  0            0           1.00      1.00  1.00
 8  1978   516  0            0           1.00      1.00  1.00
 9  1979   516  0            0           1.00      1.00  1.00
10  1980   516  0            0           1.00      1.00  1.00
11  1981   516  0            0           1.00      0     0   
12  1982   516  0            0           0         0     0  
... 
13  2000   516  1.00         0           0         1.00  0   
14  2001   516  1.00         0           0         1.00  0   
15  2002   516  1.00         0           0         1.00  0   
16  2003   516  1.00         1.00        1.00      1.00  1.00
17  2004   516  0            0           1.00      1.00  1.00
18  2005   516  0            0           1.00      1.00  1.00
19  2006   516  0            0           1.00      1.00  1.00
20  2007   516  0            0           1.00      1.00  1.00
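As a possible alternative, here is a minimal sketch that avoids the helper column by tracking, for every row, the position of the most recent event end. It is not part of the approach above. The function name flag_after_event and its argument n_after are hypothetical, and the sketch assumes one country per call, rows sorted by year with one row per year, and that NA in event or event.last.y counts as "no event". It has only been checked against the toy vectors shown, not against the desired event.10yrs column.

```r
# Hypothetical helper: returns 1 in the end year of an event and in up to
# `n_after` following rows, and 0 while a new event is ongoing but not yet in
# its final year. Assumes one country, sorted by year, one row per year.
flag_after_event <- function(event, last_y, n_after = 9) {
  pos    <- seq_along(last_y)
  is_end <- !is.na(last_y) & last_y == 1   # NA is treated as "no event end"
  # Position of the most recent event end at or before each row (0 = none yet)
  last_end  <- cummax(ifelse(is_end, pos, 0L))
  in_window <- last_end > 0 & (pos - last_end) <= n_after
  # Years of a running event that are not its final year
  ongoing   <- !is.na(event) & event == 1 & !is_end
  as.numeric(in_window & !ongoing)
}

# Toy data (made up): one event ends at row 3, another at row 7
event  <- c(0, 1, 1, 0, 0, 0, 1, 0)
last_y <- c(0, 0, 1, 0, 0, 0, 1, 0)
res_short <- flag_after_event(event, last_y, n_after = 2)

# A new event starting inside the window switches the flag off until it ends
res_cut <- flag_after_event(c(1, 0, 1, 1, 0), c(1, 0, 0, 1, 0), n_after = 3)
```

With the data from the question, this could be called per country, for example as df %>% group_by(ccode) %>% mutate(ev_10 = flag_after_event(event, event.last.y)).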
tjebo
  • many thanks! the new variable (ev_10 in your example) should get the value 1 only for the 9 years following the event; not all the way until a new event occurs (at max; if a new event occurs before 9 years, ev_10 gets the value 1 for less than 9 years). – zoowalk Mar 01 '18 at 09:25
  • just saw, it does not entirely correspond to your requested result - maybe I misunderstood what you meant by the last 9 years after the event. You might want to change the `length.out` argument in the `seq` call – tjebo Mar 01 '18 at 10:39