1

It is hard to explain, but I have a data.frame (2.915 rows) with ID column and years columns:

enter image description here

In this data.frame I want to verify if any ID has the sequence 1,0 (one or more years with zero),1* in the columns.

Like this:

enter image description here

I mean, a flag column would be useful.

data$flag[data$ID %in% seq] <- c("Y")

How can I do it?

My data:

structure(list(ID = c("5453", "6675", "7745", 
"68621", "33356", "7855"), `2000` = c(0, 
0, 1, 0, 1, 0), `2001` = c(0, 0, 1, 0, 1, 0), `2002` = c(0, 0, 
1, 0, 1, 0), `2003` = c(0, 0, 1, 0, 1, 0), `2004` = c(1, 0, 1, 
0, 1, 1), `2005` = c(0, 1, 1, 0, 1, 1), `2006` = c(1, 1, 1, 0, 
1, 1), `2007` = c(1, 1, 1, 0, 1, 1), `2008` = c(1, 1, 1, 1, 1, 
1), `2009` = c(1, 1, 1, 1, 1, 1), `2010` = c(1, 1, 1, 1, 0, 1
), `2011` = c(1, 1, 1, 0, 0, 1), `2012` = c(1, 1, 1, 0, 0, 1), 
    `2013` = c(1, 1, 1, 0, 0, 1), `2014` = c(1, 1, 1, 0, 0, 1
    ), `2015` = c(1, 1, 1, 0, 0, 1), `2016` = c(1, 1, 1, 0, 0, 
    1), `2017` = c(0, 0, 0, 0, 0, 0), `2018` = c(0, 0, 0, 0, 
    0, 0)), row.names = c(NA, 6L), class = "data.frame")
RxT
  • 486
  • 7
  • 17

2 Answers2

1

With your given data as df:

library(dplyr)
library(tidyr)

flag_valley <- function(x) {
  num_changes <- sum(abs(diff(x)))
  first_val <- x[1]
  return(num_changes >= 3 | num_changes >= 2 & first_val == 1)
}

df %>%
  pivot_longer(matches("^\\d{4}$"), names_to = "year", values_to = "val") %>%
  group_by(ID) %>%
  summarize(flag = flag_valley(val))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 6 x 2
#>   ID    flag 
#>   <chr> <lgl>
#> 1 33356 FALSE
#> 2 5453  TRUE 
#> 3 6675  FALSE
#> 4 68621 FALSE
#> 5 7745  FALSE
#> 6 7855  FALSE

The flag_valley function looks to see if the series changes 3 or more times (in which case there must be a '1(0+)1' pattern) or if the series starts at 1 and changes 2 or more times (in which case the pattern must also occur).

Note that this will only work when the only possible values are 0 and 1.

RyanFrost
  • 1,400
  • 7
  • 17
1

We can use lead to get the next value

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = -ID) %>%
  group_by(ID) %>%
  summarise(flag = any(value == 1 & lead(value) == 0 & lead(value, 2) == 1))

# A tibble: 6 x 2
#  ID    flag 
#  <chr> <lgl>
#1 33356 FALSE
#2 5453  TRUE 
#3 6675  FALSE
#4 68621 FALSE
#5 7745  FALSE
#6 7855  FALSE
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213