
I have a data frame, df:

df <- structure(list(Time = structure(c(1531056854, 1531057121, 1517382101, 
1517386850, 1517386951, 1517399987, 1517400523, 1517400523), class = c("POSIXct", 
"POSIXt")), Data = c("Start", "Exit", "Start", "Start", "Exit", 
"Start", "Exit", "Exit"), same = c(0, 0, 1, 0, 0, 0, 1, NA)), class = "data.frame", .Names = c("Time", 
"Data", "same"), row.names = c(NA, -8L))

Ideally, every Start in the Data column is followed by exactly one Exit.

However, in some instances I get two consecutive Start rows before an Exit, or a Start followed by two consecutive Exit rows. I tried to identify these repeated starts and exits with this code:

library(dplyr)
df <- df %>% mutate(same = ifelse(Data == lead(Data), 1, 0))

This provides me with the following output:

                  Time  Data same
1 2018-07-08 19:04:14 Start    0
2 2018-07-08 19:08:41  Exit    0
3 2018-01-31 12:31:41 Start    1
4 2018-01-31 13:50:50 Start    0
5 2018-01-31 13:52:31  Exit    0
6 2018-01-31 17:29:47 Start    0
7 2018-01-31 17:38:43  Exit    1
8 2018-01-31 17:38:43  Exit   NA

I am trying to figure out how to mark, with a 1, the second Start when there are two Start rows in a row, and the first Exit when there are two Exit rows in a row. The desired output is as follows:

                  Time  Data same
1 2018-07-08 19:04:14 Start    0
2 2018-07-08 19:08:41  Exit    0
3 2018-01-31 12:31:41 Start    0
4 2018-01-31 13:50:50 Start    1 #this should be one
5 2018-01-31 13:52:31  Exit    0
6 2018-01-31 17:29:47 Start    0
7 2018-01-31 17:38:43  Exit    1 #this should be one
8 2018-01-31 17:38:43  Exit    0

I tried using an if condition within an ifelse, but it got messy.

Jonathan Hall
Apricot

2 Answers

library(tidyverse)
df %>% 
  mutate( same2 = ifelse( Data == "Start" & lag( Data ) == Data, 1, 0 )) %>%
  mutate( same2 = ifelse( Data == "Exit" & lead( Data ) == Data, 1, same2 ) )

#                  Time  Data same same2
# 1 2018-07-08 15:34:14 Start    0    NA
# 2 2018-07-08 15:38:41  Exit    0     0
# 3 2018-01-31 08:01:41 Start    1     0
# 4 2018-01-31 09:20:50 Start    0     1
# 5 2018-01-31 09:22:31  Exit    0     0
# 6 2018-01-31 12:59:47 Start    0     0
# 7 2018-01-31 13:08:43  Exit    1     1
# 8 2018-01-31 13:08:43  Exit   NA    NA
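
A note on the NA values in rows 1 and 8 of same2: row 1 has no previous row, so lag(Data) is NA there, and row 8 has no next row, so lead(Data) is NA. The minimal base R sketch below (the variable names are illustrative, not from the answer) shows how R's three-valued logic carries that NA through `&` and `ifelse()`:

```r
# R's logical operators are three-valued: NA means "unknown".
true_and_na  <- TRUE & NA    # NA: the result depends on the unknown value
false_and_na <- FALSE & NA   # FALSE: the result is FALSE whatever NA stands for
picked       <- ifelse(NA, 1, 0)  # NA: ifelse() passes a missing test through

print(is.na(true_and_na))
print(false_and_na)
print(is.na(picked))
```

This is why row 1 (a Start compared against a missing previous value) and row 8 (an Exit compared against a missing next value) end up as NA rather than 0.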
Wimpel
  • I think there is no need for two statements; you can use `|`, as in `Data=='Start' & lag(Data)=='Start' | Data=='Exit' & lead(Data)=='Exit'` – A. Suliman Aug 30 '18 at 09:19
  • @Wimpel Thank you... I didn't know that I could use mutate twice in a sequence. This works. I was working with an ifelse inside an if/else condition, which never worked. – Apricot Aug 30 '18 at 09:21
  • @Apricot you can also follow A. Suliman's suggestion and combine the two checks with an "or" statement. – Wimpel Aug 30 '18 at 09:22

We could combine both checks into a single logical condition and coerce the result to binary with as.integer:

df %>% 
    mutate(same2 = as.integer((Data == 'Start' & lag(Data) == Data)|
                              (Data == 'Exit' &  lead(Data) == Data)))
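
With the condition as written, the first and last rows come out as NA, because lag() and lead() have nothing to compare against at the edges. If a 0 is wanted there instead (as in the question's desired output), one possible variation is to pass the `default` argument of dplyr's lag() and lead(). The sketch below is an assumption about what is wanted, not part of the original answer; it rebuilds only the Data column, and the helper columns prev_data and next_data are hypothetical names:

```r
library(dplyr)

# Data column from the question (timestamps omitted; only Data matters here)
df <- data.frame(
  Data = c("Start", "Exit", "Start", "Start", "Exit", "Start", "Exit", "Exit"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # default = "" makes the comparisons at the edges FALSE instead of NA
    prev_data = lag(Data, default = ""),
    next_data = lead(Data, default = ""),
    same2 = as.integer(
      (Data == "Start" & prev_data == "Start") |
        (Data == "Exit" & next_data == "Exit")
    )
  ) %>%
  select(-prev_data, -next_data)

print(result)
```

An empty string is used as the default only because it can never equal "Start" or "Exit"; any other sentinel value that does not occur in Data would serve the same purpose.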
akrun
  • Thank you... I think you have influenced almost every aspiring R user with your answers. Thank you again. – Apricot Aug 30 '18 at 15:40