
I have an event log in the format below.

Original format
I have created groups by Date and ID using dplyr, hence a change in either Date or ID is treated as a different group.

I want to keep only events that are >= 5 secs apart and remove the rest. Desired output

I have used dplyr and a time lag to achieve this, since I am not able to dynamically assign a lag interval. My current code checks only a single lag interval, so I end up deleting more rows than desired. Current output - all rows in yellow are removed. Ideally I would want "13:10:22" and "13:10:24" in group 2 to be retained, since the time lag from "13:10:17" to these times is 5 secs or more.

I am using "chron" to handle the times. I understand the time lag logic will not work in my case. Could there be a better alternative apart from an expensive for/if loop?

Code I have used

library(dplyr)
library(chron)

data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
data$Time <- chron(times = data$Time)

data <- data %>% arrange(Date, Time, ID)
data$Group <- data %>% group_by(Date, ID) %>% group_indices()

data <- data %>%
        group_by(Group) %>%
        mutate(time.difference = Time - lag(Time)) %>%
        # 0.00005787 is roughly 5 seconds expressed as a fraction of a day
        filter(time.difference >= 0.00005787 | is.na(time.difference))
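For reference, the magic constant in that filter comes from chron storing a "times" value as a fraction of a day, so a 5-second threshold works out as:

```r
# 5 seconds expressed in chron's unit (fraction of a day)
threshold <- 5 / (24 * 60 * 60)
threshold
#> [1] 5.787037e-05
```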

dput of the data

structure(list(Date = structure(c(17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17470, 17470, 17470, 17470), class = "Date"), Time = structure(c(0.936400462962963, 0.9425, 0.9425, 0.942511574074074, 0.942523148148148, 0.9703125, 0.548518518518519, 0.548530092592593, 0.54880787037037, 0.54880787037037, 0.548819444444444, 0.548842592592593, 0.548865740740741, 0.548888888888889, 0.557337962962963, 0.6140625, 0.618761574074074, 0.618958333333333, 0.622303240740741), format = "h:m:s", class = "times"), ID = c("P1", "P1", "P1", "P1", "P1", "P1", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P9", "P9", "P9", "P9")), .Names = c("Date", "Time", "ID"), row.names = c(NA, -19L), class = "data.frame")

Vinds

2 Answers

library(dplyr)
data %>%
  group_by(Group) %>%
  arrange(Group, Date, Time) %>% 
  filter((Time - lag(Time)) >= 5.787037e-05 | row_number() == 1L)
Julien Navarre
  • thanks.. but this doesn't solve my problem of including "13:10:22" and "13:10:24" from group 2. I am not looking at immediate time lags. Given an event, I don't want to include the next n events that are less than 5 secs after it; I want to include only events that are >= 5 secs apart. Moreover, the above code doesn't include the first element of a group, where the time lag would return an NA. Would something like a skip logic work here? – Vinds Mar 06 '18 at 04:35
  • thanks.. Wish I had seen this a little earlier. I used a different logic and solved it: first I combined the date and time into a POSIXct object and then did some calculations on it. Attaching my code – Vinds Mar 06 '18 at 11:07
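The behaviour asked for in the first comment — keep an event only when it is at least 5 seconds after the last *kept* event, not the last *seen* event — can be had without an explicit for/if loop by carrying the last kept time forward with `Reduce()`. This is only a sketch; `keep_every_5s` and the `demo` data frame are illustrative names, not part of either answer:

```r
library(dplyr)

threshold <- 5 / 86400  # 5 seconds as a fraction of a day (chron "times" unit)

# TRUE for rows that are >= 5 s after the last *kept* row. Reduce() carries
# the time of the most recently kept event forward as the accumulator.
keep_every_5s <- function(times) {
  anchors <- Reduce(function(last_kept, current) {
    if (current - last_kept >= threshold) current else last_kept
  }, times, accumulate = TRUE)
  # A row is kept whenever it becomes the new anchor; the first row always is
  c(TRUE, anchors[-1] != anchors[-length(anchors)])
}

# Demo: consecutive gaps of 2, 2, 2 and 6 seconds. Rows 1, 4 and 5 survive,
# even though no single consecutive gap before row 4 reaches 5 seconds.
demo <- data.frame(Group = 1, Time = c(0, 2, 4, 6, 12) / 86400)
demo %>% group_by(Group) %>% filter(keep_every_5s(Time))
```

Applied to the question's data, the same filter would run per `Group` after arranging by `Time`.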
# Date is already a Date object here, so paste() renders it as "YYYY-MM-DD"
# and the parse format must match that, not the original "%m/%d/%Y"
data$datetime <- as.POSIXct(paste(data$Date, data$Time), format = "%Y-%m-%d %H:%M:%S")
data$group <- data %>% group_by(ID, by5sec = cut(datetime, breaks = "5 sec")) %>% group_indices()
data_filter <- data %>% group_by(group) %>% filter(row_number() == 1)

I did this in two steps since I wanted the intermediate result with the group indices written to a CSV.
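Note that `cut()` with `breaks = "5 sec"` creates fixed five-second bins anchored at the earliest timestamp, which is close to, but not the same as, "at least 5 s since the last kept event". A small sketch with made-up timestamps:

```r
# Five timestamps with consecutive gaps of 2, 2, 2 and 6 seconds
x <- as.POSIXct("2017-11-04 13:10:17", tz = "UTC") + c(0, 2, 4, 6, 12)

# Bins are [13:10:17, 13:10:22), [13:10:22, 13:10:27), ... so the first
# three events share a bin, and keeping row_number() == 1 per bin retains
# 13:10:17, 13:10:23 and 13:10:29
as.integer(cut(x, breaks = "5 sec"))
```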

Vinds