
I am trying to filter a dataset of depth data with a corresponding date. The Time column is in POSIXct format = "%Y%m%d%H%M%S". This is what my data looks like:

Depth Time        
0.1   2018-06-24 01:26:40  
0.2   2018-06-24 01:26:41  
0.2   2018-06-24 01:26:56  
0.1   2018-06-24 01:26:57  
0.1   2018-06-24 01:26:58  
0.1   2018-06-24 01:26:59  
0.1   2018-06-24 01:27:14  
0.1   2018-06-24 01:27:15  
0.1   2018-06-24 01:27:16  
0.1   2018-06-24 01:27:17  
0.1   2018-06-24 01:27:30  

I would like to create a data frame containing the same data, but with a new entry only every 15 seconds. My data is sometimes continuous, and sometimes there are gaps in the recorded time.

This would be my desired output:

Depth Time
0.2 2018-06-24 01:26:41
0.2 2018-06-24 01:26:56
0.1 2018-06-24 01:27:14
0.1 2018-06-24 01:27:30
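
In case anyone wants to reproduce this, a minimal version of the sample data (timezone assumed UTC here, since it is not given above) can be built like so:

```r
# Minimal reproducible version of the sample data (timezone assumed UTC)
d_cor <- data.frame(
  Depth = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1),
  Time  = as.POSIXct(
    c("2018-06-24 01:26:40", "2018-06-24 01:26:41", "2018-06-24 01:26:56",
      "2018-06-24 01:26:57", "2018-06-24 01:26:58", "2018-06-24 01:26:59",
      "2018-06-24 01:27:14", "2018-06-24 01:27:15", "2018-06-24 01:27:16",
      "2018-06-24 01:27:17", "2018-06-24 01:27:30"),
    tz = "UTC")
)
```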

I tried using the time difference between rows, which works for the parts of the dataset that are consecutive:

    dt_filter <- d_cor %>%
      mutate(diff = Time - lag(Time, default = first(Time)))

    if (all(dt_filter$diff < 2)) {
      dt_filter_1 <- dt_filter[seq(1, nrow(dt_filter), 15), ]
    }

but that gives me a problem once I try to add in the rows that are not consecutive:

    dt_filter_15 <- dt_filter %>%
      filter(diff >= 15)

After that I no longer have a regular 15-second interval, so those rows obviously aren't counted in.

So far I could not find a function that can filter my Time column this way. I am quite new to R, so I am not too familiar with writing my own loops, which I guess is necessary... and time data does not make it easier.

Thanks for any help!

EDIT

@Ben Thanks for the quick replies!

This is some of the output that I've got:

        Depth Time                diff cumdiff x
        0.1   2018-06-23 23:59:44 1    1030    0
        0.0   2018-06-24 00:01:02 78   1035    5
        0.0   2018-06-24 00:01:03 1    1036    1

Between the last two lines there is only 1 s of difference, but it is still added to cumdiff and therefore counted in the x column.

Hobo

2 Answers


Thanks to your links I found this function, which seems to work perfectly. Just in case anyone has the same question:

    issecsApart <- function(d_cor) {
      secs <- 0      # time of the last kept row (0 = nothing kept yet)
      keeps <- c()   # logical mask to return
      for (d in d_cor) {
        if (d >= secs + 15) {   # at least 15 s since the last kept row?
          secs <- d
          keeps <- c(keeps, TRUE)
        } else {
          keeps <- c(keeps, FALSE)
        }
      }
      keeps
    }

    d_cor2 <- d_cor %>%
      arrange(Time) %>%
      filter(issecsApart(Time))
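
For what it's worth, the same keep-or-skip logic can also be sketched without growing a vector inside a loop, using Reduce() to carry the time of the last kept row along (function name is my own; assumes Time is sorted and strictly increasing):

```r
# Same running-threshold idea as issecsApart(), but via Reduce():
# carry the time of the last kept row through the vector and keep a
# row exactly when it is at least 15 s after that time.
keep_every_15s <- function(times) {
  t <- as.numeric(times)
  last_kept <- Reduce(
    function(prev, cur) if (cur >= prev + 15) cur else prev,
    t, accumulate = TRUE)
  t == last_kept   # TRUE exactly where the row updated the running threshold
}
```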

Appreciate all your help!

Ben
  • Looks good - I think you want to arrange by Time before filtering and calling the issecsApart function correct? (just edited) – Ben Sep 04 '19 at 03:19

One potential solution is with dplyr, though I am sure there may be better options available, especially with data.table. As suggested by @42- and demonstrated by @Maurits Evers, you can do the following:

    library(dplyr)

    d_cor %>%
      arrange(Time) %>%
      mutate(
        # gap to the previous row, explicitly in seconds (plain subtraction
        # of POSIXct values can silently switch to minutes for larger gaps)
        diff = as.numeric(difftime(Time, lag(Time), units = "secs")),
        diff = ifelse(is.na(diff), 0, diff),
        cumdiff = cumsum(diff) %/% 15,
        x = abs(lag(cumdiff) - cumdiff)) %>%
      filter(is.na(x) | x > 0) %>%
      select(Depth, Time)

      Depth                Time
    1   0.1 2018-06-24 01:26:40
    2   0.2 2018-06-24 01:26:56
    3   0.1 2018-06-24 01:27:14
    4   0.1 2018-06-24 01:27:30

diff holds the difference in seconds between consecutive rows' times. The first row would be NA (changed to 0).

cumdiff is the cumulative sum of diff after integer division by 15, so cumdiff increases by 1 each time at least another 15 seconds have accumulated.

The filter keeps the first row (where x is NA) and every row where cumdiff changes, i.e. where the cumulative time has crossed another 15-second boundary.
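
To see the arithmetic on the sample data, here is the same computation spelled out in base R, with the gaps hard-coded from the table in the question (base diff() stands in for dplyr's lag() here):

```r
# Gaps in seconds between consecutive sample rows (first row -> 0)
gaps    <- c(0, 1, 15, 1, 1, 1, 15, 1, 1, 1, 13)
cumdiff <- cumsum(gaps) %/% 15
cumdiff
# 0 0 1 1 1 1 2 2 2 2 3
# keep the first row plus every row where cumdiff steps up
keep <- c(TRUE, diff(cumdiff) > 0)
which(keep)
# 1 3 7 11  (the four rows shown in the output above)
```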

Other examples that may be helpful and include data.table:

Filter rows by a time threshold

Subset observations that differ by at least 30 minutes time

Subset time series so that selected rows differs by a certain minimum time

Edit: This solution looks for times within fixed 15-second windows. That causes problems when a gap is greater than 15 seconds: the window does not 'reset' at the kept row, so the next time is included no matter where it falls within its window. Because of this, two kept times can end up close to each other, especially right after a large gap.
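
A minimal illustration of that edge case, using the same fixed-window arithmetic on gaps I chose to trigger it:

```r
# Three rows at t = 0 s, 29 s and 30 s: the 29 s gap crosses one window
# boundary, and the following 1 s gap immediately crosses the next one
gaps    <- c(0, 29, 1)
cumdiff <- cumsum(gaps) %/% 15   # 0 1 2 -- every row lands in a new window
keep    <- c(TRUE, diff(cumdiff) > 0)
keep
# TRUE TRUE TRUE: rows 2 and 3 are both kept even though they are only
# 1 s apart; a running threshold like issecsApart above would drop row 3
```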

Ben
  • Thank you guys so much! I was thinking about cumsum but couldn't work out how to incorporate the 15! That worked pretty well on most of the data but it seems that the higher diffs are troubling it. – Ninaninanska Sep 02 '19 at 19:01
  • @Ninaninanska can you share example data where this may not be working? What results are you getting instead? – Ben Sep 02 '19 at 19:02
  • I added some to the original question – Ninaninanska Sep 03 '19 at 07:24
  • @Ninaninanska Thanks for additional info - I will follow up shortly (still editing). Glad you found a solution! – Ben Sep 03 '19 at 20:08