4

I would like to filter a dataframe based on its date column. I would like to keep the rows where I have at least 3 consecutive days. I would like to do this as effeciently and quickly as possible, so if someone has a vectorized approached it would be good.

I tried to inspire myself from the following link, but it didn't really go well, as it is a different problem:

How to filter rows based on difference in dates between rows in R?

I tried to do it with a for loop, I managed to put an indicator on the dates who are not consecutive, but it didn't give me the desired result, because it keeps all dates that are in a row even if they are less than 3 in a row.

tf is my dataframe

for(i in 2:(nrow(tf)-1)){

  if(tf$Date[i] != tf$Date[i+1] %m+% days(-1)){
    if(tf$Date[i] != tf$Date[i-1] %m+% days(1)){
      tf$Date[i] = as.Date(0)
    }
  }

}

The first 22 rows of my dataframe look something like this:

         Date RR.x RR.y    Y
1  1984-10-20    1 10.8 1984
2  1984-11-04    1 12.5 1984
3  1984-11-05    1  7.0 1984
4  1984-11-09    1 22.9 1984
5  1984-11-10    1 24.4 1984
6  1984-11-11    1 19.0 1984
7  1984-11-13    1  5.9 1984
8  1986-10-15    1 10.3 1986
9  1986-10-16    1 18.1 1986
10 1986-10-17    1 11.3 1986
11 1986-11-17    1 14.1 1986
12 2003-10-17    1  7.8 2003
13 2003-10-25    1  7.6 2003
14 2003-10-26    1  5.0 2003
15 2003-10-27    1  6.6 2003
16 2003-11-15    1 26.4 2003
17 2003-11-20    1 10.0 2003
18 2011-10-29    1 10.0 2011
19 2011-11-04    1 11.4 2011
20 2011-11-21    1  9.8 2011
21 2011-11-22    1  5.6 2011
22 2011-11-23    1 20.4 2011

The result should be:

         Date RR.x RR.y    Y
4  1984-11-09    1 22.9 1984
5  1984-11-10    1 24.4 1984
6  1984-11-11    1 19.0 1984
8  1986-10-15    1 10.3 1986
9  1986-10-16    1 18.1 1986
10 1986-10-17    1 11.3 1986
13 2003-10-25    1  7.6 2003
14 2003-10-26    1  5.0 2003
15 2003-10-27    1  6.6 2003
20 2011-11-21    1  9.8 2011
21 2011-11-22    1  5.6 2011
22 2011-11-23    1 20.4 2011
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
meteoeliot
  • 43
  • 4

2 Answers2

3

One possibility could be:

df %>%
 mutate(Date = as.Date(Date, format = "%Y-%m-%d"),
        diff = c(0, diff(Date))) %>%
 group_by(grp = cumsum(diff > 1 & lead(diff, default = last(diff)) == 1)) %>%
 filter(if_else(diff > 1 & lead(diff, default = last(diff)) == 1, 1, diff) == 1) %>%
 filter(n() >= 3) %>%
 ungroup() %>%
 select(-diff, -grp)

   Date        RR.x  RR.y     Y
   <date>     <int> <dbl> <int>
 1 1984-11-09     1  22.9  1984
 2 1984-11-10     1  24.4  1984
 3 1984-11-11     1  19    1984
 4 1986-10-15     1  10.3  1986
 5 1986-10-16     1  18.1  1986
 6 1986-10-17     1  11.3  1986
 7 2003-10-25     1   7.6  2003
 8 2003-10-26     1   5    2003
 9 2003-10-27     1   6.6  2003
10 2011-11-21     1   9.8  2011
11 2011-11-22     1   5.6  2011
12 2011-11-23     1  20.4  2011
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
3

Here's a base solution:

DF$Date <- as.Date(DF$Date)

rles <- rle(cumsum(c(1,diff(DF$Date)!=1)))
rles$values <- rles$lengths >= 3

DF[inverse.rle(rles), ]

        Date RR.x RR.y    Y
4  1984-11-09    1 22.9 1984
5  1984-11-10    1 24.4 1984
6  1984-11-11    1 19.0 1984
8  1986-10-15    1 10.3 1986
9  1986-10-16    1 18.1 1986
10 1986-10-17    1 11.3 1986
13 2003-10-25    1  7.6 2003
14 2003-10-26    1  5.0 2003
15 2003-10-27    1  6.6 2003
20 2011-11-21    1  9.8 2011
21 2011-11-22    1  5.6 2011
22 2011-11-23    1 20.4 2011

Similar approach in dplyr

DF%>%
  mutate(Date = as.Date(Date))%>%
  add_count(IDs = cumsum(c(1, diff(Date) !=1)))%>%
  filter(n >= 3)

# A tibble: 12 x 6
   Date        RR.x  RR.y     Y   IDs     n
   <date>     <int> <dbl> <int> <dbl> <int>
 1 1984-11-09     1  22.9  1984     3     3
 2 1984-11-10     1  24.4  1984     3     3
 3 1984-11-11     1  19    1984     3     3
 4 1986-10-15     1  10.3  1986     5     3
 5 1986-10-16     1  18.1  1986     5     3
 6 1986-10-17     1  11.3  1986     5     3
 7 2003-10-25     1   7.6  2003     8     3
 8 2003-10-26     1   5    2003     8     3
 9 2003-10-27     1   6.6  2003     8     3
10 2011-11-21     1   9.8  2011    13     3
11 2011-11-22     1   5.6  2011    13     3
12 2011-11-23     1  20.4  2011    13     3
Cole
  • 11,130
  • 1
  • 9
  • 24