0

I'm trying to work out whether two different date ranges overlap and return a TRUE/FALSE where they do, which will allow me to successfully filter the data I'm interested in. I was trying to use lubridate intervals and the %within% which basically checks whether a date falls within a period. In the example below, I was checking whether the started_fishing date and the finished_fishing date fell within the period defined by the start_of_fishing_season and end_of_fishing_season.

This approach doesn't work. As, although in the example below the dates don't fall within the period, if we assume we are fishing constantly from the started_fishing date to the finished_fishing date then some of the dates would fall within the fishing season.

I've inserted the true/false column to indicate what a correct output should look like.

I'm guessing perhaps what I need to do is to create a sequence from started_fishing and finished_fishing and then check each date against fishing season dates? Hope this is clear enough, any help appreciated.

Ideally fixing this in lubridate (which isn't shown below) would be great.


deployment = data.frame(start_of_fishing_season = c(as.POSIXct("2019-10-16", tz = "UTC"),
                                                    as.POSIXct("2019-10-16", tz = "UTC")),
                        end_of_fishing_season = c(as.POSIXct("2020-05-01", tz = "UTC"),
                                                  as.POSIXct("2020-05-01", tz = "UTC")),
                        started_fishing = c(as.POSIXct("2019-06-12 15:45:00", tz = "UTC"),
                                            as.POSIXct("2019-06-12 15:45:00", tz = "UTC")),
                        finished_fishing = c(as.POSIXct("2021-03-24 15:45:00", tz = "UTC"),
                                             as.POSIXct("2019-08-13 15:45:00", tz = "UTC")),
                        fishing_overlaps = c(TRUE,FALSE))


deployment
#>   start_of_fishing_season end_of_fishing_season     started_fishing
#> 1              2019-10-16            2020-05-01 2019-06-12 15:45:00
#> 2              2019-10-16            2020-05-01 2019-06-12 15:45:00
#>      finished_fishing fishing_overlaps
#> 1 2021-03-24 15:45:00             TRUE
#> 2 2019-08-13 15:45:00            FALSE

Dasr
  • 777
  • 6
  • 16
  • I think there's an additional function in lubridate called int_overlap which does what I needed. But interested to see other approaches :) – Dasr Aug 02 '22 at 10:00

1 Answers1

0

So, one of the ways I have found to address this problem is:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# creating data
deployment <- data.frame(
  start_of_fishing_season = c(
    as.POSIXct("2019-10-16", tz = "UTC"),
    as.POSIXct("2019-10-16", tz = "UTC")
  ),
  end_of_fishing_season = c(
    as.POSIXct("2020-05-01", tz = "UTC"),
    as.POSIXct("2020-05-01", tz = "UTC")
  ),
  started_fishing = c(
    as.POSIXct("2019-06-12 15:45:00", tz = "UTC"),
    as.POSIXct("2019-06-12 15:45:00", tz = "UTC")
  ),
  finished_fishing = c(
    as.POSIXct("2021-03-24 15:45:00", tz = "UTC"),
    as.POSIXct("2019-08-13 15:45:00", tz = "UTC")
  )
)



# checking whether date intervals overlap
deployment <- deployment %>%
  mutate(fish_seas_int = interval(start_of_fishing_season, end_of_fishing_season)) %>%
  mutate(fishing_int = interval(started_fishing, finished_fishing)) %>%
  mutate(was_active = int_overlaps(fish_seas_int, fishing_int))


# new data
deployment
#>   start_of_fishing_season end_of_fishing_season     started_fishing
#> 1              2019-10-16            2020-05-01 2019-06-12 15:45:00
#> 2              2019-10-16            2020-05-01 2019-06-12 15:45:00
#>      finished_fishing                  fish_seas_int
#> 1 2021-03-24 15:45:00 2019-10-16 UTC--2020-05-01 UTC
#> 2 2019-08-13 15:45:00 2019-10-16 UTC--2020-05-01 UTC
#>                                        fishing_int was_active
#> 1 2019-06-12 15:45:00 UTC--2021-03-24 15:45:00 UTC       TRUE
#> 2 2019-06-12 15:45:00 UTC--2019-08-13 15:45:00 UTC      FALSE


Created on 2022-08-03 by the reprex package (v2.0.1)



I would be interested in alternatives...

Dasr
  • 777
  • 6
  • 16