-1

I have a data set that has an arbitrary 5 minute interval period represented in the Period column. Periods for different observations (IDs) start at different times (for ID='83' it started at 49M while for ID='90' it started at 50M). The observations are time intervals which are stored in Start.N and End.N columns.

These last 2 columns have one interval that represent the continuous time of observation or 'events' (regardless of period), hence it is repeated. Some observations have more than one 'event' and the Period are repeated to fit those extra intervals

My goal is to calculate the amount of minutes and seconds of overlap between the event(s) and the arbitrary bins. To clarify, The first row should have overlap 0M 0S while the second row should have 5M 0S overlap because 54M 0S - 59M 0S is contained within 54M 1S - 89M 0S.

  ID Period Period.start Period.end Start.N  End.N
1 83      5       49M 0S     54M 0S  54M 1S 89M 0S
2 83     10       54M 0S     59M 0S  54M 1S 89M 0S
3 83     15       59M 0S     64M 0S  54M 1S 89M 0S
4 83     20       64M 0S     69M 0S  54M 1S 89M 0S
5 83     25       69M 0S     74M 0S  54M 1S 89M 0S
6 83     30       74M 0S     79M 0S  54M 1S 89M 0S

Here's my data

structure(list(ID = c("83", "83", "83", "83", "83", "83", "83", 
"83", "83", "83", "90", "90", "90", "90", "90", "90", "90", "90", 
"90", "90", "90", "90", "90", "90", "90", "90", "90", "90", "90", 
"90"), Period = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 5, 5, 
10, 10, 15, 15, 20, 20, 25, 25, 30, 30, 35, 35, 40, 40, 45, 45, 
50, 50), Period.start = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
), year = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0), day = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), hour = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), minute = c(49, 54, 59, 64, 69, 74, 79, 
84, 89, 94, 50, 50, 55, 55, 60, 60, 65, 65, 70, 70, 75, 75, 80, 
80, 85, 85, 90, 90, 95, 95), class = structure("Period", package = "lubridate")), 
    Period.end = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), year = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), day = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), hour = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), minute = c(54, 59, 64, 69, 
    74, 79, 84, 89, 94, 99, 55, 55, 60, 60, 65, 65, 70, 70, 75, 
    75, 80, 80, 85, 85, 90, 90, 95, 95, 100, 100), class = structure("Period", package = "lubridate")), 
    Start.N = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 32, 32, 
    32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
    32, 32, 32), year = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), month = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), day = c(0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), minute = c(54, 
    54, 54, 54, 54, 54, 54, 54, 54, 54, 52, 94, 52, 94, 52, 94, 
    52, 94, 52, 94, 52, 94, 52, 94, 52, 94, 52, 94, 52, 94), class = structure("Period", package = "lubridate")), 
    End.N = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), year = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), month = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), day = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), minute = c(89, 
    89, 89, 89, 89, 89, 89, 89, 89, 89, 83, 111, 83, 111, 83, 
    111, 83, 111, 83, 111, 83, 111, 83, 111, 83, 111, 83, 111, 
    83, 111), class = structure("Period", package = "lubridate"))), .Names = c("ID", 
"Period", "Period.start", "Period.end", "Start.N", "End.N"), row.names = c(NA, 
30L), class = "data.frame")
eddi
  • 49,088
  • 6
  • 104
  • 155
Matias Andina
  • 4,029
  • 4
  • 26
  • 58
  • Possibly related: https://stackoverflow.com/questions/25033585/calculating-time-difference-in-r-using-lubridate – Damian Jul 18 '17 at 22:12
  • What's your difficulty? Seems straightforward to compare the starts/ends, and compute corresponding end-start if there is an overlap. – eddi Jul 19 '17 at 16:02
  • @eddi I don't get how to compute the overlap for every Start.N - End.N interval inside the specific bin. I have an idea around something like `ifelse(Period.start>=Start.N & Period.end < End.N , 'contained', 'not contained')` but then computing the actual overlap in seconds I don't have a clue. Also, I would need to implement this for every unique Start.N and End.N event, integrate that into mutate in some dplyr chain – Matias Andina Jul 19 '17 at 17:03
  • @MatiasAndina smth like this: `ifelse(Period.start > End.N | Period.end < Start.N, 0, pmin(End.N - Period.start, Period.end - Start.N))`; if your problem is figuring out how to subtract objects of whatever class from each other and present in whatever format - look around or post a question about that – eddi Jul 19 '17 at 17:19
  • @eddi I like your ifelse statement, the operation it's not doing what I need but it's quite close. Since it's conveniently doing it row-wise, when events are greater than 5 minutes, the subtractions for the following period go something like 5M 30S; 10M 30S; 15M 30S, ... I guess my question was indeed about the operation but I didn't see this as straightforward nor understand the downvotes. You might post your code as an answer and I'll add up to it. – Matias Andina Jul 19 '17 at 19:38
  • @MatiasAndina I can't comment on the downvote. Feel free to post an answer yourself - it's very much encouraged on SO. – eddi Jul 19 '17 at 20:32

1 Answers1

1

Here's what I used to solve the problem. The ifelse was provided by eddi, I did the dplyr data management later, I'm happy to have input from more efficient programmers.

    new.data %>% mutate(NPeriodStart = End.N - Period.start,
        PeriodEndStartN = Period.end - Start.N,
         N.Time.Prep=ifelse(Period.start > End.N | Period.end < Start.N,
                             0,
lubridate::seconds(pmin(End.N - Period.start, Period.end - Start.N)))) %>%

  ###  We fix the over 300 seconds so that it's max 300 (5 min bin)

  mutate(N.Time = ifelse(N.Time.Prep>300, 300, N.Time.Prep)) %>%
  select(-NPeriodStart, - PeriodEndStartN, - N.Time.Prep)


  ## Finally, we get a data frame with just what we want

  # ID, Period, N time  
  # To get rid of the repeated measures (more than one N bout)
  # we summarize and select the max value  

  new.data <- new.data %>%
                group_by(ID, Period) %>% 
                summarise(N.Time = max(N.Time))  
Matias Andina
  • 4,029
  • 4
  • 26
  • 58