
Update: This turns out to be a well-described problem in SQL forums -- how to account for the gaps between time ranges, many of which overlap. So I may have to turn to SQL to solve this quickly, but I'm surprised it cannot be done in R. The interval object seems to get almost all the way there, but outside of a slow loop it is difficult to apply it vector-wide. Please do let me know if you have any ideas, but here's a description of the problem and its solution in SQL:

https://www.simple-talk.com/sql/t-sql-programming/calculating-gaps-between-overlapping-time-intervals-in-sql/

What I'd like to do is come up with a list of non-activity time from a log, and then filter it down to show only gaps of a minimum length.

 1/17/2012 0:15 1/17/2012 0:31
 1/20/2012 0:21 1/20/2012 0:22
 1/15/2013 1:08 1/15/2013 1:10
 1/15/2013 1:08 1/15/2013 1:10
 1/15/2013 7:39 1/15/2013 7:41
 1/15/2013 7:39 1/15/2013 7:41
 1/16/2013 1:11 1/16/2013 1:15
 1/16/2013 1:11 1/16/2013 1:15

I was going to just lag the end times into the start row and compute the difference, but then it became clear there were overlapping activities. I also tried "price is right" style matching to get the closest end time... except, of course, if things are going on simultaneously, this doesn't guarantee there's no activity from a still-unfinished simultaneous task.

I currently have date-time in and date-time out columns. I am hoping there is a better idea than taking the many millions of entries and using seq.POSIXt to enumerate every individual minute that has activity -- even that doesn't seem very workable. But it would seem there should be some easy way to identify gaps of time of a minimum size, whether it be 5 minutes or 30. Any suggestions?
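For what it's worth, one loop-free way to sketch this in R (the column names, sample data, and the cumulative-maximum trick are illustrative, not from the thread): sort the intervals by start time, carry forward the running maximum of the end times with `cummax()`, and a gap exists wherever the next start time is later than everything seen so far -- overlapping intervals are absorbed by the running maximum.

```r
# Sketch: find non-activity gaps of at least `min_gap` minutes using a
# running maximum of end times (handles overlapping intervals, no loop).
log <- data.frame(
  start = as.POSIXct(c("2012-01-17 00:15", "2013-01-15 01:08", "2013-01-15 07:39"),
                     tz = "UTC"),
  end   = as.POSIXct(c("2012-01-17 00:31", "2013-01-15 01:10", "2013-01-15 07:41"),
                     tz = "UTC")
)

log <- log[order(log$start), ]               # sort by start time
running_end <- cummax(as.numeric(log$end))   # latest end seen so far (seconds)

# Gap candidates: from the running end of row i to the start of row i+1
gap_start <- as.POSIXct(running_end[-nrow(log)], origin = "1970-01-01", tz = "UTC")
gap_end   <- log$start[-1]
gaps <- data.frame(gap_start, gap_end,
                   minutes = as.numeric(difftime(gap_end, gap_start, units = "mins")))

min_gap <- 30
gaps[gaps$minutes >= min_gap, ]              # keep only gaps of at least 30 minutes
```

Because `cummax()` tracks the furthest end time reached so far, an activity that starts inside an earlier, longer activity never produces a spurious gap.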

lorange
    I don't understand what's the question (and expected output). Can you rephrase? – talat Feb 15 '16 at 20:36
  • Kenney, Yes-- but unfortunately I cannot figure out how to do this in R. Perhaps it would require a loop...? – lorange Feb 15 '16 at 20:42
  • docendo, what I'm trying to do is come up with a way to identify and quantify periods of non-activity (of a minimum length) represented in a user activity log. – lorange Feb 15 '16 at 20:44
  • Thank you Kenney-- I know how I'd do this in other languages, but the mantra in "R" always seems to be "never loop", plus it tends to cause performance issues.. But it may end up being a loop after all.. – lorange Feb 15 '16 at 20:55

1 Answer


Assuming that 1/17/2012 00:15 is the first value in your data set, I would convert your data into two columns, each containing the number of minutes since that timestamp.

i.e. using the first 3 rows of your data as an example (2012 is a leap year):
    start  | end
    _______|_______
    0      | 16
    4326   | 4327
    524213 | 524215
    ...    | ...

Subtracting these two columns from each other will tell you the minutes where activity occurred; you can then simply invert this to get your non-activity.
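The minutes-since-first-timestamp conversion described above might look like this in R (a sketch; the variable names and the use of `difftime` are my own illustration):

```r
# Convert in/out timestamps to minutes elapsed since the earliest timestamp
ins  <- as.POSIXct(c("2012-01-17 00:15", "2012-01-20 00:21"), tz = "UTC")
outs <- as.POSIXct(c("2012-01-17 00:31", "2012-01-20 00:22"), tz = "UTC")

origin <- min(ins)
start_min <- as.numeric(difftime(ins,  origin, units = "mins"))
end_min   <- as.numeric(difftime(outs, origin, units = "mins"))

cbind(start_min, end_min)
# first two rows: 0|16 and 4326|4327; each row's span is end_min - start_min
```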

aeongrail
  • Because they overlap, it does not work, as far as I can tell. I was thinking of doing every minute within the several years span and testing them against the multiple overlapping intervals... – lorange Feb 15 '16 at 22:37
  • Without some way of viewing your code, it become a bit hard to help sorry – aeongrail Feb 15 '16 at 22:42
  • It's quite okay -- I truly appreciate the help. That was my original solution before I noticed there were overlapping data points -- so I definitely think it's a good one! – lorange Feb 16 '16 at 01:05
  • Depending on how you are doing the overlap detection you should be able to catch it. Ie if you took all the unique active times using `unique()` and then inversed them using `ActiveTimes <- TimeSpan[ - which(TimeSpan == ActiveTimes)]` You should be able to ignore the doubleups – aeongrail Feb 16 '16 at 01:46
  • This is a very good idea, but the problem I've had in executing it is that the start points and end points are only rarely doubled; it's rather an overlap. So let's say someone does something loggable from 10:30am-11:45am, but at 10:40am-11am does something else. If I am looking for the log-out closest to 10:30am, it would be 11am, so that would under-count that log by 45 minutes. Or if I just literally counted the seconds between the two, it would overstate the 'busy' time by 20 minutes, as it wouldn't recognize the second pair as overlapping with the first pair. -j – lorange Feb 16 '16 at 18:30