
What's the best way to get the length of time represented by an interval in lubridate, in specified units? All I can figure out is something like the following messy thing:

> ival
[1] 2011-01-01 03:00:46 -- 2011-10-21 18:33:44

> difftime(attr(ival, "start") + as.numeric(ival), attr(ival, "start"), 'days')
Time difference of 293.6479 days

(I also added this as a feature request at https://github.com/hadley/lubridate/issues/105, under the assumption that there's no better way available - but maybe someone here knows of one.)

Update - apparently the difftime function doesn't handle this either. Here's an example.

> (d1 <- as.POSIXct("2011-03-12 12:00:00", 'America/Chicago'))
[1] "2011-03-12 12:00:00 CST"
> (d2 <- d1 + days(1))  # Gives desired result
[1] "2011-03-13 12:00:00 CDT"
> (i2 <- d2 - d1)
[1] 2011-03-12 12:00:00 -- 2011-03-13 12:00:00 
> difftime(attr(i2, "start") + as.numeric(i2), attr(i2, "start"), 'days')
Time difference of 23 hours

As I mention in the comments below, I think one nice way to handle this would be to implement a /.interval function that doesn't first cast its input to a period.

Ken Williams
  • You say you want an interval in a particular unit but from context of the comments below, it sounds as though you want it rounded to the nearest whole unit. If so can you edit further so it is clear? – IRTFM Jan 09 '12 at 17:30
  • No, I don't want it rounded. Maybe the edits I just made help clarify? – Ken Williams Jan 09 '12 at 18:59

4 Answers


The as.duration function is what lubridate provides. The interval class is represented internally as the number of seconds from the start, so if you wanted the number of hours you could simply divide as.numeric(ival) by 3600, or by (3600*24) for days.

If you want worked examples of functions applied to your object, you should provide the output of dput(ival). I did my testing on the objects created on the help(duration) page which is where ?interval sent me.

 date <- as.POSIXct("2009-03-08 01:59:59") # DST boundary
 date2 <- as.POSIXct("2000-02-29 12:00:00")
 span <- date2 - date  #creates interval 
 span
#[1] 2000-02-29 12:00:00 -- 2009-03-08 01:59:59 
 str(span)
#Classes 'interval', 'numeric'  atomic [1:1] 2.85e+08
#  ..- attr(*, "start")= POSIXct[1:1], format: "2000-02-29 12:00:00"
 as.duration(span)
#[1] 284651999s (9.02y) 
 as.numeric(span)/(3600*24)
#[1] 3294.583
# A check against the messy method:
difftime(attr(span, "start") + as.numeric(span), attr(span, "start"), 'days')
# Time difference of 3294.583 days
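To make the DST objection raised in the comments concrete, here is a small base-R sketch (no lubridate needed) that applies the same divide-the-seconds idea to the Chicago example from the question. It assumes the system time-zone database includes `America/Chicago`.

```r
# The question's DST example: noon to noon across the 2011-03-13 spring-forward change
start <- as.POSIXct("2011-03-12 12:00:00", tz = "America/Chicago")
end   <- as.POSIXct("2011-03-13 12:00:00", tz = "America/Chicago")

# Elapsed seconds between the two instants
secs <- as.numeric(difftime(end, start, units = "secs"))

secs / 3600          # elapsed hours: 23
secs / (3600 * 24)   # elapsed 24-hour "days": 0.9583333

# difftime's own "days" unit is also elapsed time, so it agrees
as.numeric(difftime(end, start, units = "days"))
```

Both numbers are elapsed time, which is why they agree with each other and with `difftime()`, yet neither equals the single calendar day the question asks for.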
IRTFM
  • Thanks, but that's not correct - 3600*24 is not always the number of seconds in a day. What I'm trying to do is tap into the systems that already handle things like DST, etc. – Ken Williams Jan 07 '12 at 04:57
  • @KenWilliams I don't understand why you think this is wrong. `?duration` seems to imply that it does in fact respect DST, just like `difftime`. When I convert your example using `as.duration` I get the exact same answer you got using `difftime`. – joran Jan 07 '12 at 07:24
  • You can also divide durations by days(1) – hadley Jan 07 '12 at 20:06
  • Well shoot, even difftime() doesn't do the calculation I'm looking for then. =( I'm looking for a calculation where the difference *in days* between `2011-03-12 12:00:00` and `2011-03-13 12:00:00` in the `America/Chicago` timezone is 1 day, even though that day only contained 23 hours. I'll update my question. – Ken Williams Jan 09 '12 at 16:04
  • @hadley - What I'm looking for would be possible by dividing an *interval* by `days(1)`, but currently in `lubridate` that's done by first converting it to a `duration`, and then (quite correctly) giving a warning. – Ken Williams Jan 09 '12 at 16:06
  • Note further that adding `days(1)` to `2011-03-12 12:00:00 CST` does in fact give `2011-03-13 12:00:00 CDT` even though that's not a span of 24 hours, but the same logic doesn't seem to be available for subtraction/`interval`s. – Ken Williams Jan 09 '12 at 16:14
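On the subtraction point in the last comment: base R's `seq()` can step by calendar days with `by = "DSTday"`, which keeps the wall-clock time fixed across DST changes, and a negative count such as `"-1 DSTday"` steps backwards. A minimal sketch, independent of lubridate and assuming the `America/Chicago` zone is available:

```r
d1 <- as.POSIXct("2011-03-12 12:00:00", tz = "America/Chicago")

# Add one calendar day at the same clock time: 2011-03-13 12:00:00 CDT
d2 <- seq(d1, by = "DSTday", length.out = 2)[2]

# Adding a fixed 86400 seconds instead lands at 13:00 CDT
d1 + 86400

# Subtract one calendar day from d2: back to 2011-03-12 12:00:00 CST
back <- seq(d2, by = "-1 DSTday", length.out = 2)[2]
```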

This question is old, but it has been viewed many times, and I found this page when I needed to do something similar today, so here is an update. In current lubridate you can do the following:

library(lubridate)

d1 <- ymd_hms("2011-03-12 12:00:00", tz = 'America/Chicago')
d2 <- ymd_hms("2011-03-13 12:00:00", tz = 'America/Chicago')

(d1 %--% d2)/dminutes(1)  # 1380
(d1 %--% d2)/dhours(1)    # 23
(d1 %--% d2)/ddays(1)     # 0.9583333 (elapsed 24-hour days, not calendar days)
(d1 %--% d2)/dweeks(1)    # 0.1369048
Michael Dewar

Ken, dividing by days(1) will give you what you want. Lubridate doesn't coerce periods to durations when you divide intervals by periods. (Although the algorithm for finding the exact number of whole periods in the interval does begin with an estimate that uses the interval divided by the analogous number of durations, which might be what you are noticing.)

The end result is the number of whole periods that fit in the interval. The warning message alerts the user that it is an estimate because some fraction of a period is dropped from the answer. It's not sensible to do math with a fraction of a period, since we can't modify a clock time with it unless we convert it to multiples of a shorter period, and there is no consistent way to make that conversion. For example, the day you mention would be equal to 23 hours, but other days would be equal to 24 hours. You are thinking the right way: periods are an attempt to respect the variations caused by DST, leap years, etc., but they only do this in whole units.
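The claim that calendar days have no fixed length is easy to check in base R by stepping from one local midnight to the next with `seq(..., by = "DSTday")` and measuring each gap. A sketch for the days around the 2011 US spring-forward change, assuming the `America/Chicago` zone is available:

```r
# Four consecutive local midnights, 2011-03-12 through 2011-03-15
midnights <- seq(as.POSIXct("2011-03-12", tz = "America/Chicago"),
                 by = "DSTday", length.out = 4)

# Length of each calendar day in hours; 2011-03-13 is the short one
day_hours <- as.numeric(diff(midnights), units = "hours")
day_hours
```

Here `day_hours` should come out as 24, 23, 24: the calendar day 2011-03-13 is an hour short.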

I can't reproduce the error in subtraction that you mention above. It seems to work for me.

    three <- force_tz(ymd_hms("2011-03-12 12:00:00"), "")
    # note: here in TX, "" *is* CST
    (four <- three + days(1))
    # [1] "2011-03-13 12:00:00 CDT"
    four - days(1)
    # [1] "2011-03-12 12:00:00 CST"
Garrett
  • However - for an interval, there *is* a way to make the conversion, because it's rooted in an exact instant. You don't know if some arbitrary day has 24 hours, but you do know whether *this specific* day has 24 hours, so the calculations should be possible. – Ken Williams Jan 09 '12 at 20:25
  • @KenWilliams I take your point. Lubridate currently doesn't do that calculation, but maybe it should. My thought has been that the remainder might be 1/2 of the first day in the interval, but 12/23 of the last day in the interval. Perhaps the last day is all that matters. – Garrett Feb 01 '12 at 20:09
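The calculation Ken describes (count whole calendar days, then express the leftover as a fraction of the specific day it falls in) can be sketched in base R. The function name `calendar_days` is hypothetical, and measuring the leftover against the day that begins at the last whole-day mark is only one possible convention; as Garrett notes, other choices give different answers.

```r
# Hypothetical helper: length of [start, end] in calendar days, respecting DST
# in the time zone attached to `start`. Assumes start's clock time exists on
# every day in the span (e.g. not 02:30 on a spring-forward day).
calendar_days <- function(start, end) {
  stopifnot(length(start) == 1, length(end) == 1, end >= start)
  # All same-clock-time marks from start that do not pass end
  marks <- seq(start, end, by = "DSTday")
  whole <- length(marks) - 1
  last  <- marks[length(marks)]
  # The next mark after `last` gives the actual length of the partial day
  nxt <- seq(last, by = "DSTday", length.out = 2)[2]
  partial <- as.numeric(difftime(end, last, units = "secs")) /
             as.numeric(difftime(nxt, last, units = "secs"))
  whole + partial
}

tz <- "America/Chicago"
s <- as.POSIXct("2011-03-12 12:00:00", tz = tz)
calendar_days(s, as.POSIXct("2011-03-13 12:00:00", tz = tz))  # 1
calendar_days(s, as.POSIXct("2011-03-13 00:00:00", tz = tz))  # 12/23, about 0.5217
```

The second call yields the kind of 12/23 fraction discussed in the comment above. Stepping with `"DSTday"` rather than adding 86400 seconds is what lets the fraction use the real length of the day in question.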

Be careful when dividing a time in seconds to get days: you are then working with bare numbers rather than with calendar-aware representations of time, which can lead to the following:

> date_f <- now()
> date_i <- now() - days(23)
> as.duration(date_f - date_i)/ddays(1)
[1] 22.95833
> interval(date_i,date_f)/ddays(1)
[1] 22.95833
> int_length(interval(date_i,date_f))/as.numeric(ddays(1))
[1] 22.95833

This shows that days and months are calendar events, not amounts of time that can be measured in seconds, milliseconds, etc.
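The 22.95833 above is exactly 23 days minus one hour (551 hours divided by 24), which is what you get when 23 calendar days contain a spring-forward DST change. The original session's time zone isn't stated; assuming a European zone such as `Europe/Madrid` (EU clocks moved forward on 2021-03-28), this base-R sketch reproduces the figure with fixed dates instead of `now()`:

```r
tz <- "Europe/Madrid"   # assumed zone; EU clocks moved forward on 2021-03-28
date_i <- as.POSIXct("2021-03-23 12:00:00", tz = tz)

# 23 calendar days later at the same clock time (similar in spirit to + days(23))
date_f <- seq(date_i, by = "23 DSTday", length.out = 2)[2]

# Elapsed time in 24-hour days: one hour short of 23
as.numeric(difftime(date_f, date_i, units = "days"))  # 22.95833
```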

The best way to calculate differences in days is to avoid the conversion to seconds and work with days as the unit:

> e <- now()
> s <- now() - days(23)  
> as.numeric(as.Date(s))
[1] 18709
> as.numeric(as.Date(e) - as.Date(s))
[1] 23

However, if you consider a day to be a fixed span of 86400 seconds, as ddays() does, the previous approach can give a result that looks surprising:

> e <- ymd_hms("2021-03-13 00:00:10", tz = 'UTC')
> s <- ymd_hms("2021-03-12 23:59:50", tz = 'UTC')
> as.duration(e - s)
[1] "20s"
> as.duration(e - s)/ddays(1)
[1] 0.0002314815
> as.numeric(as.Date(e) - as.Date(s))
[1] 1

Hence, it depends on what you are looking for: time difference or calendar difference.
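One caveat on the calendar route: which date an instant falls on depends on the time zone it is read in, and depending on the R version `as.Date()` may default to UTC for date-times rather than the zone attached to the value, so passing `tz` explicitly is safer. A sketch reusing the two timestamps from the example above, but assuming they are local `America/Chicago` times instead of UTC:

```r
tz <- "America/Chicago"   # assumed local zone for this illustration
s <- as.POSIXct("2021-03-12 23:59:50", tz = tz)
e <- as.POSIXct("2021-03-13 00:00:10", tz = tz)

# Calendar dates read in the local zone: the instants straddle local midnight
local_days <- as.numeric(as.Date(e, tz = tz) - as.Date(s, tz = tz))      # 1

# The same instants read as UTC dates both fall on 2021-03-13
utc_days <- as.numeric(as.Date(e, tz = "UTC") - as.Date(s, tz = "UTC"))  # 0
```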

xaviescacs