When durations are computed by group in data.table (v1.9.2), POSIXct arithmetic can print the wrong units. It seems the units of the first group are applied to every group.

library(data.table)
dt <- data.table(id=c(1,1,2,2), 
                  event=rep(c("start", "end"), times=2), 
                  time=c(as.POSIXct(c("2014-01-31 06:05:30", 
                                      "2014-01-31 06:45:30", 
                                      "2014-01-31 08:10:00", 
                                      "2014-01-31 09:30:00"))))
dt$time[2] - dt$time[1]  # in minutes
dt$time[4] - dt$time[3]  # in hours
dt[ , max(time) - min(time), by=id]  # wrong units printed for id 2

I realize that either of the following is the correct way to get the expected behavior, but I wanted to report this. I'm not sure whether it is really a data.table problem or a POSIXct problem.

dt[ , difftime(max(time), min(time), units="mins"), by=id]  # both in mins
dt[ , difftime(max(time), min(time), units="hours"), by=id]  # both in hours
David Arenburg
alexperrone
  • Agreed it's a `data.table` problem with `POSIXct`. Thanks for filing the good report: [issue 761](https://github.com/Rdatatable/data.table/issues/761). – Matt Dowle Aug 11 '14 at 20:08

3 Answers

You'll get the expected result if you do

dt[ , list(c(max(time) - min(time)),attr(max(time) - min(time),"units")), by=id]

Wrapping the time arithmetic in c() drops the attributes, so you get a plain number. Explicitly asking for the "units" attribute as a separate list element then puts the correct unit in its own column. It doesn't work otherwise because the subtraction returns a difftime that carries its units as an attribute, and data.table doesn't split attributes out into other columns.
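To illustrate the idea outside data.table, here is a minimal base R sketch that splits a difftime into a plain number and its units. The helper name value_and_units is made up for this example, and tz = "UTC" is an assumption added for reproducibility. It uses as.numeric() rather than c() to strip the class, because I believe newer R versions ship a c() method for difftime that keeps the units.

```r
# Same timestamps as in the question; tz = "UTC" is an added assumption.
time <- as.POSIXct(c("2014-01-31 06:05:30",
                     "2014-01-31 06:45:30",
                     "2014-01-31 08:10:00",
                     "2014-01-31 09:30:00"), tz = "UTC")

d1 <- time[2] - time[1]  # difftime, units picked automatically ("mins")
d2 <- time[4] - time[3]  # difftime, units picked automatically ("hours")

# Hypothetical helper: return the bare number and the units separately.
value_and_units <- function(d) {
  list(value = as.numeric(d),       # plain number in the object's current units
       units = attr(d, "units"))    # the units stored as an attribute
}

r1 <- value_and_units(d1)
r2 <- value_and_units(d2)
```

Each result carries its own units, which is what the two-column output in the answer relies on.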


From Matt:

+1 Just to add a small speed improvement that avoids computing max(time) - min(time) twice:

dt[ , list(c(d<-max(time) - min(time)), attr(d,"units")), by=id]
   id        V1    V2
1:  1 40.000000  mins
2:  2  1.333333 hours

At least to start with, I guess we'll add a check for inconsistent attributes across group results and then issue a warning/error. So this solution (or the one in the question) will likely be needed anyway.

Matt Dowle
Dean MacGregor

This could be viewed as operator error, because your table is (automatically) displaying a numeric equivalent of a difftime, but you are not specifying which units to display the answer in. In most cases where you wish to export/display difftime values the desired units should be specified in an explicit conversion to numeric. E.g.

dt[ , as.numeric( max(time) - min(time), units="hours" ), by=id]
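As a small base R check of the same conversion, the sketch below takes the two durations from the question and converts each to hours explicitly. The variable names and tz = "UTC" are assumptions added for this example.

```r
# Durations for id 1 and id 2 from the question; tz = "UTC" is an added assumption.
d1 <- as.POSIXct("2014-01-31 06:45:30", tz = "UTC") -
      as.POSIXct("2014-01-31 06:05:30", tz = "UTC")   # auto units: mins
d2 <- as.POSIXct("2014-01-31 09:30:00", tz = "UTC") -
      as.POSIXct("2014-01-31 08:10:00", tz = "UTC")   # auto units: hours

# Converting with an explicit units argument gives plain numbers on one scale,
# regardless of which units each difftime picked automatically.
hours <- c(as.numeric(d1, units = "hours"),
           as.numeric(d2, units = "hours"))
```

Because the result is a plain numeric vector, there is no units attribute left to be mislabelled when groups are combined.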
David Arenburg

Forcing units is the way to go until #761 is fixed. Here's another option:

dt[ , difftime(max(time), min(time), units = 'mins'), by = id]
#    id      V1
# 1:  1 40 mins
# 2:  2 80 mins

This allows you to retain the class of the output (difftime) if you'd like.

More than anything, I find it quite strange that R fundamentally changes the contents of the difftime object based on the units attribute. Elsewhere in R, this kind of conversion would simply be handled by the print method, while the stored value of the object stays the same.
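A minimal base R sketch of what I mean: changing the units of a difftime rewrites the stored number, not just how it prints. The variable names and tz = "UTC" are assumptions added for this example.

```r
# The 80-minute duration for id 2; tz = "UTC" is an added assumption.
d <- as.POSIXct("2014-01-31 09:30:00", tz = "UTC") -
     as.POSIXct("2014-01-31 08:10:00", tz = "UTC")

units_before  <- units(d)                  # chosen automatically
stored_before <- as.numeric(unclass(d))    # underlying value, in hours

units(d) <- "mins"                         # switch units on the same object

units_after  <- units(d)
stored_after <- as.numeric(unclass(d))     # underlying value is now in minutes
```

So if the units attribute is lost or replaced after the fact, the stored number alone no longer says which duration it represents.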

MichaelChirico