Problem
Short version: data.table does not handle complex data types well.
Long version: When assigning to or subsetting the period column, data.table seems to touch only .Data, which holds the seconds. As far as I understand it, a Period object is something like a double holding the number of seconds, with additional attributes (slots) for year, month, etc. data.table handles only those underlying values (.Data, i.e. the seconds) row by row, while the other attributes stay attached to the entire column.
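To see that structure outside of data.table, here is a minimal sketch that inspects a Period vector directly. It assumes lubridate is installed; the variable names p and p1 are only for illustration.

```r
library(lubridate)

# Three periods that differ in both years and seconds
p <- c(period("1Y1S"), period("2Y2S"), period("3Y3S"))

# The underlying double vector holds only the seconds
p@.Data

# Every other unit lives in its own slot of the same length
p@year

# Subsetting the vector directly goes through lubridate's own method,
# so each slot is reduced to the selected element
p1 <- p[1]
length(p1@year)
```

Any code that subsets only the underlying double vector, without going through the Period method, leaves the year, month, and other slots at full length.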
Some illustrating examples:
# Only .Data gets subsetted, no other slots
DT <- data.table(x = 1:3)
DT$p <- rep(period(7, "days"), 3)
str(DT[1,])
# Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
# $ x: int 1
# $ p:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 0
# .. ..@ year : num 0 0 0
# .. ..@ month : num 0 0 0
# .. ..@ day : num 7 7 7
# .. ..@ hour : num 0 0 0
# .. ..@ minute: num 0 0 0
# - attr(*, ".internal.selfref")=<externalptr>
# Only assignment to .Data, no other slots
DT[1, p := 40]
DT
# x p
# 1: 1 7d 0H 0M 40S
# 2: 2 7d 0H 0M 0S
# 3: 3 7d 0H 0M 0S
# DT even translates the multiple attributes into multiple values
# Notice how the seconds are correct.
DT <- data.table(x = 1:3)
DT$p <- c(period("1Y1S"), period("2Y2S"), period("3Y3S"))
DT[1,]$p
# [1] "1y 0m 0d 0H 0M 1S" "2y 0m 0d 0H 0M 1S" "3y 0m 0d 0H 0M 1S"
I do not know the internals of why data.table behaves like this, or whether it will ever fully support period and other complex data types. I suggest keeping an eye on https://github.com/Rdatatable/data.table/ if you want an answer to this.
Related issues:
data.table cannot handle multiple time zone attributes for a POSIXct column.
Adding timezone to POSIXct object in data.table
https://github.com/Rdatatable/data.table/issues/4974
https://github.com/Rdatatable/data.table/issues/4415
Possible workarounds
- As a general solution, you can always use a column of type list for complex data types. It is sometimes a bit harder to reason about, but it always works. I would recommend this if you do not expect to filter on that column.
DT <- data.table(x = 1:5)[x == 3, p := list(list(period(7, "day")))]
DT[x == 4, p := period(1, "month")]
DT[]
# x p
# 1: 1
# 2: 2
# 3: 3 7d 0H 0M 0S
# 4: 4 1m 0d 0H 0M 0S
# 5: 5
DT[p > period(1, "month"),]
# Error: 'list' object cannot be coerced to type 'double'
largerThanMonth <- function(x) {
  if (is.null(x)) {
    FALSE
  } else {
    x >= period(1, "month")
  }
}
DT[sapply(p, largerThanMonth),]
# x p
# 1: 4 1m 0d 0H 0M 0S
- Native data.frames seem to work properly.
DT <- data.frame(x = 1:3)
DT$p <- c(period("1Y1S"), period("2Y2S"), period("3Y3S"))
DT[1,]$p
# [1] "1y 0m 0d 0H 0M 1S"
DT[1,]$p <- period("1M")
DT
# x p
# 1 1 1M 0S
# 2 2 2y 0m 0d 0H 0M 2S
# 3 3 3y 0m 0d 0H 0M 3S
- Convert the period to numeric or character.
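A minimal sketch of the numeric variant, assuming lubridate's period_to_seconds() and seconds_to_period() fit your data. The column name p_sec and the variables first_row and restored are made up for this example. Keep in mind that converting to seconds is exact only for units of days and smaller; months and years have no fixed length, so the conversion has to approximate them.

```r
library(data.table)
library(lubridate)

DT <- data.table(x = 1:3)

# Store the periods as plain seconds, an ordinary double column
DT[, p_sec := period_to_seconds(days(c(7, 7, 7)))]

# Row-wise assignment now behaves like any numeric column
DT[1, p_sec := p_sec + 40]

# Subsetting returns exactly one value per selected row
first_row <- DT[1]

# Convert back to a Period only when one is needed, e.g. for display
restored <- seconds_to_period(first_row$p_sec)
restored
```

Filtering also works on the numeric column directly, for example DT[p_sec > period_to_seconds(days(7))].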