I have a large data table, (1 billion rows × 50 columns) similar to flights
from library(nycflights13)
, where multiple columns can be combined to form a date.
The code I am currently using to create this date-time column is:
library(data.table)
library(nycflights13)
library(fasttime)
flights <- as.data.table(flights)
flights[,DepDateTime := fastPOSIXct(paste0(year,
"-",
formatC(month, width = 2, format = "d", flag = "0"),
"-",
formatC(day, width = 2, format = "d", flag = "0"),
" ",
# replace e.g. 903 with '09:03:00'
gsub("([0-9]{2})([0-9]{2})", "\\1:\\2:00",
formatC(dep_time, width = 4,
format = "d", flag = "0")))
)]
For the flights
data, this takes around 0.6 s. Is there any way to improve this performance? I am interested in timing primarily; memory usage is a secondary concern.
Here is a candidate data table:
flights.big <-
data.table(year = sample(1980:2015, size = 1e9, replace = TRUE),
month = sample(1:12, size = 1e9, replace = TRUE),
day = sample(1:28, size = 1e9, replace = TRUE),
hour = sample(1:12, size = 1e9, replace = TRUE),
minute = sample(0:59, size = 1e9, replace = TRUE)
)