
For datetimes, fasttime provides very fast parsing to POSIXct:

library('fasttime')
library('lubridate')
library('microbenchmark')

# parse character to POSIXct
Sys.setenv(TZ='UTC')
test <- rep('2011-04-02 11:01:00',1e4)
microbenchmark(
  test1 <- fastPOSIXct(test),
  test2 <- fast_strptime(test,format='%Y-%m-%d %H:%M:%S'),
  test3 <- as.POSIXct(test, format='%Y-%m-%d %H:%M:%S'),
  test4 <- ymd_hms(test),
  times=100)
Unit: microseconds
                                                       expr       min        lq      mean    median         uq       max
                                 test1 <- fastPOSIXct(test)   663.123   692.337  1409.448   701.821   712.4965 71231.585
 test2 <- fast_strptime(test, format = "%Y-%m-%d %H:%M:%S")  1026.342  1257.508  1263.157  1264.928  1273.8145  1366.438
    test3 <- as.POSIXct(test, format = "%Y-%m-%d %H:%M:%S")  9865.265 10060.450 10154.651 10145.551 10186.3030 13358.136
                                     test4 <- ymd_hms(test) 13990.206 17152.779 17278.654 17308.347 17393.6625 22193.544

Is there something equivalent for Date? The lubridate package provides some parsers, but the fast one (fast_strptime) casts dates to POSIXct, which is not meant for dates, and casting POSIXct to Date takes too long.

Given how quick it is to parse to POSIXct, I would think there should be something just as quick for Date.

Is there a fast packaged alternative?

statquant

1 Answer


Given

## the following two (here three) lines are all of fasttime's R/time.R
fastPOSIXct <- function(x, tz=NULL, required.components = 3L)
  .POSIXct(if (is.character(x)) .Call("parse_ts", x, required.components)
           else .Call("parse_ts", as.character(x), required.components), tz)

hence

## so we suggest to just use it, and convert later
fastDate <- function(x, tz=NULL)
  as.Date(fastPOSIXct(x, tz=tz))

which at least beats as.Date():

R> library(microbenchmark)
R> library(fasttime)
R> d <- rep("2010-11-12", n=1e4)
R> microbenchmark(fastDate(d), as.Date(d), times=100)
Unit: microseconds
        expr    min      lq    mean  median      uq     max neval cld
 fastDate(d) 47.469 48.8605 54.3232 55.7270 57.1675 104.447   100  a 
  as.Date(d) 77.194 79.4120 85.3020 85.2585 87.3135 121.979   100   b

R> 
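When rerunning this comparison, it may be worth confirming the length of the test vector before reading the timings: base R's `rep()` takes `times` (or `length.out`) and has no `n` argument. The following is a minimal sketch, assuming fasttime and microbenchmark are installed. It reuses the `fastDate()` wrapper from above and adds a sanity check that both approaches return the same dates. No timings are shown, since they depend on the machine.

```r
library(fasttime)
library(microbenchmark)

# Same wrapper as in the answer: parse to POSIXct, then convert to Date
fastDate <- function(x, tz = NULL)
  as.Date(fastPOSIXct(x, tz = tz))

# Build the test vector with an explicit `times` and verify its length
d <- rep("2010-11-12", times = 1e4)
stopifnot(length(d) == 1e4)

# Both approaches should agree before their speed is compared
stopifnot(all(fastDate(d) == as.Date(d)))

bm <- microbenchmark(fastDate(d), as.Date(d), times = 100)
print(bm)
```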

If you wanted to go super fast, you could start with tparse.c to create the date-only subset you want.
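Short of writing C, one way to see what a date-only parser has to do is to sketch it in vectorised base R. The following is an illustration only, not part of fasttime and not benchmarked here. The name `parseISODate` is hypothetical, it assumes strictly `YYYY-MM-DD` input with no validation, and it uses the well-known days-from-civil arithmetic rather than anything taken from tparse.c.

```r
# Hypothetical helper: convert "YYYY-MM-DD" strings to Date using
# integer arithmetic only (no strptime, no POSIXct round trip).
parseISODate <- function(x) {
  y <- as.integer(substr(x, 1L, 4L))
  m <- as.integer(substr(x, 6L, 7L))
  d <- as.integer(substr(x, 9L, 10L))

  # Shift the year so that it starts in March; leap days then fall
  # at the end of the shifted year.
  y   <- y - (m <= 2L)
  era <- y %/% 400L                        # 400-year Gregorian cycle
  yoe <- y - era * 400L                    # year within the cycle
  mp  <- (m + 9L) %% 12L                   # March = 0, ..., February = 11
  doy <- (153L * mp + 2L) %/% 5L + d - 1L  # day within the shifted year
  doe <- yoe * 365L + yoe %/% 4L - yoe %/% 100L + doy

  # 719468 is the number of days from 0000-03-01 to 1970-01-01
  days <- era * 146097L + doe - 719468L
  structure(as.numeric(days), class = "Date")
}

dates <- parseISODate(c("1970-01-01", "2000-02-29", "2010-11-12"))
stopifnot(all(dates == as.Date(c("1970-01-01", "2000-02-29", "2010-11-12"))))
```

Whether this beats `fastDate()` would need to be measured; the point is only that the date-only case reduces to three integer fields and a closed-form day count.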

Dirk Eddelbuettel
  • Is there any way your RcppBDT can help me avoid going the C way? – statquant Feb 06 '16 at 22:37
  • There are parsers in Boost DateTime but I chose **not** to expose them, as you would then need to link against the library. We currently do not need that for the pure _time calculations_, which are all header based. And header-only makes for _much_ easier builds and deployments. – Dirk Eddelbuettel Feb 06 '16 at 22:39
  • Just so you know, you do not need to paste(x,'12:00:00'); it works without it by default (see the documentation). – statquant Feb 06 '16 at 23:07
  • Right. It stops parsing when the input is over. That will make this solution harder to beat... Amending post and numbers. – Dirk Eddelbuettel Feb 06 '16 at 23:11
  • Just realized it parses and then calls .POSIXct on the result; if as.Date does not do anything stupid, it will indeed be hard to beat. Also, for more general formats like %Y%m%d, using the same function with fast_strptime from lubridate will work... – statquant Feb 06 '16 at 23:20
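
The last comment's suggestion for other formats can be sketched as below. This is a minimal illustration assuming lubridate is installed. The wrapper name `fastDateFmt` is hypothetical, and the speed of this variant is not measured here.

```r
library(lubridate)

# Hypothetical wrapper: same idea as fastDate(), but with
# lubridate's fast_strptime() so that the format can be chosen.
fastDateFmt <- function(x, format = "%Y%m%d", tz = "UTC")
  as.Date(fast_strptime(x, format = format, tz = tz))

d <- rep("20101112", times = 1e4)
res <- fastDateFmt(d)
stopifnot(all(res == as.Date("2010-11-12")))
```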