
Why is lubridate's `dmy()` so much slower than base R's `as.POSIXct()`?

```r
library(lubridate)
library(microbenchmark)

# 50,000 date strings in dd-mm-yyyy form, e.g. "01-01-2010"
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 50000, replace = TRUE)

# NB: this format string does not match the dd-mm-yyyy strings in Dates
microbenchmark(as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT"), times = 100)
microbenchmark(dmy(Dates, tz = "GMT"), times = 100)
```

```
Unit: milliseconds
  expr                                                          min       lq        median    uq        max
1 as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT")   103.1902  104.3247  108.675   109.2632  149.871
2 dmy(Dates, tz = "GMT")                                        184.4871  194.1504  197.8422  214.3771  268.4911
```
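One thing worth checking before reading too much into these timings: `Dates` holds strings such as `"01-01-2010"` (built with `format='%d-%m-%Y'`), while the `as.POSIXct()` call is given `format = "%d-%b-%Y %H:%M:%S"`. A minimal base-R check, using a hypothetical 5-element `dates_small` vector as a stand-in for the 50,000-element `Dates`, suggests that format string matches none of the inputs:

```r
# Hypothetical small stand-in for the question's Dates vector (same dd-mm-yyyy layout)
dates_small <- format(seq(ISOdate(2010, 1, 1), by = "day", length.out = 5),
                      format = "%d-%m-%Y")

# Format string from the benchmark: "%b" expects a month name, not "01"
mismatched <- as.POSIXct(dates_small, format = "%d-%b-%Y %H:%M:%S", tz = "GMT")

# Format string that matches how the strings were generated
matched <- as.POSIXct(dates_small, format = "%d-%m-%Y", tz = "GMT")

print(mismatched)  # every element is NA
print(matched)
```

So the `as.POSIXct()` row above times a parse that fails on every element; re-running the benchmark with `format = "%d-%m-%Y"` would give a like-for-like comparison with `dmy()`.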
RJ-
  • I'll let someone who has more experience chime in, but my guess is that lubridate functions are designed to handle a lot of things "behind the scenes", which means they do more checking/vetting of input to try to give you reasonable results. Reading the [background docs](https://github.com/hadley/lubridate) echoes these sentiments. Whether or not that contributes to the slowness, I'm not sure...but that would be my guess. Similarly, the `plyr` family is written for convenience as well and may perform relatively poorly compared to base functions in certain circumstances...but it's easy to use! – Chase May 18 '12 at 02:53
  • @RJ- It would be much better if you had actual code in your question that shows the difference. `system.time` can be used to measure. – Tommy May 18 '12 at 06:30
  • Noted. Will post it shortly. – RJ- May 18 '12 at 06:43
  • You just need to use a specific function instead of the generic ones. – skan Jul 04 '18 at 17:13

2 Answers


For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you're less likely to get blown up and it's easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we'd have something.

Start by looking at what `dmy` is doing and you'll see where the speed difference comes from (by the way, from your benchmarks I wouldn't say that lubridate is that much slower, as these are in milliseconds):

Type `dmy` at the command line and you get:

```r
> dmy
function (..., quiet = FALSE, tz = "UTC") 
{
    dates <- unlist(list(...))
    parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet, 
        tz = tz)
}
<environment: namespace:lubridate>
```

Right away I see `parse_date`, `num_to_date` and `make_format`, which makes one wonder what all these are. Let's see:

`parse_date`:

```r
> parse_date
function (x, formats, quiet = FALSE, seps = find_separator(x), 
    tz = "UTC") 
{
    fmt <- guess_format(head(x, 100), formats, seps, quiet)
    parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
    if (length(x) > 2 & !quiet) 
        message("Using date format ", fmt, ".")
    failed <- sum(is.na(parsed)) - sum(is.na(x))
    if (failed > 0) {
        message(failed, " failed to parse.")
    }
    parsed
}
<environment: namespace:lubridate>
```
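To make those extra steps concrete, here is a rough base-R sketch of the pattern `parse_date()` follows: guess a format from the first 100 elements, parse the whole vector with `strptime()`, convert to POSIXct, then count failures. The helpers `guess_format_sketch()` and `parse_date_sketch()` and the candidate list are hypothetical illustrations, not lubridate's actual `guess_format()` or `find_separator()` internals:

```r
# Hypothetical helper: return the first candidate format that parses every
# non-missing element of a sample (parse_date() hands guess_format() head(x, 100)).
guess_format_sketch <- function(x, candidates, n = 100) {
  sample_x <- head(x, n)
  for (fmt in candidates) {
    ok <- !is.na(strptime(sample_x, fmt, tz = "UTC")) | is.na(sample_x)
    if (all(ok)) return(fmt)
  }
  stop("no candidate format matched the sample")
}

# Hypothetical wrapper following the steps of the parse_date() source:
# guess a format, parse everything with strptime(), convert, count failures.
parse_date_sketch <- function(x, candidates, tz = "UTC") {
  fmt <- guess_format_sketch(x, candidates)
  parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
  failed <- sum(is.na(parsed)) - sum(is.na(x))
  if (failed > 0) message(failed, " failed to parse.")
  parsed
}

x <- c("01-01-2010", "15-06-2010", "31-12-2010")
# Candidate order matters: the first format that parses the sample wins.
candidates <- c("%d-%m-%Y", "%d-%b-%Y", "%d-%m-%y", "%d-%b-%y")

guessed <- parse_date_sketch(x, candidates, tz = "GMT")
direct  <- as.POSIXct(x, format = "%d-%m-%Y", tz = "GMT")
```

Both routes end in `strptime()` and give the same times; the sketch only adds work in front of that call.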

`num_to_date`:

```r
> getAnywhere(num_to_date)
A single object matching ‘num_to_date’ was found
It was found in the following places
  namespace:lubridate
with value

function (x) 
{
    if (is.numeric(x)) {
        x <- as.character(x)
        x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
    }
    x
}
<environment: namespace:lubridate>
```
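`num_to_date()` only does anything for numeric input: it converts numbers to strings and left-pads odd-length ones with a `"0"`, so `1012010` becomes `"01012010"`. A copy of the body printed above (renamed `num_to_date_copy` here purely for illustration) shows that, and also that character input such as `Dates` passes through unchanged:

```r
# Copy of the num_to_date() body printed above (2012-era lubridate source)
num_to_date_copy <- function(x) {
  if (is.numeric(x)) {
    x <- as.character(x)
    # Prefix a single "0" to strings with an odd number of characters
    x <- paste(ifelse(nchar(x) %% 2 == 1, "0", ""), x, sep = "")
  }
  x
}

padded    <- num_to_date_copy(c(1012010, 15062010))
untouched <- num_to_date_copy("01-01-2010")
```

For the character vector in the question this step is just an `is.numeric()` check.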

`make_format`:

```r
> getAnywhere(make_format)
A single object matching ‘make_format’ was found
It was found in the following places
  namespace:lubridate
with value

function (order) 
{
    order <- strsplit(order, "")[[1]]
    formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", 
        "%Y"))[order]
    grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
}
<environment: namespace:lubridate>
```
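`make_format()` enumerates every combination of candidate conversion specifications for the requested order; for `"dmy"` that is 1 × 2 × 2 = 4 candidates, which are then handed to the format guesser. Re-running a copy of the body above (renamed `make_format_copy` for illustration) lists them:

```r
# Copy of the make_format() body printed above (2012-era lubridate source)
make_format_copy <- function(order) {
  order <- strsplit(order, "")[[1]]
  # Candidate conversion specs for each letter of the order string
  formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", "%Y"))[order]
  # Every combination: 1 day spec x 2 month specs x 2 year specs
  grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
}

fmts <- make_format_copy("dmy")
str(fmts)
```

`expand.grid()` varies its first column fastest, so the month spec alternates before the year spec does.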

Wow, we've got strsplit-ting, expand.grid-ing, paste-ing, ifelse-ing, unname-ing etc., plus a Whole Lotta Error Checking Going On (play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm, tasty, but it comes at a price: speed.

Compare that to `as.POSIXct`:

```r
getAnywhere(as.POSIXct)  # tells us to use methods() to see the business
methods('as.POSIXct')    # tells us all the business
as.POSIXct.date          # what I believe your code is using (I don't use dates though)
```

There's a lot more internal code and less error checking going on in `as.POSIXct`, so you have to ask: do I want ease and safety, or speed and power? It depends on the job.

moodymudskipper
Tyler Rinker
  • +1 Great answer. Also, did you notice that `parse_date()` itself calls `as.POSIXct()`? So in the end, the `dmy()` car has an `as.POSIXct()` engine under the hood. – Josh O'Brien May 18 '12 at 14:00
  • I think it is actually using `as.POSIXct.default` to handle a character argument (`Dates` is a character vector). – Brian Diggs May 18 '12 at 18:36
  • To whoever downvoted this response: it seems odd, since 24 others found it helpful. Could you give some insight into your choice? – Tyler Rinker Nov 06 '13 at 02:20

@Tyler's answer is correct. Here's some more info, including a tip on making lubridate faster, from the help file:

> Lubridate has an inbuilt very fast POSIX parser, ported from the fasttime package by Simon Urbanek. This functionality is as yet optional and could be activated with `options(lubridate.fasttime = TRUE)`. Lubridate will automatically detect POSIX strings and use fast parser instead of the default strptime utility.

c.gutierrez