Convert Date to year month representation

Question

I have a Date and want to represent it as an integer in yyyymm form. Currently, I do:

get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from = as.Date("2012-01-01"), to = as.Date("5012-01-01"), by = 1) 
system.time(ym <- get_year_month(mydate))
#    user  system elapsed 
#    5.972   0.974   6.951

This is very slow for large datasets. Is there a faster way? Please provide timings for your answers so they can be easily compared. Use the above example.

score 5 · Accepted Answer · answered Mar 09 '13 at 22:42

Using functions from the lubridate package can be almost twice as fast as your function:

mydate = as.Date(rep("2012-01-01",1000))
library(lubridate)
library(microbenchmark)
microbenchmark(get_year_month(mydate),
               year(mydate)*100+month(mydate))

gives:

R> Unit: milliseconds
                               expr      min       lq   median       uq
             get_year_month(mydate) 2.150296 2.188370 2.218176 2.285973
 year(mydate) * 100 + month(mydate) 1.220016 1.228129 1.239704 1.284568
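
Before comparing timings, it may be worth confirming that the arithmetic form returns the same values as the original function. The following is a minimal sketch that assumes lubridate is installed; the sample dates are made up for illustration, and the `as.integer()` wrapper is an addition so that the result has the same type as the `format`-based version.

```r
library(lubridate)

# Original format-based version from the question
get_year_month <- function(d) {
  as.integer(format(d, "%Y%m"))
}

# Illustrative sample dates (not from the original post)
dates <- as.Date(c("2012-01-01", "2012-12-31", "1999-07-15"))

# as.integer() keeps the result type the same as the format-based version
ym <- as.integer(year(dates) * 100 + month(dates))

# Compare values and type against the original function
identical(ym, get_year_month(dates))
```

If the comparison returns `TRUE`, the two expressions are interchangeable for these inputs and only the timings differ.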

great! looks like `lubridate` `month` and `year` functions are much faster than `base`. using `base` functions increases the time substantially. — Alex, Mar 10 '13 at 00:16

CHP · Answer 2 · 2013-03-10T11:58:00.263

You can try the yearmon class from the zoo package. In general, if you are doing time series manipulation and analysis, I would suggest using xts or at least zoo classes. xts has a lot of functionality for analyzing very large time series data.

Here is a quick benchmark against the other suggested solutions.

library(lubridate)       # year(), month()
library(zoo)             # as.yearmon()
library(microbenchmark)

get_year_month <- function(d) {
    return(as.integer(format(d, "%Y%m")))
}
mydate = as.Date(rep("2012-01-01", 1e+06))

microbenchmark(get_year_month(mydate), year(mydate) * 100 + month(mydate), as.yearmon(mydate, format = "%Y-%m-%d"), times = 1)
## Unit: milliseconds
##                                     expr       min        lq    median        uq       max neval
##                   get_year_month(mydate) 1049.8813 1049.8813 1049.8813 1049.8813 1049.8813     1
##       year(mydate) * 100 + month(mydate)  434.1765  434.1765  434.1765  434.1765  434.1765     1
##  as.yearmon(mydate, format = "%Y-%m-%d")  249.6704  249.6704  249.6704  249.6704  249.6704     1
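
Note that `as.yearmon` returns an object of class `yearmon` rather than the integer the question asks for. The sketch below, which assumes zoo is installed and uses made-up sample dates, shows one possible way to get from a `yearmon` back to an integer in yyyymm form. The extra formatting step has its own cost, which is not included in the benchmark above.

```r
library(zoo)

# Illustrative sample dates (not from the original post)
dates <- as.Date(c("2012-01-01", "2012-12-31", "1999-07-15"))

# Class "yearmon"; prints as e.g. "Jan 2012" in an English locale
ym <- as.yearmon(dates)

# One way back to an integer yyyymm: format the yearmon, then convert
ym_int <- as.integer(format(ym, "%Y%m"))
ym_int
```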

Theodore Lytras · Answer 3 · 2013-03-09T23:52:42.317

It would be best to keep your Dates in POSIXlt format if you want to manipulate them like that:

> system.time(ym <- get_year_month(mydate))
   user  system elapsed 
  4.039   0.025   4.079 
> system.time(mydatep <- as.POSIXlt(mydate))
   user  system elapsed 
  3.576   0.016   3.603 
> system.time(ym <- (1900 + mydatep$year)*100 + (mydatep$mon + 1))
   user  system elapsed 
  0.010   0.005   0.015

The one-time conversion to POSIXlt accounts for almost all of the time, so overall it's still only a little faster, but subsequent similar operations come almost for free.
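
The POSIXlt approach can be checked on a small input. The sketch below uses base R only; the helper name `ym_from_posixlt` and the sample dates are hypothetical and chosen for illustration.

```r
# Original format-based version from the question
get_year_month <- function(d) {
  as.integer(format(d, "%Y%m"))
}

# Hypothetical helper: same result computed from POSIXlt fields
ym_from_posixlt <- function(d) {
  lt <- as.POSIXlt(d)
  # $year counts years since 1900; $mon counts months since January (0-11)
  as.integer((1900 + lt$year) * 100 + (lt$mon + 1))
}

# Illustrative sample dates (not from the original post)
dates <- as.Date(c("2012-01-01", "2012-12-31", "1999-07-15"))

ym_from_posixlt(dates)
# 201201 201212 199907

# Same values and type as the format-based version
identical(ym_from_posixlt(dates), get_year_month(dates))
```

If the dates are stored as POSIXlt from the start, the `as.POSIXlt()` call inside the helper is the part that can be skipped.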

edited Mar 09 '13 at 23:52

answered Mar 09 '13 at 23:03

a bit unfamiliar with `POSIXlt` but it doesn't look like it provides the same answer... – Alex Mar 09 '13 at 23:18

Whoops, my bad. Corrected my answer. `$year` gives the number of years after 1900, and `$mon` the number of months after January. For details `?POSIXlt`. – Theodore Lytras Mar 09 '13 at 23:56

score 0 · Answer 4 · answered Mar 09 '13 at 22:48

There may not be a faster way for a single item. However, you can make a version of the function that operates on collections run much faster than linearly by using the built-in replicate, e.g.

get_year_month_each <- function(D) {
  i <- 0L  # index into D, advanced once per replication
  x <- replicate(length(D), { i <<- i + 1L; get_year_month(D[i]) })
  return(x)
}

WestCoastProjects

thanks for your answer. i'm not sure what it means unfortunately. could you please provide an example as the other two. – Alex Mar 10 '13 at 00:16

Hi Alex, please look up the use of the built-in "replicate", which will avoid the penalty of looping N times (N being the number of entries in your array). – WestCoastProjects Mar 10 '13 at 01:30

`replicate` is just `lapply`.. still no idea what you mean. post an example as the others have with timings. this might clear up some confusion. – Alex Mar 10 '13 at 01:33
