Is there a faster way to get the year from a large data set (around 1 GB) in R?

Currently I use `data$year <- format(as.Date(data$pickup_datatime), "%Y")` to get the year, but it takes a very long time.

Henrik
  • Do both functions take a long time, or is it one or the other? You might try `lubridate::year()` instead of `format`. – Mako212 Nov 15 '18 at 20:18
  • Instead of parsing the dates, you may try `substr` or `stringi::stri_sub`, as I did here when grabbing hour: [Fastest way to extract hour from time (HH:MM)](https://stackoverflow.com/questions/22803212/fastest-way-to-extract-hour-from-time-hhmm/22806994#22806994). When posting a question about speed, it's good if you also provide easily reproducible data of sufficient size to try the code on. Cheers. – Henrik Nov 15 '18 at 20:47
  • Possible duplicate of [Fastest way to extract hour from time (HH:MM)](https://stackoverflow.com/questions/22803212/fastest-way-to-extract-hour-from-time-hhmm) – Nairolf Nov 15 '18 at 21:10
  • ...or at least post some sample data. – Gregor Thomas Nov 15 '18 at 21:10
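
The `substr` suggestion from the comments could look roughly like the sketch below. It assumes `pickup_datatime` is a character column whose values start with a four-digit year (for example `"YYYY-MM-DD HH:MM:SS"`); the sample rows are made up for illustration, since the question includes no data. If the column is a factor or already a date-time class, this would need adjusting.

```r
# Hypothetical sample data: character timestamps that begin with the year.
data <- data.frame(
  pickup_datatime = c("2013-01-01 15:11:48",
                      "2014-06-30 00:18:35",
                      "2015-12-31 23:59:59"),
  stringsAsFactors = FALSE
)

# Approach from the question: parse to Date, then format back to a string.
year_parsed <- format(as.Date(data$pickup_datatime), "%Y")

# String approach: take the first four characters, with no date parsing.
year_substr <- substr(data$pickup_datatime, 1, 4)

# Both approaches give the same character vector on this sample.
stopifnot(identical(year_parsed, year_substr))

# Store the year as an integer column.
data$year <- as.integer(year_substr)
```

Whether this is actually faster on the full data set is worth checking with `system.time()` on a sample of the real data, since it skips date parsing entirely but also skips any validation of the values.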

1 Answer

The lubridate package has a built-in function, `year()`, that extracts the year from a date-like object. Here is how to use it in your case:

data$year <- lubridate::year(data$pickup_datatime)
dmca