0

I have a data.table with characters in two columns like so:

01/01/2014 | 00:30
02/01/2014 | 01:00
03/01/2014 | 01:30 etc

The length of this data set varies but is easily over 300,000 rows each time the script is run. Eventually I know this script will need to deal with a data set of 30,000,000 rows plus.

I currently paste them in the following form:

DT[, DateTime := paste(Date, Time)

Which leads to:

01/01/2014 00:30
02/01/2014 01:00
03/01/2014 01:30 etc

I then use as.POSIXct to convert that into a POSIX date:

DT[, DateTime:= as.POSIXct(x = DateTime, format = "%d/%m/%Y %H:%M")]

This works fine, converting the characters correctly, largely I believe because I set the format argument to match the structure of the character string it is fed.

However, I'd like to use the fasttime package, but there is an inherent problem in that it does not support a format argument to input. Therefore, when I run:

DT[, DateTime := fastPOSIXct(x = DateTime)]

fasttime has to interpret my data as the "order of interpretation is fixed: year, month, day, hour, minute, second." the output would come out like:

2006/07/07 00:30
2007/07/07 01:00
2008/07/07 01:30 etc

Therfore, it seems I either must use as.POSIXct, or find a way to manipulate the string into the right order.

What would be the most efficient way to allow me to use fasttime? How should I reorder the character string to match? Would you expect that it would be worth reordering the character strings in order to use fasttime, or would the added requirement to correct the strings make fasttime savings negligible?

DaveRGP
  • 1,430
  • 15
  • 34

1 Answers1

2

Use sub to reorder your string first, and yes, I think that's going to be much faster than using base as.POSIXct:

DT[, DateTime := fastPOSIXct(sub('(\\d*)/(\\d*)/(\\d*) (.*)', '\\3-\\1-\\2 \\4', DateTime))]

You might also be able to speed this up more using substr instead of regular expressions, but it'll be much messier.

eddi
  • 49,088
  • 6
  • 104
  • 155
  • Thanks for the pointer, I've heard of these 'regex' things, but never used them. I may well have totally misunderstood how this works, but is the `x` at the end suppose to be what I'm operating on? i.e. DateTime – DaveRGP Apr 24 '15 at 08:31
  • @DaveRGP yes, sorry, didn't read your variable names too carefully, fixed now – eddi Apr 24 '15 at 08:34
  • No Prob, I was wondering if you could expand your answer a little to explain exactly what that sub string is doing? I have no prior experience with regex, and am having trouble matching my reading to exactly what that part of the code is doing. – DaveRGP Apr 27 '15 at 08:32
  • Also, I've noticed some (too me at least), weird behavior in fasttime and conversion. Have you got an answer to this? [link] (http://stackoverflow.com/questions/29892116/does-fasttime-cache-values-outside-of-its-environment-if-so-why?lq=1) – DaveRGP Apr 27 '15 at 09:41
  • Don't worry about the above link. It was a silly mistake on my part. – DaveRGP Apr 27 '15 at 09:59
  • 1
    @DaveRGP you can find online tools that visualize the regular expressions - [here's one](http://regexper.com/#(%5Cd*)%5C%2F(%5Cd*)%5C%2F(%5Cd*)%20(.*)) (note I had to modify the special symbols a little to fit javascript's slightly different syntax) – eddi Apr 27 '15 at 11:44