3

I am trying to use openxlsx to read large excel files with time series data. I found that convertToDateTime starts omitting the time if the numeric vector argument exceeds a certain amount of elements. My code looks like this:

ts <- readWorkbook(my.wb, sheet = as.character(r[["dSheet"]]), 
                startRow = 2, cols = 1, 
                colNames=FALSE, detectDates=FALSE)
colnames(ts) <- c("dt")
> head(convertToDateTime(ts$dt[1:30830]))
[1] "2016-11-23 15:20:00 MST" "2016-11-23 15:24:59 MST"
[3] "2016-11-23 15:30:00 MST" "2016-11-23 15:34:59 MST"
[5] "2016-11-23 15:40:00 MST" "2016-11-23 15:45:00 MST"

> head(convertToDateTime(ts$dt[1:30840]))
[1] "2016-11-23 MST" "2016-11-23 MST" "2016-11-23 MST" "2016-11-23 MST"
[5] "2016-11-23 MST" "2016-11-23 MST"

Side question: If you look at element #2 (2016-11-23 15:24:59) in my original code, this should actually be 15:25. If there's a simple way to fix this please let me know...

Here's a working example:

>library("openxlsx")
>dates <- runif(31000, 41000, 42000)
>head(convertToDateTime(dates[1:5000]))

[1] "2013-05-29 09:34:28 MDT" "2014-07-01 03:52:13 MDT"
[3] "2012-06-02 09:27:47 MDT" "2012-05-06 13:42:04 MDT"
[5] "2014-09-26 04:50:36 MDT" "2013-10-26 03:14:00 MDT"

> head(convertToDateTime(dates[1:10000]))
[1] "2013-05-29 MDT" "2014-07-01 MDT" "2012-06-02 MDT" "2012-05-06 MDT"
[5] "2014-09-26 MDT" "2013-10-26 MDT"

Is there a fix for this behavior or would you recommend to try a completely different approach (implement conversion)?

Thank you in advance for your help!

FlorianK
  • 71
  • 6

2 Answers2

4

I found a workaround to my issue in Converting numeric time to datetime POSIXct format in R. Simply multiplying Excels hours by 3600 *24 to get the time span in seconds and using as.POSIXCT works fine:

> dates <- runif(30000, 41000, 42000)
> ts1 <- convertToDateTime(dates)
> ts2 <- as.POSIXct(dates*3600*24, tz="GMT", origin = "1900-01-01")
> head(ts1)
[1] "2013-10-04 MDT" "2012-04-04 MDT" "2014-06-12 MDT" "2013-01-24 MST"
[5] "2012-09-12 MDT" "2014-11-11 MST"
> head(ts2)
[1] "2013-10-06 02:43:24 GMT" "2012-04-06 11:59:54 GMT"
[3] "2014-06-14 16:43:06 GMT" "2013-01-26 00:25:17 GMT"
[5] "2012-09-14 07:26:47 GMT" "2014-11-13 18:52:03 GMT"

Since this works so great, I do not understand the advantages of openxlsx' convertToDateTime function workflow. Using as.POSIXct in combination with openxlsx' getDateOrigin() also resolves the rounding issue that I described in my original question.

FlorianK
  • 71
  • 6
  • This workaround is fantastic. I was having stack overflow issues applying openxlsx::convertToDateTime to a large vector but the as.POSIXct workaround had no issues. I guess the convertToDateTime function is pretty memory intensive... – BejanSadeghian May 01 '20 at 18:05
  • For some reasons I've got a 2 days gap. Works with `origin = "1899-12-30'` – qfazille Nov 16 '21 at 14:13
  • 1
    There is a 2 days offset in the above calculation for 2 reason: - Excel defines 1900-01-01 with value '1' and not '0' so the origin must be shifted by 1 in as.POSIXct() - Excel considers 1900 as a "leap year" which is wrong. 1900-02-29 shouldn't exist but is added in Excel. We need to shift again one day in as.POSIXct() – jmpivette May 16 '22 at 09:06
0

I suggest saving the excel file as a csv file and using the parse_date_time() function to unify the date formats.

LEE
  • 316
  • 2
  • 8