0

This question seems to have been asked a number of times - without a blanket useful fix - so I am trying again.

My excel data time column has the format: "MM/DD/YY HH:MM"

I am using the following code to read in the file:

datai <- read.xlsx(xlsxFile= file.path(work_dir,"..", "Databases", study, subfolder, file),
                        sheet = "Monitoring Data", detectDates = FALSE, check.names = TRUE, 
                        na.strings = "NA", fillMergedCells = FALSE, cols = 1:27)

datai$Time.Stamp..mm.dd.yy.hh.mm.<- convertToDateTime(datai$Time.Stamp..mm.dd.yy.hh.mm.)

With the code like that, the date imports correctly, but the time does not import.

And I have tried with the detectDates = TRUE which gives me an undecipherable result. For example, the first 5 dates should read in as

"12/1/15 0:00", "12/1/15 1:00", "12/1/15 2:00", "12/1/15 0:00", "12/1/15 3:00", "12/1/15 4:00" 

but instead they read in as

 "2015-12-01" "4233-01-16" "4233-02-02" "4233-02-18" "4233-03-07" 

I have also tried adding

  data <-  read.xlsx(xlsxFile= file.path(work_dir,"..", "Databases", study, subfolder, file), 
                     startRow = 1,sheet = "Monitoring Data", detectDates = TRUE, check.names = TRUE, colNames = TRUE,
                     rowNames = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE, rows = NULL, sep.names = ".",
                     na.strings = "NA", fillMergedCells = FALSE, cols = 1:27, 
                     getOption("openxlsx.datetimeFormat", "mm/dd/yyyy hh:mm"))

But get this error: Region 'mm/dd/yyyy hh:mm' not found!

Following the answers on the other questions, most people suggest switching to readxl::read_excel - which does work, but it takes a significant amount of time to open the file. Does anyone have any ideas on what to try to get read.xlsx to properly read in dates?

Snipit of my excel data: Excel Data

melmo
  • 757
  • 3
  • 15
  • Would it be wrong to guess that the extra time that the readxl package takes is due to it importing date and time the way you want? – markhogue Jul 23 '20 at 20:34
  • It's quite possible. Other folks have complained of how long it takes to load as well, but this could be due to our lack of familiarity with readxlsx. Perhaps there is a setting to describe what the columns should be so it takes less time to load. – melmo Jul 24 '20 at 12:20

1 Answers1

0

If the timestamp is numeric, you can convert it manually, for instance with function convert_date in package datetimeutils (which I maintain):

library("datetimeutils") 
x <- 44038.8394469843
convert_date(x, type = "excel", fraction = TRUE)
## [1] "2020-07-26 20:08:48 CEST"

If you need a specific timezone, you can specify this as well:

convert_date(x, type = "excel", fraction = TRUE, tz = "America/New_York")
## [1] "2020-07-26 20:08:48 EDT"
Enrico Schumann
  • 1,278
  • 7
  • 8