0

I have a data frame with a Date columns, without time. I would like to convert it to a date time format, using 00:00:00 as time stamp. And print the time as well.

From these posts 1, 2 and 3, I get that time formatting in R might omit midnight, so I then use @ACuriousCat solution to print the time. The simpler code I have is:

data<-c(NA,"2014-03-18","2014-04-01","2014-04-15","2014-04-28","2014-05-14")
> data
[1] NA "2014-03-18" "2014-04-01" "2014-04-15" "2014-04-28" "2014-05-14"
> data1<-format(as.POSIXct(data,tz='UTC'),"%Y-%m-%d %H:%M:%S")
> data1
[1] NA "2014-03-18 00:00:00" "2014-04-01 00:00:00" "2014-04-15 00:00:00" "2014-04-28 00:00:00"
[6] "2014-05-14 00:00:00"

Which works great! However, on my real dataset, the time will be

> data1
[1] NA "2014-03-18 01:00:00" "2014-04-01 02:00:00" "2014-04-15 02:00:00" "2014-04-28 02:00:00"
[6] "2014-05-14 02:00:00"

It looks like a time zone issue + a daylight saving time issue in the way my data is read or coded in R. But how could I solve that? I tried different time zone, it didn't work. All I can do so far to solve it is:

> data1<-format(as.POSIXct(as_datetime(as.double(as.POSIXct(data)+3600)-3600),tz='UTC'),"%Y-%m-%d %H:%M:%S")
> data1
[1] NA "2014-03-18 00:00:00" "2014-04-01 00:00:00" "2014-04-15 00:00:00" "2014-04-28 00:00:00"
[6] "2014-05-14 00:00:00"

Is there a less convoluted way to code this?

Mata
  • 538
  • 3
  • 17

1 Answers1

1

It seems that in your manual check and sample you have your dates as a character string, and where it goes wrong on your real data table / frame you have probably the dates as a Date column (with another TZ set).

Here illustrated with dates (character) and dates2 (as.Date)

data <- data.table(
  dates = c(NA,"2014-03-18","2014-04-01","2014-04-15","2014-04-28","2014-05-14")
)
data[, dates2 := as.Date(dates)]

data[, datetime := format(as.POSIXct(dates, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]
data[, datetime2 := format(as.POSIXct(dates2, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]

str(data)

# Classes ‘data.table’ and 'data.frame':    6 obs. of  4 variables:
#  $ dates    : chr  NA "2014-03-18" "2014-04-01" "2014-04-15" ...
#  $ dates2   : Date, format: NA "2014-03-18" "2014-04-01" "2014-04-15" ...
#  $ datetime : chr  NA "03-18-2014 00:00:00" "04-01-2014 00:00:00" "04-15-2014 00:00:00" ...
#  $ datetime2: chr  NA "03-18-2014 01:00:00" "04-01-2014 02:00:00" "04-15-2014 02:00:00" ...
#  - attr(*, ".internal.selfref")=<externalptr> 

Edit

If you work with a character column with dates you can use this

data[, dates := as.character(dates)]
data[, datetime := format(as.POSIXct(dates, tz = "UTC"), "%m-%d-%Y %H:%M:%S")]

If you had converted your dates to a Date colum you can use this

data[, dates := as.Date(dates)]
data[, datetime := format(as.POSIXct(dates), "%m-%d-%Y %H:%M:%S", tz = "UTC")]

As format returns a string anyhow, the best solution is actually this:

data[!is.na(dates), datetime := paste(dates, "00:00:00")]
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22
  • Thanks, that helps! I indeed have my dates as a Date column, and the time zone is set as UTC (ckecked with `tz()`) However, in your example, datetime and datetime2 are generated with tz = UTC as well, so I am confused why it produces different results in your example. Unless `as.Dates` changes that? – Mata Nov 24 '21 at 13:21
  • No that is just done for my sample where I started as character strings. The key point seems to be when you have dates, you use the timezone on the format, while if you take the strings you set the time zone to the as.POSIXct. But besides trying to understand the behaviour which is interesting. Either way with format you end up with a formatted character string, so the easiest way is actually the paste() solution. – Merijn van Tilborg Nov 24 '21 at 13:29