
I'm trying to generate an hourly time series from 2000-01-01 00:00:00 to 2020-12-31 23:00:00 without taking daylight saving time into account. Other posts suggest using GMT or UTC when creating the POSIXct values, so this is what I tried:

##   create date seq
Dates <- seq(as.POSIXct("2000-01-01 00:00:00"), as.POSIXct("2020-12-31 23:00:00"), by = "hour", tz='UTC')
Dates <- as.data.frame(Dates)
colnames(Dates)[1] <- "DatesR"

## check dup 2
testing <- as.character(Dates$DatesR)
dup <- as.data.frame(which(duplicated(testing))) ## not good

As you can see, duplicates are still present, and some hours are skipped as well.

I also tried using zone instead of tz, like this:

##   create date seq
Dates <- seq(as.POSIXct("2000-01-01 00:00:00"), as.POSIXct("2020-12-31 23:00:00"), by = "hour", zone='UTC')
Dates <- as.data.frame(Dates)
colnames(Dates)[1] <- "DatesR"

## check dup 2
testing <- as.character(Dates$DatesR)
dup <- as.data.frame(which(duplicated(testing))) ## not good

It still doesn't work. Any recommendations?

1 Answer

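tz= is an argument of as.POSIXct, not of seq. Because seq accepts extra arguments through ... without complaint, tz='UTC' (and likewise zone='UTC') is silently ignored there, and both endpoints are still parsed in the session's default time zone.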
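A quick way to check this, assuming a reasonably recent R: the time zone of a seq() result is taken from its from value, so a tz= passed to seq itself leaves no trace. The names bad and good below are only for illustration.

```r
# tz = "UTC" given to seq(): swallowed by seq.POSIXt's `...` and ignored
bad <- seq(as.POSIXct("2000-01-01 00:00:00"),
           as.POSIXct("2000-01-02 00:00:00"),
           by = "hour", tz = "UTC")

# tz = "UTC" given to as.POSIXct(): the endpoints really are UTC
good <- seq(as.POSIXct("2000-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2000-01-02 00:00:00", tz = "UTC"),
            by = "hour")

attr(bad, "tzone")   # inherited from `from`: the session default, not "UTC"
attr(good, "tzone")  # "UTC"
```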

from <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
to <- as.POSIXct("2020-12-31 23:00:00", tz = "UTC")
s <- seq(from, to, by = "hour")

anyDuplicated(format(s))
## [1] 0
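Beyond the duplicate check, a sketch of a stricter sanity check on the same UTC sequence: 2000 through 2020 is 21 years containing 6 leap years (2000, 2004, 2008, 2012, 2016, 2020), so a complete hourly series has (21 * 365 + 6) * 24 values, each exactly 3600 seconds after the previous one. The name expected_n is introduced here for illustration.

```r
from <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
to   <- as.POSIXct("2020-12-31 23:00:00", tz = "UTC")
s    <- seq(from, to, by = "hour")

# 7671 days * 24 hours; no hour may be missing or repeated
expected_n <- (21 * 365 + 6) * 24

length(s) == expected_n            # no skipped hours
all(diff(as.numeric(s)) == 3600)   # every step is exactly one hour
anyDuplicated(format(s)) == 0      # no repeated character labels
```

If you still want the data frame from the question, data.frame(DatesR = s) builds it directly from this vector.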

It is also possible to set the entire session to default to UTC.

Sys.setenv(TZ = "UTC")

Sys.timezone()  # check that it has been set
## [1] "UTC"

from2 <- as.POSIXct("2000-01-01 00:00:00")
to2 <- as.POSIXct("2020-12-31 23:00:00")
s2 <- seq(from2, to2, by = "hour")

anyDuplicated(format(s2))
## [1] 0
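If you would rather not leave the whole session in UTC, here is a minimal sketch of a scoped alternative. with_tz_env is a hypothetical helper (not part of base R) that sets TZ, evaluates a block, and then restores whatever TZ was before, including the case where it was not set at all.

```r
# Hypothetical helper: evaluate `code` with TZ temporarily set to `tz`,
# then restore the previous TZ (or unset it if it was unset before).
with_tz_env <- function(tz, code) {
  old <- Sys.getenv("TZ", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old),
    add = TRUE
  )
  Sys.setenv(TZ = tz)
  code  # lazy evaluation: `code` runs here, after TZ has been set
}

labels <- with_tz_env("UTC", {
  s2 <- seq(as.POSIXct("2000-01-01 00:00:00"),
            as.POSIXct("2020-12-31 23:00:00"), by = "hour")
  format(s2)  # format while TZ is still UTC
})

anyDuplicated(labels)
```

The formatting happens inside the helper on purpose: as.POSIXct without tz stores an empty tzone attribute, which means "whatever TZ is current when printed", so formatting s2 after TZ is restored would bring the local daylight saving duplicates back. Passing tz = "UTC" to as.POSIXct, as in the first approach, avoids that dependency.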

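Also note that seq does not actually produce any duplicates in the default time zone: the underlying POSIXct values (seconds since 1970-01-01 UTC) are all distinct. It is the conversion to character, which renders each instant as local clock time, that introduces the duplicates. When daylight saving time ends, one local hour occurs twice; when it starts, one local hour never occurs, which explains the skipped values.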

Sys.unsetenv("TZ")  # change back to the default TZ; TZ = "" is read as UTC on some platforms

from3 <- as.POSIXct("2000-01-01 00:00:00")
to3 <- as.POSIXct("2020-12-31 23:00:00")
s3 <- seq(from3, to3, by = "hour")

anyDuplicated(s3)
## [1] 0

anyDuplicated(format(s3))
## [1] 7250
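To see both effects in isolation, here is a sketch using an explicit daylight saving zone. It assumes "America/New_York" is available in the system's time zone database; that zone is only an example, since the 7250 above depends on the answerer's own default zone. Under the US rules in force in 2000, clocks fell back from 02:00 EDT to 01:00 EST on 2000-10-29 and jumped from 02:00 to 03:00 on 2000-04-02.

```r
ny <- "America/New_York"

# Fall back: four distinct instants, one hour apart
fall <- seq(as.POSIXct("2000-10-29 00:00:00", tz = ny),
            by = "hour", length.out = 4)
anyDuplicated(fall)            # 0: the instants themselves are distinct
format(fall)                   # the 01:00 label appears twice
format(fall, usetz = TRUE)     # EDT/EST suffix makes the labels unique
format(fall, tz = "UTC")       # or render in UTC: no DST, no repeats

# Spring forward: 02:00 local time does not exist that night
spring <- seq(as.POSIXct("2000-04-02 00:00:00", tz = ny),
              by = "hour", length.out = 3)
format(spring, "%H:%M")        # "00:00" "01:00" "03:00"
```

So if local time is needed for display, format(x, usetz = TRUE) keeps the labels unambiguous; if DST should be ignored entirely, as in the question, formatting in UTC is the simpler choice.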
G. Grothendieck