0

When working on a dataset with dates, I realised that R will give me NAs for the dates between 2am and 3am on March 28th 2021 CET. For any other day and time combination, the dates are displayed correctly.

In my data, I have a variable named date.time and a variable named minutes.to.add:

install.packages("tidyverse")
install.packages("lubridate")
library(tidyverse)
library(lubridate)

date.time = as.POSIXct(c("2021-03-27 00:00:00",
                    "2021-03-27 00:00:00",
                    "2021-03-27 00:00:00",
                    "2021-03-28 00:00:00",
                    "2021-03-28 00:00:00", 
                    "2021-03-28 00:00:00", 
                    "2021-03-29 00:00:00",
                    "2021-03-29 00:00:00",
                    "2021-03-29 00:00:00"), 
                  format="%Y-%m-%d %H:%M:%S", tz="CET")

minutes.to.add = c(119, 120, 180, 119, 120, 180, 119, 120, 180)

df = data.frame(date.time, minutes.to.add)

df

> df
   date.time minutes.to.add
1 2021-03-27            119
2 2021-03-27            120
3 2021-03-27            180
4 2021-03-28            119
5 2021-03-28            120
6 2021-03-28            180
7 2021-03-29            119
8 2021-03-29            120
9 2021-03-29            180

Now, I would like to add the values in minutes.to.add as minutes to the date.time variable using lubridate::minutes. As you can see, this works well, except for row 5. Here, the value should be 2021-03-28 02:00:00, but instead, I receive a NA. This will happen for all dates between 2021-03-28 02:00:00 and 2021-02-28 02:59:59 CET:

df %>% 
  mutate(new.date.time = date.time + lubridate::minutes(minutes.to.add))

1 2021-03-27         119 2021-03-27 01:59:00
2 2021-03-27         120 2021-03-27 02:00:00
3 2021-03-27         180 2021-03-27 03:00:00
4 2021-03-28         119 2021-03-28 01:59:00
5 2021-03-28         120                <NA>
6 2021-03-28         180 2021-03-28 03:00:00
7 2021-03-29         119 2021-03-29 01:59:00
8 2021-03-29         120 2021-03-29 02:00:00
9 2021-03-29         180 2021-03-29 03:00:00

The problem also occurs, when dates are specified using as.POSIXct:

tibble(as.POSIXct(c("2021-03-27 02:00:00",
             "2021-03-27 02:59:59",
             "2021-03-27 03:00:00",
             "2021-03-28 02:00:00",
             "2021-03-28 02:59:59", 
             "2021-03-28 03:00:00", 
             "2021-03-29 02:00:00",
             "2021-03-29 02:59:59",
             "2021-03-29 03:00:00"), 
           format="%Y-%m-%d %H:%M:%S", tz="CET"))

# A tibble: 9 x 1
  `as.POSIXct(...)`  
  <dttm>             
1 2021-03-27 02:00:00
2 2021-03-27 02:59:59
3 2021-03-27 03:00:00
4 NA                 
5 NA                 
6 2021-03-28 03:00:00
7 2021-03-29 02:00:00
8 2021-03-29 02:59:59
9 2021-03-29 03:00:00

I tried lubridate instead to check, if the error is specific to as.POSIXct. However, the error also occurs, but only with tz = "CET":

as_datetime("2021-03-28 02:00:00")
as_datetime("2021-03-28 02:00:00", tz = "CET")

> as_datetime("2021-03-28 02:00:00")
[1] "2021-03-28 02:00:00 UTC"
> as_datetime("2021-03-28 02:00:00", tz = "CET")
[1] NA
Warning message:
 1 failed to parse. 

Does anyone know why this happens and how to solve this?

ja_doe
  • 1
  • 3
  • Random Guess : Daylight Savings. – Ronak Shah Sep 16 '21 at 10:04
  • Yes, it's a DST problem. If in your country the date of change to DST was not March 28, you need to specify the timezone when creating the datetime object. For example, if you specify `tz="EST"` in the `POSIXct` function you won't get a missing value because in the DST started on March 14 in the USA. – Claudio Sep 16 '21 at 10:19
  • You guys are right, it's daylight savings. I was so focused on a potential problem with R that I did not see the obvious. Thank you so much! – ja_doe Sep 16 '21 at 11:00

0 Answers0