-1

I read in an Excel file with one column containing dates. In some rows there aren't complete dates, but only the year is given, e.g., 2018. What I obtain in R is

> df$date
[1]    NA    NA 43465 43465 43465 43465  2018    NA 43465 43465 43465 43465

I want to convert only the rows that hold complete dates to Date format (in my example, the rows with 43465) and leave the other rows as they are, i.e. NA should stay NA and 2018 should stay 2018.

I know that I can convert Excel dates with as.Date(df$date, origin="1899-12-30"), but the following two attempts give me the wrong output.

> as.Date(df$date, origin="1899-12-30")
[1] NA NA  "2018-12-31" "2018-12-31" "2018-12-31" "2018-12-31" "1905-07-10" NA "2018-12-31" "2018-12-31" "2018-12-31"
[12] "2018-12-31"

Of course "1905-07-10" is not what I expected.

> ifelse(df$date == 2018, 2018, as.Date(df$date, origin="1899-12-30"))
[1]    NA    NA 17896 17896 17896 17896  2018    NA 17896 17896 17896 17896

Here the wrong output is obvious.

user29184

2 Answers

0

An atomic vector cannot have multiple classes: its elements are either all numeric or all Date, never a mix. A workaround is to use the character class, which is the most generic one.

x <- c(NA,NA,43465,43465,43465,43465,2018,NA,43465,43465, 43465, 43465)
ifelse(x == 2018, "2018", as.character(as.Date(x, origin="1899-12-30")))

# [1] NA  NA   "2018-12-31" "2018-12-31" "2018-12-31" "2018-12-31" "2018"      
# [8] NA   "2018-12-31" "2018-12-31" "2018-12-31" "2018-12-31"
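As an aside, here is a minimal sketch of why the ifelse() attempt in the question prints 17896. ifelse() builds its result from the test vector, so the Date class of the "yes" or "no" argument is dropped and only the underlying number remains, which is the count of days since 1970-01-01. The variable names d and r are just for illustration.

```r
# One Excel serial number converted to a proper Date
d <- as.Date(43465, origin = "1899-12-30")

# ifelse() keeps the attributes of `test`, not of `yes`/`no`,
# so the Date class is lost and a plain number comes back
r <- ifelse(TRUE, d, NA)
class(r)   # "numeric"
r          # 17896, i.e. days since 1970-01-01

# The information is still there; only the class has to be restored
as.Date(r, origin = "1970-01-01")   # "2018-12-31"
```

So the value itself was converted correctly in the question; it was only printed without its class.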

A list, however, can hold elements of different classes, so if you are OK with storing the data in a list, we can use lapply:

# keep NA and the bare year untouched, convert everything else to Date
lapply(x, function(y) 
     if (is.na(y) || y == 2018) y else as.Date(y, origin="1899-12-30"))
Ronak Shah
  • Thanks, but unfortunately both do not solve my problem, since I really need my data in a data frame and I need to be able to compare two dates, i.e. `2018-12-31 > 2017-12-31` should return `TRUE` and with characters this is not possible. – user29184 Nov 06 '19 at 09:07
  • @user29184 So why do you want to keep 2018 as you'll not be able to compare it with any date anyway. You can turn it to `NA` as well. – Ronak Shah Nov 06 '19 at 09:18
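The suggestion in the last comment can be sketched roughly as follows, staying inside a data frame: the real dates go into a Date column (with the bare years set to NA there), and the year is kept in a separate column so it is not lost. The column names date_parsed and year_only are hypothetical, and the rule "values from 1000 to 9999 are bare years" is an assumption; it would misclassify genuine Excel serials in that range, which correspond to early-1900s dates.

```r
df <- data.frame(date = c(NA, NA, 43465, 43465, 43465, 43465, 2018, NA,
                          43465, 43465, 43465, 43465))

# Assumption: a value between 1000 and 9999 is a bare year, not a serial
is_year <- !is.na(df$date) & df$date >= 1000 & df$date <= 9999

# Keep the year in its own numeric column (hypothetical name)
df$year_only <- ifelse(is_year, df$date, NA)

# Convert only the serial numbers; bare years become NA in the Date column
serial <- df$date
serial[is_year] <- NA
df$date_parsed <- as.Date(serial, origin = "1899-12-30")

# Date comparisons now work; rows without a full date give NA
df$date_parsed > as.Date("2017-12-31")
```

Whether NA is the right stand-in for a year-only row depends on how those rows should behave in the comparisons.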
0

Building on @Ronak's answer, you can use a regex to detect four-digit values (bare years) and pad them with four trailing zeros, while converting the real dates to integers of the form YYYYMMDD.

x <- c(NA,NA,43465,43465,43465,43465,2018,NA,43465,43465, 43465, 43465)
ifelse(grepl('^\\d{4}$', x, perl = TRUE), 
   as.integer(paste0(x, '0000')), 
   as.integer(format(as.Date(x, origin='1899-12-30'), '%Y%m%d')))
[1]  NA       NA 20181231 20181231 20181231 20181231 20180000       NA 20181231 20181231 20181231 20181231

You'll get some warning messages regarding the NAs; if that bothers you, you can add an additional ifelse to handle the NAs. Here grepl tests whether a value consists of exactly four digits (a year), and every value is then turned into an integer. This lets you keep using comparison operators such as >, <, and == while preserving all the information.

You can change the '0000' in the paste0() call to a more appropriate value depending on the data or use case.
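One possible way to avoid the NA warnings altogether is to fill the result by logical indexing instead of ifelse, so that NA never reaches paste0() or as.integer(). This is only a sketch: the names is_year, is_serial and key are made up for the example, and it swaps the regex test for a numeric range check, assuming that values from 1000 to 9999 are bare years.

```r
x <- c(NA, NA, 43465, 43465, 43465, 43465, 2018, NA, 43465, 43465, 43465, 43465)

# Assumption: a value between 1000 and 9999 is a bare year
is_year   <- !is.na(x) & x >= 1000 & x <= 9999
is_serial <- !is.na(x) & !is_year

# Start with all NA and fill in only the positions that have a value
key <- rep(NA_integer_, length(x))

# Bare years: append four zeros numerically (2018 -> 20180000)
key[is_year] <- as.integer(x[is_year] * 10000)

# Excel serials: convert to Date, then to an integer of the form YYYYMMDD
key[is_serial] <- as.integer(
  format(as.Date(x[is_serial], origin = "1899-12-30"), "%Y%m%d"))

key
```

The result should match the ifelse version above, just without the coercion warnings.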

sempervent