
I have a dataset in which some columns hold dates stored as character strings. The dates are inconsistently formatted and some values are missing. I wrote a function to convert them to a consistent format. If I apply the function to one column at a time with lapply, I have no issue. When I try to apply it to multiple columns at once, the code gives me the following error: Error in lout[w] <- *vtmp* : NAs are not allowed in subscripted assignments

guess_date <- function(x){
  require(lubridate)
  if (!is.na(x)){
    result <- as.character(parse_date_time(
      x, guess_formats(as.character(x), c("mdy", "dmy", "dmY")))[[1]])
  }
  else {result <- NA}
  return(result)
}

df <- data.frame(a = c("12/01/1988", "10/17/1999"),
                 b = c("12/01/1988", NA))
df$a <- unlist(lapply(df$a , guess_date))
df$a<- as.Date(df$a, format="%Y-%m-%d")

cols <- c("a","b")
df[,cols] <- lapply(df[,cols], function(x){
  require(lubridate)
  if (!is.na(x)){
    result <- as.character(parse_date_time(
      x, guess_formats(as.character(x), c("mdy", "dmy", "dmY")))[[1]])
  }
  else {result <- NA}
  return(result)
})
MCS

2 Answers


Not sure if I am missing something, but it looks like you have a lot of unnecessary code. This works fine:

library(lubridate)
df[cols] <- lapply(df[cols], parse_date_time, c("mdy", "dmy", "dmY"))

df
#           a          b
#1 1988-12-01 1988-12-01
#2 1999-10-17       <NA>
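A likely reason the multi-column version in the question misbehaves: lapply() over a data frame passes each whole column to the function, so `!is.na(x)` is a logical vector rather than a single value, and the NA in column b can still reach the parser. If an element-wise function is preferred anyway, it can be applied per element inside each column. The sketch below uses base R only; `parse_one` is a hypothetical stand-in for the question's guess_date(), and the "%m/%d/%Y" format is an assumption taken from the sample data.

```r
df <- data.frame(a = c("12/01/1988", "10/17/1999"),
                 b = c("12/01/1988", NA),
                 stringsAsFactors = FALSE)

# Hypothetical element-wise parser (stand-in for guess_date):
# expects a single value and returns an ISO date string or NA.
parse_one <- function(x) {
  if (is.na(x)) return(NA_character_)
  as.character(as.Date(x, format = "%m/%d/%Y"))
}

cols <- c("a", "b")

# Outer lapply() walks the columns, inner vapply() walks the elements,
# so parse_one() only ever sees one value at a time.
df[cols] <- lapply(df[cols], function(col) {
  as.Date(vapply(as.character(col), parse_one, character(1),
                 USE.NAMES = FALSE))
})

str(df)
```

Both columns end up with class Date, and the missing value in b stays NA.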

Moreover, all the dates in the df seem to follow the same format so as.Date works fine too.

df[] <- lapply(df, as.Date, "%m/%d/%Y")
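A single format string only helps when every value follows it. For columns that mix formats, one possible base R approach is to try several formats in order and fill in only the values that are still unparsed. This is a minimal sketch, not part of the original answer: `parse_multi` is a hypothetical helper, and the list of formats is an assumption that should be adapted to the real data.

```r
# Hypothetical helper: try each format in turn, keeping the first successful parse.
parse_multi <- function(x, formats = c("%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d")) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)   # non-missing values not parsed yet
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

res <- parse_multi(c("12/01/1988", "17/10/1999", "2001-02-03", NA))
res
```

The order of `formats` matters: an ambiguous value such as "12/01/1988" is read with whichever format matches first, so month-first wins here.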

data

df <- data.frame(a = c("12/01/1988","10/17/1999"),b = c("12/01/1988",NA))
Ronak Shah
  • Thanks, the code works very well. Though, in my actual data I still have some dates that the code fails to parse: `Warning message: 18 failed to parse` How should I address this? – MCS Apr 29 '19 at 15:46
  • @MCS What is the format of the dates which failed to parse? We can add that format in `parse_date_time`. – Ronak Shah Apr 29 '19 at 23:04
  • The dates it failed to parse were double dates in a single cell. The rest works perfectly! – MCS Apr 30 '19 at 07:28
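To see which values are behind a "failed to parse" warning, one option is to compare the raw strings with the parsed result: anything that was not missing before parsing but is NA afterwards did not parse. The sketch below uses base R's as.Date with made-up sample values; the same comparison should apply to the output of parse_date_time, which also returns NA for values it cannot parse.

```r
raw <- c("12/01/1988", "n/a", NA, "10/17/1999")   # made-up sample values
parsed <- as.Date(raw, format = "%m/%d/%Y")

# Present in the input but NA after parsing: these are the failures.
failed <- !is.na(raw) & is.na(parsed)

which(failed)   # positions of the problem values
raw[failed]     # the offending strings themselves
```

Inspecting `raw[failed]` shows the actual formats involved, which can then be added to the parser or cleaned up separately.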

Here is an option with anytime

library(dplyr)
library(anytime)
df %>% 
     mutate_all(anydate)
#          a          b
#1 1988-12-01 1988-12-01
#2 1999-10-17       <NA>

data

df <- data.frame(a = c("12/01/1988","10/17/1999"),
          b = c("12/01/1988",NA))
akrun