1

I have a huge (~10.000.000 rows) dataframe with a column that consists dates, i.e:

df <- data.frame(StartDate = as.character(c("2014-08-20 11:59:38", 
                                            "2014-08-21 16:17:44", 
                                            "2014-08-22 19:02:10",
                                            "9/1/2014 08:05:13",
                                            "9/2/2014 15:13:28", 
                                            "9/3/2014 00:22:01")))

The problem is that date formats are mixed - I would like to standardise them so as to get:

   StartDate
1 2014-08-20
2 2014-08-21
3 2014-08-22
4 2014-09-01
5 2014-09-02
6 2014-09-03

1. as.Date() approach

as.Date("2014-08-31 23:59:38", "%m/%d/%Y")
as.Date("9/1/2014 00:00:28", "%m/%d/%Y") 

gives

[1] NA
[1] "2014-09-01"

2. lubridate approach

dmy("9/1/2014 00:00:28")
mdy("9/1/2014 00:00:28")
dmy("2014-08-31 23:59:38")
mdy("2014-08-31 23:59:38")

in each case returns

[1] NA
Warning message:
All formats failed to parse. No formats found. 

Is there any neat solution to that?

thesecond
  • 362
  • 2
  • 9

2 Answers2

1

Easier maybe to use parse_date

library(parsedate)
df$StartDate <- as.Date(parse_date(df$StartDate))

-output

> df$StartDate
[1] "2014-08-20" "2014-08-21" "2014-08-22" "2014-09-01" "2014-09-02" "2014-09-03"
akrun
  • 874,273
  • 37
  • 540
  • 662
0

I have just found out that anytime::anydate extracts the dates directly and straightforwardly:

library(anytime)
library(tidyverse)

df %>% 
  mutate(Date = anydate(StartDate))

#>             StartDate       Date
#> 1 2014-08-20 11:59:38 2014-08-20
#> 2 2014-08-21 16:17:44 2014-08-21
#> 3 2014-08-22 19:02:10 2014-08-22
#> 4   9/1/2014 08:05:13 2014-09-01
#> 5   9/2/2014 15:13:28 2014-09-02
#> 6   9/3/2014 00:22:01 2014-09-03

Another solution, based on lubridate:

library(tidyverse)
library(lubridate)

df %>% 
  mutate(Date = if_else(!str_detect(StartDate,"/"), 
    date(ymd_hms(StartDate, quiet = T)), date(mdy_hms(StartDate, quiet = T))))

#>             StartDate       Date
#> 1 2014-08-20 11:59:38 2014-08-20
#> 2 2014-08-21 16:17:44 2014-08-21
#> 3 2014-08-22 19:02:10 2014-08-22
#> 4   9/1/2014 08:05:13 2014-09-01
#> 5   9/2/2014 15:13:28 2014-09-02
#> 6   9/3/2014 00:22:01 2014-09-03
PaulS
  • 21,159
  • 2
  • 9
  • 26