0

I would like to change all the mixed date format into one format for example d-m-y

here is the data frame

x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))

I hv tried using this code down here, but it gives NAs

newdateformat <- as.Date(x$Birthdate,
  format = "%m%d%y", origin = "2020-6-25")

newdateformat

Then I tried using parse, but it also gives NAs which means it failed to parse

require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))

[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"

and I also could find what is the format for the first date in the data frame which is "36085.0" i did found this code but still couldn't understand what the number means and what is the "origin" means

dates <- c(30829, 38540)
  betterDates <- as.Date(dates,
    origin = "1899-12-30")

p/s : I'm quite new to R, so i appreciate if you can use an easier explanation thank youuuuu

  • Did you export this data from Excel or another spreadsheet? They vary on how they handle dates so you need to export dates carefully and check the output .csv file to make sure. – dcarlson Jun 25 '20 at 16:08
  • I got the data/import from excel. xlsx file – learner96 Jun 28 '20 at 07:17

1 Answers1

1

You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on the the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.

library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
  "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table

# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]

# handle dates like "36085.0": days since 1904 (or 1900)
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]

# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]

# result

> x
   Name   Birthdate Birthdate1
1:    A     36085.0 2002-10-18
2:    B 2001-sep-12 2001-09-12
3:    C Feb-18-2005 2005-02-18
4:    D    05/27/84 1984-05-27
5:    E   2020-6-25 2020-06-25
webb
  • 4,180
  • 1
  • 17
  • 26