5

I have CSV file with "date" column but it contains two different date format as the following

7/12/2015 15:28 as m/d/yyyy hh:mm
18-04-2016 18:20  as d/m/yyyy hh:mm

How can I change the format into m/d/yyyy hh: mm, So I can subtract the dates from each other?

M--
  • 25,431
  • 8
  • 61
  • 93
  • Could you please frame the question better? Are two date formats in the same column and do we need to split before subtracting? Also it would be worth putting down the ideas you have tried? First take a look at `lubridate` package – discipulus Feb 08 '17 at 00:26

4 Answers4

9

External packages are not required if you simply have two datetime formats. Just run both formats through the parser and take the non-missing one:

x <- c("7/12/2015 15:28","18-04-2016 18:20")
pmax(
  as.POSIXct(x, format="%m/%d/%Y %H:%M", tz="UTC"),
  as.POSIXct(x, format="%d-%m-%Y %H:%M", tz="UTC"),
  na.rm=TRUE
)
#[1] "2015-07-12 15:28:00 UTC" "2016-04-18 18:20:00 UTC"

As far as I know, there is absolutely no way to deal with ambiguous date formats automatically, so hard-coding is the way to go here probably.

thelatemail
  • 91,185
  • 12
  • 128
  • 188
4

As @thelatemail stated, it is difficult to deal with ambiguous date formats. You have the worst here: a combination of (North American) month-day-year along with (rest-of-the work) day-month-year.

Now, the anytime package helps here in general as it allows us to parse without requiring explicit formats while also allowing different input formats in the same string.

However, it too must retain some sanity -- and hence does not support mixing d/m/y and m/d/y as you do here by default because there is just no way to automate this.

But here we can just opt to add a single missing format, and all is well:

R> library(anytime)
R> anytime::addFormats("%d-%m-%Y %H:%M:%S")  # add a day-month-year variant
R> anytime(c("7/12/2015 15:28", "18-04-2016 18:20"))
[1] "2015-07-12 15:28:00 CDT" "2016-04-18 18:20:00 CDT"
R> 

And with that, the difference is a simple

R> diff(anytime(c("7/12/2015 15:28", "18-04-2016 18:20")))
Time difference of 281.119 days
R> 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • I think we essentially agree :-) To remind yourself how lucky we are to have sane R date tools, you should spend a day working with SAS' date inputs/outputs. – thelatemail Feb 08 '17 at 03:42
  • :) And I quite like the trick you use too. Nice work of `na.rm` to remove unwanted alternatives. But I remain beholden to my little package and it (mostly) _automagic_ parsing. – Dirk Eddelbuettel Feb 08 '17 at 03:43
3

Here is another option with lubridate

library(lubridate)
parse_date_time(x, c("mdy HM", "dmy HM"))   
#[1] "2015-12-07 15:28:00 UTC" "2016-04-18 18:20:00 UTC"

data

x <- c("7/12/2015 15:28","18-04-2016 18:20")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    I have forgotten about this so was trying `anytime` which is much slower. Thank you ^__^ – M-- Feb 04 '20 at 19:32
  • 1
    Yeah. I was trying to limit it to check only the desired formats, and stumbled upon your answer. I guess I can post a question about that later when I am not crushed by deadlines, see if Dirk has something to offer ;) – M-- Feb 04 '20 at 19:53
-3

Not all *.csv files are created equally. If the date information is coded as a date in Excel or an open office format but simply displayed in two different formats in the file, it would be much simpler to change the dates in a spreadsheet environment than trying to tackle the task in R.

Tavrock
  • 558
  • 5
  • 15