9

I have a vector of numeric excel dates i.e.

date <- c(42963,42994,42903,42933,42964)

The output am I expecting when using excel_numeric_to_date function from janitor package and as.yearmon function from zoo package

as.yearmon(excel_numeric_to_date(date)) [1] "Aug 2016" "Sep 2016" "Jun 2017" "Jul 2017" "Aug 2017".

However, the conversion for the first to elements of the date vector are incorrect. The actual result are:

as.yearmon(excel_numeric_to_date(date)) [1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"

I have tried using different option(modern and mac pre-2011) for the date_system argument in the excel_numeric_to_date but it does not help either

The excel version is 2010

Rick Pack
  • 1,044
  • 10
  • 20
Azam Yahya
  • 646
  • 1
  • 7
  • 10
  • Where did you get those numbers? Why weren't they loaded as *dates* in the first place? In any case, Excel uses the OLE Automation storage format for dates, which is why such values are called OA Dates. The integer is a date offset. The fractional part is an instance in the day. – Panagiotis Kanavos Nov 03 '17 at 10:20
  • The best solution would be to modify the code that reads those values to read them as dates in the first place. What package did you use to load them? How did you do it? – Panagiotis Kanavos Nov 03 '17 at 10:23

3 Answers3

11

You can simply use as.Date and specify the origin, i.e.

as.Date(date, origin="1899-12-30") 
#[1] "2017-08-16" "2017-09-16" "2017-06-17" "2017-07-17" "2017-08-17"

#or format it to your liking,

format(as.Date(date, origin="1899-12-30"), '%b %Y') 
#[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"

This link gives quite a bit of information on this matter.

Sotos
  • 51,121
  • 6
  • 32
  • 66
  • 2
    Note that `help("as.Date")` contains relevant information regarding the Excel date origin. – Roland Nov 03 '17 at 10:20
  • 1
    @ Panagiotis Kanavos, The question stated that the input is dates and in any case it does seem to work even if there are times. These are represented by a fraction of a day in Excel and using noon for the test data, `as.yearmon(as.Date(date + 0.5, origin = "1899-12-31"))` and it gives the correct answer. – G. Grothendieck Nov 03 '17 at 13:09
  • 1
    Also see the Other Applicatoins section of page 30 of R News 4/1 -- https://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf for info on Excel and dates. – G. Grothendieck Nov 03 '17 at 13:13
  • Hi. How do you know whats the value for `origin`? Because sometimes I use `origin="1899-12-30"` and others, `origin="1582-10-14"`. Thanks – jgarces Jul 21 '21 at 16:45
4

If you want to convert dates from Excel, you can use as.Date() with a specific origin. According to the documentation, "1900-01-0"' is used as day in Excel on Windows, but "this is complicated by Excel incorrectly treating 1900 as a leap year". So "1899-12-30" should be used for dates post 1901:

date <- c(42963,42994,42903,42933,42964)

This is the result of as.Date():

as.Date(date, origin = "1899-12-30")
[1] "2017-08-18" "2017-09-18" "2017-06-19" "2017-07-19" "2017-08-19"

You can then use zoo::as.yearmon()` to get the expected outcome:

zoo::as.yearmon(as.Date(date, origin = "1899-12-30"))
[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
clemens
  • 6,653
  • 2
  • 19
  • 31
  • 3
    you have to use "1899-12-30" otherwise you end up with two more days (or so), for example 4299 is 2017-15-09 in Excel but with origin="1900-01-01" you get 2017-09-17 (see answer above from Sotos) – R. Prost Mar 19 '18 at 09:28
  • please edit your answer to the above comment. i spent 20 mins trying to figure out why i was missing 2 dates – Jerin Mathew Jul 21 '21 at 17:43
2

Type excel_numeric_to_date to look at the function's code and you'll see it's a wrapper for the line of code used by the other answers to this question: as.Date(date_num, origin = "1899-12-30").

So that's not the issue.

The underlying matter here is confusion about date formatting. You say you expect your first number 42963 to become "Aug 2016", and your last number 42964 to become "Aug 2017". The latter is just one more than the former, which shows up in the conversion - they should be a day apart, not a year apart as you are expecting:

> excel_numeric_to_date(c(42963, 42964))
[1] "2017-08-16" "2017-08-17" # as expected, they are one day apart

Perhaps the day and year fields are switched upstream in your data at the point where these get mapped to integer dates, and it was hard to tell here because of the values chosen.

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
Sam Firke
  • 21,571
  • 9
  • 87
  • 105