I want to convert some dates written in Norwegian to actual dates in R. I'm using readr, and it mostly works, but I've stumbled upon an issue that really annoys me, and I don't know how to get around it. Here is an illustration of my problem:

> parse_date(c("29. mai 2017", "29. sep 2017"), format = "%d. %b %Y", locale = locale("nn")) 
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col expected            actual       expected   <int> <int> <chr>               <chr>        actual 1     2    NA date like %d. %b %Y 29. sep 2017
[1] "2017-05-29" NA          

So it catches the date in May but not the one in September. It turns out that this is because the abbreviation for September in Norwegian needs a "." (sep. instead of sep), whereas the abbreviation for May does not (probably because it's not actually an abbreviation ;-)):

locale("nb")
<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   søndag (søn.), mandag (man.), tirsdag (tir.), onsdag (ons.), torsdag (tor.), fredag (fre.), lørdag (lør.)
Months: januar (jan.), februar (feb.), mars (mar.), april (apr.), mai (mai), juni (jun.), juli (jul.), august (aug.), september (sep.), oktober (okt.), november (nov.), desember (des.)
AM/PM:  a.m./p.m.

However, it seems inconsistent that it does not require the same number of characters for all months. I also noticed that these annoying "." are not part of the abbreviations in English:

> locale("en")
<locale>
Numbers:  123,456.78
Formats:  %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday (Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May), June (Jun), July (Jul), August (Aug),
    September (Sep), October (Oct), November (Nov), December (Dec)
AM/PM:  AM/PM

It really is terribly inconvenient, also because I believe it is fairly rare to include the "." at all when recording dates with abbreviations (but that is really just based on personal preference and experience). Any input is much appreciated.

Kira
  • I would recommend you look into the `stri_datetime_parse()` function in the stringi package, which does a really good job dealing with different locales and date formats. – JBGruber Jun 15 '18 at 11:39
  • IIRC, what stri_datetime_parse does is just iterate over your object, checking a few different formats and hoping everything stops being NA. So the manual version of that is: first use `format = %b`, then use `format = %b.` on the leftover NA elements (or vice versa). – MichaelChirico Jun 15 '18 at 11:42
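
The manual fallback described in the comment above could be sketched roughly as follows. This is only an illustration: `parse_with_fallback()` and `parse_one` are hypothetical names, not part of readr or stringi. Note also that the readr part assumes the trailing "." has been stripped from the month abbreviations, since the stock `nb` abbreviations already contain it.

```r
# Hypothetical helper (not part of readr or stringi): try each format in
# turn and only fill the elements that are still NA after earlier attempts.
# `parse_one` is any function(x, fmt) that returns a Date vector with NA
# for elements it could not parse.
parse_with_fallback <- function(x, formats, parse_one) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    # Parsing failures are expected here, so their warnings are silenced.
    out[todo] <- suppressWarnings(parse_one(x[todo], fmt))
  }
  out
}

# Usage with readr, if it is installed. Assumption: abbreviations without
# a trailing ".", so "%b" matches "sep" and "%b." matches "sep.".
if (requireNamespace("readr", quietly = TRUE)) {
  loc <- readr::locale("nb")
  loc$date_names$mon_ab <- sub("\\.$", "", loc$date_names$mon_ab)
  readr_parse <- function(x, fmt) {
    readr::parse_date(x, format = fmt, locale = loc)
  }
  print(parse_with_fallback(
    c("29. mai 2017", "29. sep 2017", "29. sep. 2017"),
    c("%d. %b %Y", "%d. %b. %Y"),
    readr_parse
  ))
}
```

Passing the parser in as a function keeps the loop independent of any one package, so the same helper should also work with `as.Date()` or another parser.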

2 Answers

You can edit the locale manually like this...

loc <- locale("nb")

loc$date_names$mon_ab <- substr(loc$date_names$mon_ab, 1, 3) #just take first 3 characters

parse_date(c("29. mai 2017", "29. sep 2017"), format = "%d. %b %Y", locale = loc)

[1] "2017-05-29" "2017-09-29"
Andrew Gustar
  • Wow - thank you for the quick answer - that is a better solution than my own hack :-) – Kira Jun 15 '18 at 12:24
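
As a possible variation on the `substr()` approach in the answer above, one could strip only a trailing "." instead of cutting every abbreviation to three characters. That might be safer if a locale uses abbreviations that are longer than three characters. This is a minimal sketch; `strip_trailing_dot()` is a hypothetical helper, not a readr function.

```r
# Hypothetical helper (not part of readr): remove a single trailing "."
# and leave everything else in the abbreviation untouched.
strip_trailing_dot <- function(x) sub("\\.$", "", x)

# Month abbreviations as printed by locale("nb") in the question.
nb_mon_ab <- c("jan.", "feb.", "mar.", "apr.", "mai", "jun.",
               "jul.", "aug.", "sep.", "okt.", "nov.", "des.")
print(strip_trailing_dot(nb_mon_ab))

# Applying it to a readr locale, if readr is installed.
if (requireNamespace("readr", quietly = TRUE)) {
  loc <- readr::locale("nb")
  loc$date_names$mon_ab <- strip_trailing_dot(loc$date_names$mon_ab)
  loc$date_names$day_ab <- strip_trailing_dot(loc$date_names$day_ab)
  print(readr::parse_date(c("29. mai 2017", "29. sep 2017"),
                          format = "%d. %b %Y", locale = loc))
}
```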

A solution similar to, and inspired by, @Andrew Gustar's answer is to create your own date_names object:

loc <- locale("nb")
myNo <- date_names(mon = loc$date_names$mon,
                   mon_ab = substr(loc$date_names$mon_ab, 1, 3),
                   day = loc$date_names$day,
                   day_ab = substr(loc$date_names$day, 1, 3))


parse_date(c("29. mai 2017", "29. sep 2017"), format = "%d. %b %Y", locale = locale(date_names = myNo))

[1] "2017-05-29" "2017-09-29"
Kira