I stumbled across a peculiar behavior in the lubridate package: dmy(NA) throws an error instead of just returning NA. This causes problems when I want to convert a column in which some elements are NA and the rest are date strings that normally convert without problems.

Here is the minimal example:

library(lubridate)
df <- data.frame(ID=letters[1:5],
              Datum=c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))
df_copy <- df
#Question 1: Why does dmy(NA) not return NA, but throws an error?
df$Datum <- dmy(df$Datum)
Error in function (..., sep = " ", collapse = NULL)  : invalid separator
df <- df_copy
#Question 2: What's a work around?
#1. Idea: Only convert those elements that are not NAs
#RHS works, but assigning it to the LHS doesn't (most likely problem:
#column "Datum" is still of class factor, while the RHS is of class POSIXct)
df[!is.na(df$Datum), "Datum"] <- dmy(df[!is.na(df$Datum), "Datum"])
Using date format %d.%m.%Y.
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c(NA_integer_, NA_integer_,  :
invalid factor level, NAs generated
df #Only NAs, apparently problem with class of column "Datum"
ID Datum
1  a  <NA>
2  b  <NA>
3  c  <NA>
4  d  <NA>
5  e  <NA>
df <- df_copy
#2. Idea: Use mapply and apply dmy only to those elements that are not NA
df[, "Datum"] <- mapply(function(x) {if (is.na(x)) {
                                 return(NA)
                               } else {
                                 return(dmy(x))
                               }}, df$Datum)
df #Meaningless numbers returned instead of date-objects
ID     Datum
1  a 631152000
2  b        NA
3  c 632016000
4  d        NA
5  e 633830400

To summarize, I have two questions: 1) Why does dmy(NA) not work? Based on most other functions, I would assume it is good programming practice for every transformation of NA (such as dmy()) to return NA again, just as 2 + NA does. 2) If this behavior is intended, how do I convert a data.frame column that includes NAs via the dmy() function?

Christoph_J
  • It is a known issue that `lubridate` doesn't parse `NA` values correctly: https://github.com/hadley/lubridate/issues/88 – Andrie Oct 31 '11 at 16:24
  • Not a solution, but the "Error in function (..., sep = " ", collapse = NULL) : invalid separator" is being caused by the `lubridate:::guess_format()` function. The `NA` is being passed as `sep` in a call to `paste()`, specifically at `fmts <- unlist(mlply(with_seps, paste))`. – jthetzel Oct 31 '11 at 16:33

2 Answers

The Error in function (..., sep = " ", collapse = NULL) : invalid separator is being caused by the lubridate:::guess_format() function. The NA is being passed as sep in a call to paste(), specifically at fmts <- unlist(mlply(with_seps, paste)). You can have a go at improving the lubridate:::guess_format() to fix this.
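
The error itself can be reproduced without lubridate. A minimal sketch, assuming only base R: paste() refuses a missing sep, which matches the message in the question. The tryCatch() wrapper is only there so the script keeps running after the failure.

```r
# Base R only: paste() raises an error when `sep` is NA, which is what
# reportedly happens inside guess_format() when the input is NA.
result <- tryCatch(
  paste("01", "01", "1990", sep = NA),
  error = function(e) e
)

# `result` is a condition object, not a pasted string
is_error <- inherits(result, "error")
print(is_error)
print(conditionMessage(result))

# With a valid separator the same call succeeds
ok <- paste("01", "01", "1990", sep = ".")
print(ok)
```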

Otherwise, could you just change the NA to characters ("NA")?

require(lubridate)
df <- data.frame(ID=letters[1:5],
    Datum=c("01.01.1990", "NA", "11.01.1990", "NA", "01.02.1990")) #NAs are quoted
df_copy <- df

df$Datum <- dmy(df$Datum)
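
An alternative that leaves the data untouched is to hand the parser only the non-missing elements. The sketch below is an assumption-laden illustration, not lubridate code: parse_non_na() and base_dmy() are hypothetical helpers, and base as.Date() stands in for the parser so the example runs without lubridate installed.

```r
# Hypothetical helper (not part of lubridate): apply `parser` only to the
# non-missing elements and leave the remaining positions as NA.
parse_non_na <- function(x, parser) {
  x <- as.character(x)                 # factors become plain strings
  out <- rep(as.Date(NA), length(x))   # Date vector pre-filled with NA
  ok <- !is.na(x)
  if (any(ok)) {
    out[ok] <- as.Date(parser(x[ok]))  # the parser never sees an NA
  }
  out
}

# Same values as the Datum column in the question, stored as a factor
datum <- factor(c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))

# Stand-in parser built on base R (assumption: any function that maps
# character strings to dates could be passed here instead)
base_dmy <- function(s) as.Date(s, format = "%d.%m.%Y")

dates <- parse_non_na(datum, base_dmy)
print(dates)
```

Pre-allocating a Date vector avoids the class clash from the question, where POSIXct values were assigned into a factor column.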
jthetzel
  • Thanks @jthetzel, that clarifies the problem. However, I'm not yet confident enough with `R` in particular, or with open-source projects in general, to check out the source code, fix it and send a patch. Hopefully I will be one day, but until then I'd rather rely on the probably much more stable base `Date` class than continue with `lubridate` and run into another problem. – Christoph_J Oct 31 '11 at 17:00
  • No problem, @Christoph_J. I made a small patch of the function, which fixes the error. I'll submit it to the maintainers. In the meantime, if you want to try it, the source is available at: http://commondatastorage.googleapis.com/jthetzel-public/lubridate_0.2.5.tar.gz and the Windows binary at: http://commondatastorage.googleapis.com/jthetzel-public/lubridate_0.2.5.zip – jthetzel Oct 31 '11 at 17:12
  • Thanks @jthetzel, that works great. And now I have seen an example of how a patch is proposed. – Christoph_J Oct 31 '11 at 23:04

Since your dates are in a reasonably straightforward format, it might be much simpler to use as.Date and specify the appropriate format argument:

df$Date <- as.Date(df$Datum, format="%d.%m.%Y")
df

  ID      Datum       Date
1  a 01.01.1990 1990-01-01
2  b       <NA>       <NA>
3  c 11.01.1990 1990-01-11
4  d       <NA>       <NA>
5  e 01.02.1990 1990-02-01

For a list of the format codes that as.Date accepts, see ?strptime.
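
A short sketch of the same approach on a factor column, assuming base R only. stringsAsFactors = TRUE is set explicitly because data.frame() no longer creates factors by default since R 4.0. The second part is an aside on the mapply() attempt in the question: the numbers it returned appear to be seconds since 1970-01-01 UTC, so they can be turned back into dates.

```r
# stringsAsFactors = TRUE mimics the older default that made Datum a factor
df <- data.frame(ID = letters[1:5],
                 Datum = c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"),
                 stringsAsFactors = TRUE)

# %d = day of month, %m = month as a number, %Y = four-digit year
# as.character() makes the factor-to-string step explicit
df$Date <- as.Date(as.character(df$Datum), format = "%d.%m.%Y")
print(df)

# Values copied from the mapply() output in the question, treated here
# as seconds since the epoch (an assumption based on their magnitude)
secs <- c(631152000, NA, 632016000, NA, 633830400)
recovered <- as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"))
print(recovered)
```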

Andrie
  • Thanks @Andrie. I'm still interested in why `dmy` doesn't do the job, though, because I find the package rather intuitive (otherwise, I always struggle with dates in R ;-). So I'll leave the question unanswered for now. If no good workaround or explanation is proposed, I will follow your advice and use base `as.Date`. – Christoph_J Oct 31 '11 at 16:19
  • Since this is a known issue (https://github.com/hadley/lubridate/issues/88), the proper procedure would be to download the code, fix the issue and send a patch to the package author. – Andrie Oct 31 '11 at 16:26
  • My bad! I went through that list, but didn't spot this one. So I will follow your advice then! Thanks again. – Christoph_J Oct 31 '11 at 16:37