
This is weird: R's ifelse() seems to do some (unwanted) casting. Let's say I have a vector of timestamps (possibly NA), and NA values should be treated differently from existing dates, for example, just ignored:

formatString = "%Y-%m-%d %H:%M:%OS"
timestamp = c(as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString)) + (1:3)*30, NA)

Now

timestamp
#[1] "2000-01-01 12:00:30 CET" "2000-01-01 12:01:00 CET" "2000-01-01 12:01:30 CET"
#[4] NA

as desired but translation by 30 seconds results in

ifelse(is.na(timestamp), NA, timestamp+30)
#[1] 946724460 946724490 946724520        NA

Notice that timestamp+30 on its own still works as expected. But let's say I want to replace NA dates with a fixed date and shift all the others by 30 seconds:

fixedDate = as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
ifelse(is.na(timestamp), fixedDate, timestamp+30)
#[1] 946724460 946724490 946724520 946724400

Question: what's wrong with this solution, and why doesn't it work as expected?

Edit: the desired output is a vector of timestamps (not of integers) shifted by 30 seconds, with the NAs replaced by whatever...

Kara
Fabian Werner
  • 4
    What doesn't work as expected? I get `NA` replaced by `fixedDate`. Or I don't understand the problem. –  Jun 30 '15 at 08:43
  • 2
    I second @Pascal: `as.numeric(fixedDate) == ifelse(is.na(timestamp), fixedDate, timestamp+30)[4]` returns `TRUE`, so I'm not sure what the issue really is – David Arenburg Jun 30 '15 at 08:46
  • 1
    I suspect the question to ask is: What is your expected output ? This may help understand where you're stuck – Tensibai Jun 30 '15 at 08:50
  • 2
    You should study the **Value** and **Warning** section of `?ifelse` and the third set of examples ("`ifelse()` strips attributes"; "This is important when working with Dates"). [A relevant post](https://stat.ethz.ch/pipermail/r-help/2011-June/280186.html) (first hit when googling "r ifelse as.POSIXct"). – Henrik Jun 30 '15 at 09:22
  • 2
    I edited the title to make the question clearer than *"Is ifelse broken?"*. You needed to say it was stripping the POSIXct attribute. – smci Jun 30 '15 at 23:23
  • I recommend checking out this solution via dplyr: https://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects – icj Jan 15 '19 at 18:02

2 Answers

8

If you look at the way ifelse is written, it has a section of code that looks like this:

ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
  ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]

Note that the answer starts off as a logical vector, the same as test. The elements that have test == TRUE then get assigned to the value of yes.

The issue, then, is what happens when one or more elements of a logical vector are assigned a date-time of class POSIXct. You can see what happens if you do this:

x <- c(TRUE, FALSE)
class(x)
# logical
x[1] <- Sys.time()
class(x)
# numeric
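The same coercion can be seen on data shaped like the question's. This is only a small sketch: the variable names `ts`, `res`, and `restored` and the fixed "UTC" timezone are assumptions chosen so the result does not depend on the local machine.

```r
# Assumed example data: three timestamps plus one NA, in UTC for reproducibility
ts <- as.POSIXct("2000-01-01 12:00:00", tz = "UTC") + c(30, 60, 90, NA)

res <- ifelse(is.na(ts), NA, ts + 30)

class(ts)    # "POSIXct" "POSIXt"
class(res)   # "numeric": the class attribute did not survive ifelse()

# The underlying seconds-since-epoch values are intact, so the class can be restored
restored <- as.POSIXct(res, origin = "1970-01-01", tz = "UTC")
```

The values themselves are never damaged; only the attributes that tell R to treat them as date-times are dropped.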

You could get around this by writing:

timestamp <- timestamp + 30
timestamp[is.na(timestamp)] <- fixedDate

You could also do this:

fixedDate = as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
unlist(ifelse(is.na(timestamp), as.list(fixedDate), as.list(timestamp+30)))

This takes advantage of the way the replacement operator [<- handles a list on the right hand side.

You can also just re-add the class attribute like this:

x <- ifelse(is.na(timestamp), fixedDate, timestamp+30)
class(x) <- c("POSIXct", "POSIXt")

or, if you were desperate to do it in one line, like this:

`class<-`(ifelse(is.na(timestamp), fixedDate, timestamp+30), c("POSIXct", "POSIXt"))

or by copying the attributes of fixedDate:

x <- ifelse(is.na(timestamp), fixedDate, timestamp+30)
attributes(x) <- attributes(fixedDate)

This last version has the advantage of copying the tzone attribute as well.
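To illustrate the difference between restoring only the class and copying all attributes, here is a hedged sketch. The names `fixed`, `ts`, `by_class`, and `by_attrs` and the explicit "UTC" timezone are assumptions for illustration, not part of the question's data.

```r
# Assumed data with an explicitly set timezone attribute
fixed <- as.POSIXct("2000-01-01 12:00:00", tz = "UTC")
ts <- fixed + c(30, 60, 90, NA)

x <- ifelse(is.na(ts), fixed, ts + 30)   # plain numeric, attributes stripped

# Variant 1: restore only the class; the tzone attribute stays unset
by_class <- x
class(by_class) <- c("POSIXct", "POSIXt")
attr(by_class, "tzone")   # NULL, so printing falls back to the session timezone

# Variant 2: copy all attributes, including tzone
by_attrs <- x
attributes(by_attrs) <- attributes(fixed)
attr(by_attrs, "tzone")   # "UTC"
```

Both variants represent the same instants; they differ only in how the values are displayed when the session timezone is not the one the data was created in.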

As of dplyr 0.5.0, you can also use dplyr::if_else which preserves class in the output and also enforces the same class for the true and false arguments.
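A minimal sketch of the dplyr route, assuming the dplyr package is installed; the data and the "UTC" timezone are again made up for illustration.

```r
# Assumes dplyr (>= 0.5.0) is installed; called via its namespace
fixed <- as.POSIXct("2000-01-01 12:00:00", tz = "UTC")
ts <- fixed + c(30, 60, 90, NA)

res <- dplyr::if_else(is.na(ts), fixed, ts + 30)
class(res)   # "POSIXct" "POSIXt"
```

Because if_else checks that both branches have compatible types, mixing a POSIXct branch with, say, a character branch raises an error instead of silently coercing.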

Nick Kennedy
  • 1
    What about just reformatting the dates as strings with `strptime(ifelse(...),"%s")`, since the result is coerced to a number of seconds? – Tensibai Jun 30 '15 at 09:04
  • 1
    @Tensibai agree that would be more straightforward for dates in particular. – Nick Kennedy Jun 30 '15 at 10:20
  • @Tensibai even more straightforward would be to just add back the appropriate `class`. – Nick Kennedy Jun 30 '15 at 10:46
  • do you mean in the ifelse code or at the return (the strptime give a little more control, you can give the timezone to get the values with correct DST etc.). I disagree setting it in the ifelse code, too much overhead for a too narrow scope of usage at end – Tensibai Jun 30 '15 at 10:54
  • @NickK: Ah I see... this is interesting... R is then somehow the 'counterpart' of magma [which complains about type conversions all the time]... integer is the 'common' (?) overclass of logical and date... – Fabian Werner Jun 30 '15 at 11:19
  • 1
    @Tensibai I meant like this `x <- ifelse(is.na(timestamp), fixedDate, timestamp + 30); class(x) <- c("POSIXct", "POSIXt")` – Nick Kennedy Jun 30 '15 at 11:23
  • @FabianWerner Nope, just that a POSIXct is stored internally as an integer (number of seconds from January 1st 1970), so when it has to lose its type and all properties, what is returned is this integer part. – Tensibai Jun 30 '15 at 11:25
  • @NickK Yes, that's valid unless you have a source date with a specific timezone different from your machine timezone, that's why I tend to prefer strptime. And it could be done in the same statement, which is a cosmetic preference. – Tensibai Jun 30 '15 at 11:33
  • @Tensibai Well, POSIXct != integer per se, because when you give R a POSIXct timestamp and print it out then you get, in fact, a timestamp... so I'm talking about the 'whole object' (including the information that this integer actually means something else). A string is also just a number, everything is just a number and yet, it is something completely different :-) – Fabian Werner Jun 30 '15 at 12:39
  • @Tensibai and what you call 'lose the type' I call 'convert it to common overclass'... so yes, we mean the same – Fabian Werner Jun 30 '15 at 12:40
  • @FabianWerner In my point of view, this is not an overclass as there's no inheritance, but yes, we're on the same line with a different point of view :) – Tensibai Jun 30 '15 at 12:41
  • 1
    This question was answered here with an effective dplyr/tidy solution: https://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects – icj Jan 15 '19 at 18:01
1

As Henrik remarked, ifelse() strips attributes, unlike a simple for-loop.
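For comparison, a sketch of the for-loop that keeps the class. The data and names (`fixed`, `ts`, `out`) are assumptions for illustration.

```r
fixed <- as.POSIXct("2000-01-01 12:00:00", tz = "UTC")
ts <- fixed + c(30, 60, 90, NA)

out <- ts   # start from a POSIXct vector so the class is already in place
for (i in seq_along(ts)) {
  # Assigning into a POSIXct vector dispatches to `[<-.POSIXct`, which keeps the class
  out[i] <- if (is.na(ts[i])) fixed else ts[i] + 30
}
class(out)   # "POSIXct" "POSIXt"
```

The loop works because the result container is a POSIXct vector from the start, whereas ifelse() builds its result from the logical test vector.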

A workaround to filling NAs without grief is the simpler and clearer function zoo::na.fill

Then you would do: na.fill(timestamp, fixedDate)

See also na.locf, na.approx, na.spline ..., other excellent convenience functions from zoo.

smci
  • As remarked: a usual for-loop over the vector would return a sequence of timestamps, not integers. – Fabian Werner Jun 30 '15 at 18:26
  • Updated to cover that – smci Jun 30 '15 at 18:33
  • 1
    Oh, and by the way: I don't see why ifelse does this. I don't think that it is perfectly fine... it's just a pitfall. One could also Fourier transform the output and would get something that 'could' theoretically be reinterpreted as timestamps, but still: one does not do it like this. Why? Because it would be crap: if I'm putting in timestamps, I expect the output to be of the same type... – Fabian Werner Jun 30 '15 at 18:48
  • I agree it's not OK; it is a little-known pitfall and should be documented more prominently (if not also trigger a *"Warning: Attributes dropped by ifelse ...."*). I recommend you use `zoo::na.fill()` like I said. It's faster, simpler and clearer. – smci Jun 30 '15 at 20:37