I have one of these date issues.

In a data frame dfr I have two date columns as a result of merging. Only the date that matches the row's year is valid, and I want it in an extra column.

> head(dfr, 4)
   id year some.vars     date17     date18
1 101 2017         8 2017-11-21 2018-11-21
2 101 2018         0 2017-11-21 2018-11-21
3 102 2017         2 2017-11-23 2018-11-23
4 102 2018         9 2017-11-23 2018-11-23

So as usual I do this

dfr$date <- 0
dfr$date[dfr$year == 2017] <- dfr$date17[dfr$year == 2017]
dfr$date[dfr$year == 2018] <- dfr$date18[dfr$year == 2018]

but it gives me a date column in numeric form,

> head(dfr, 4)
   id year some.vars     date17     date18       date
1 101 2017         8 2017-11-21 2018-11-21 1511218800
2 101 2018         0 2017-11-21 2018-11-21 1542754800
3 102 2017         2 2017-11-23 2018-11-23 1511391600
4 102 2018         9 2017-11-23 2018-11-23 1542927600

which I probably have to convert back with as.POSIXct() by specifying an origin, or with strftime() etc., but I would consider this a workaround. (Besides, dfr$date <- with(dfr, ifelse(year == 2017, date17, date18)) yields exactly the same result.)

But what I want is this

> head(dfr, 4)
   id year some.vars     date17     date18       date
1 101 2017         8 2017-11-21 2018-11-21 2017-11-21
2 101 2018         0 2017-11-21 2018-11-21 2018-11-21
3 102 2017         2 2017-11-23 2018-11-23 2017-11-23
4 102 2018         9 2017-11-23 2018-11-23 2018-11-23

When I look at the subsets,

d1 <- dfr$date17[dfr$year == 2017]
d2 <- dfr$date18[dfr$year == 2018]
> sapply(list(d1, d2), class)
     [,1]      [,2]     
[1,] "POSIXct" "POSIXct"
[2,] "POSIXt"  "POSIXt"

there's nothing wrong with them. As the LHS is similar, I assume there is an issue with the <- assignment.

I also tried dfr[which(dfr["year"] == 2017), "date"] <- dfr[which(dfr["year"] == 2017), "date17"] to avoid the $ sign (I interpreted some points in this answer like so), but that approach doesn't work either.

So how in base R can we combine two subsets of dates into one column of a data frame?

Data

> dput(dfr)
structure(list(id = c(101L, 101L, 102L, 102L, 103L, 103L, 104L, 
104L, 105L, 105L), year = c(2017L, 2018L, 2017L, 2018L, 2017L, 
2018L, 2017L, 2018L, 2017L, 2018L), some.vars = c(8L, 0L, 2L, 
9L, 6L, 3L, 4L, 0L, 9L, 4L), date17 = structure(c(1511218800, 
1511218800, 1511391600, 1511391600, 1511650800, 1511650800, 1511910000, 
1511910000, 1512169200, 1512169200), class = c("POSIXct", "POSIXt"
), tzone = ""), date18 = structure(c(1542754800, 1542754800, 
1542927600, 1542927600, 1543186800, 1543186800, 1543446000, 1543446000, 
1543705200, 1543705200), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA, 
-10L))

> str(dfr)
'data.frame':   10 obs. of  5 variables:
 $ id       : int  101 101 102 102 103 103 104 104 105 105
 $ year     : int  2017 2018 2017 2018 2017 2018 2017 2018 2017 2018
 $ some.vars: int  8 0 2 9 6 3 4 0 9 4
 $ date17   : POSIXct, format: "2017-11-21" "2017-11-21" "2017-11-23" "2017-11-23" ...
 $ date18   : POSIXct, format: "2018-11-21" "2018-11-21" "2018-11-23" "2018-11-23" ...
jay.sf

2 Answers

Two quick solutions, both of which change how the date column is created.

One:

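dfr$date <- 0
# Use the class of the source columns; "Date" would convert the
# date-times to calendar days via as.Date()
class(dfr$date) <- c("POSIXct", "POSIXt")
dfr$date[dfr$year == 2017] <- dfr$date17[dfr$year == 2017]
dfr$date[dfr$year == 2018] <- dfr$date18[dfr$year == 2018]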

Second:

dfr$date <- dfr$date17
dfr$date[dfr$year == 2018] <- dfr$date18[dfr$year == 2018]

So, as both solutions suggest, the problem was with the class of the new column.
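To make the class issue concrete, here is a minimal sketch on stand-alone vectors. The names `src`, `num_target` and `ct_target` and the UTC time zone are assumptions chosen for the illustration; they are not part of the question's data.

```r
# Two date-times to assign; values and time zone are illustrative only
src <- as.POSIXct(c("2017-11-21", "2018-11-21"), tz = "UTC")

# Target initialised as a plain number: the assigned values lose their class
num_target <- c(0, 0)
num_target[1:2] <- src
class(num_target)   # "numeric"

# Target initialised as POSIXct: the class of the target is kept
ct_target <- as.POSIXct(c(NA, NA), tz = "UTC")
ct_target[1:2] <- src
class(ct_target)    # "POSIXct" "POSIXt"
```

Subassignment keeps the attributes of the object being assigned into, which is why the class of the placeholder column decides what the result looks like.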

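Lastly, for a case of similar size, one may simply exploit the order of the columns: year - 2013 gives column 4 (date17) for 2017 and column 5 (date18) for 2018. Matrix indexing goes through as.matrix(), which returns the dates as character strings, so they are converted back:

dfr$date <- as.POSIXct(dfr[cbind(seq_len(nrow(dfr)), dfr$year - 2013)])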
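One thing worth checking with matrix indexing is the type of the result. The sketch below uses a hypothetical three-column data frame `toy` (so the offset is 2015 rather than 2013) with a fixed UTC time zone; both are assumptions made for the example.

```r
# Toy data frame: column 2 is date17, column 3 is date18
toy <- data.frame(year = c(2017L, 2018L))
toy$date17 <- as.POSIXct(c("2017-11-21", "2017-11-23"), tz = "UTC")
toy$date18 <- as.POSIXct(c("2018-11-21", "2018-11-23"), tz = "UTC")

# One (row, column) pair per row; year - 2015 maps 2017 -> 2 and 2018 -> 3
idx <- cbind(seq_len(nrow(toy)), toy$year - 2015L)

# Indexing a data frame by a matrix converts it with as.matrix() first,
# so a mixed data frame yields character strings
picked <- toy[idx]
class(picked)  # "character"

# Convert back to date-times, stating the time zone explicitly
picked_ct <- as.POSIXct(picked, tz = "UTC")
```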
Julius Vainora
  • Ah ok, I understand. But with an `ifelse()`, or something shorter, it won't work, will it? E.g. `dfr$date <- 'class<-'(with(dfr, ifelse(year == 2017, date17, date18)), "Date")` fails. – jay.sf Nov 21 '18 at 16:31
  • Your example is interesting, even `ifelse(dfr$year == 2017, dfr$date17, dfr$date18)` fails. `?ifelse` discusses classes of returned objects. – Julius Vainora Nov 21 '18 at 16:36
  • Your `cbind()` trick is very clever. I probably have some cases that overlap into the next year, though. – jay.sf Nov 21 '18 at 16:38
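
The `ifelse()` behaviour discussed in these comments can be reproduced on small vectors. This is only a sketch: `d17`, `d18`, `year`, `raw` and `fixed` are made-up names, and the UTC time zone is an assumption for reproducibility.

```r
d17  <- as.POSIXct(c("2017-11-21", "2017-11-23"), tz = "UTC")
d18  <- as.POSIXct(c("2018-11-21", "2018-11-23"), tz = "UTC")
year <- c(2017L, 2018L)

# ifelse() builds its result from the logical test, so the class is dropped
raw <- ifelse(year == 2017, d17, d18)
class(raw)  # "numeric"

# The numbers are seconds since 1970-01-01, so restore them as POSIXct;
# labelling them as "Date" would read the seconds as days
fixed <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
format(fixed, "%Y-%m-%d")  # "2017-11-21" "2018-11-23"
```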

When you create the date column, you are creating a numeric column:

dfr$date <- 0

Then when you assign subsequent date data, it gets coerced into numeric format.

Instead, create the date column from one of the existing date columns, so that it has the right type from the start.
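
A variation in the same spirit, sketched here as an assumption rather than as part of this answer: start from a missing value that already has the date-time class, so rows whose year matches neither column stay NA. The data frame `toy`, its 2019 row and the UTC time zone are invented for the example.

```r
# Toy stand-in for dfr, with one year that has no matching date column
toy <- data.frame(year = c(2017L, 2018L, 2019L))
toy$date17 <- as.POSIXct(rep("2017-11-21", 3), tz = "UTC")
toy$date18 <- as.POSIXct(rep("2018-11-21", 3), tz = "UTC")

# Placeholder column that is POSIXct from the start
toy$date <- rep(as.POSIXct(NA, tz = "UTC"), nrow(toy))

# Fill in the valid date per year; the 2019 row is left as NA
toy$date[toy$year == 2017] <- toy$date17[toy$year == 2017]
toy$date[toy$year == 2018] <- toy$date18[toy$year == 2018]
class(toy$date)  # "POSIXct" "POSIXt"
```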

alex_danielssen