0

To generate the output below, I am using the following code:

safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes))

library(lubridate)

df <- data.frame(i_date=mdy("9/1/2011") + months(seq(0,31)), t_date=mdy("2/1/2012")) r <- seq(1:nrow(df))

r <- (r - which(df$i_date == df$t_date)) %/% 12

df$r_date <- as.Date(safe.ifelse(r<0, df$i_date, df$t_date + years(r)), origin = "1970-01-01")

For good reason, I get an error if I set the t_date to be beyond the biggest i_date. Does anyone know a way to avoid this error? So instead of finding where the i_date and t_date match, replicating the t_date 12 times and adding a year, replicating again 12 times etc, I would just cascade the i_date the entire way to the end of the r_date where all three columns of the data frame have the same length. So in the case I am referring to, the i_date would match the t_date if the t_date is > max(i_date) otherwise we would do what we see below. Thanks!

i_date       t_date      r_date
9/1/2011    2/1/2012    9/1/2011
10/1/2011   2/1/2012    10/1/2011
11/1/2011   2/1/2012    11/1/2011
12/1/2011   2/1/2012    12/1/2011
1/1/2012    2/1/2012    1/1/2012
2/1/2012    2/1/2012    2/1/2012
3/1/2012    2/1/2012    2/1/2012
4/1/2012    2/1/2012    2/1/2012
5/1/2012    2/1/2012    2/1/2012
6/1/2012    2/1/2012    2/1/2012
7/1/2012    2/1/2012    2/1/2012
8/1/2012    2/1/2012    2/1/2012
9/1/2012    2/1/2012    2/1/2012
10/1/2012   2/1/2012    2/1/2012
11/1/2012   2/1/2012    2/1/2012
12/1/2012   2/1/2012    2/1/2012
1/1/2013    2/1/2012    2/1/2012
2/1/2013    2/1/2012    2/1/2013
3/1/2013    2/1/2012    2/1/2013
4/1/2013    2/1/2012    2/1/2013
5/1/2013    2/1/2012    2/1/2013
6/1/2013    2/1/2012    2/1/2013
7/1/2013    2/1/2012    2/1/2013
8/1/2013    2/1/2012    2/1/2013
9/1/2013    2/1/2012    2/1/2013
10/1/2013   2/1/2012    2/1/2013
11/1/2013   2/1/2012    2/1/2013
12/1/2013   2/1/2012    2/1/2013
1/1/2014    2/1/2012    2/1/2013
2/1/2014    2/1/2012    2/1/2014
3/1/2014    2/1/2012    2/1/2014
4/1/2014    2/1/2012    2/1/2014
user3743201
  • 53
  • 1
  • 6

2 Answers2

0

I don't think using rep with ifelse makes a lot of sense because ifelse operates on each row. I assume ones you start replacing, you will continue to do so for the rest of the data.frame. Assuming your data.frame above is called x and that the first two columns are proper date classes, then i might do

ww <- seq_along(x$i_date)-which(x$i_date == x$t_date)

to identify each row by it's offset from where the values are equal. Then we can add years to the pivot date to calculate the values for the rest of the rows

pvdate <- as.Date(
    paste(as.numeric(strftime(x$t_date[ww==0], "%Y"))+0:max(floor(ww/12)), 
    strftime(x$t_date[ww==0], "%m-%d"), sep="-")
)

That's a bit of messy date arithmetic, but it gets the job done. Now i just combine the unreplaced rows with the replaced ones

x$r_date<-c(x$i_date[ww<=0], rep(pvdate, table(floor(ww[ww>0]/12))))

It's not exactly elegant, but perhaps someone will have a better solution.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
0

This should work for ordered data where the i_date increments in steps of one month as in data above. I'll use lubridate package to make it easier to manipulate the dates.

I replicate your data above as a dataframe df.

library(lubridate)
td <- mdy("2/1/2012")
df <- data.frame(i_date=mdy("9/1/2011") + months(seq(0,31)),
                 t_date=td)

I create temporary index in variable r to indicate how many years (in essence 12 months) to add. Then simply add r number of years to t_date from the point where r is not negative (i.e. i_date is no longer smaller than t_date). Set r to negative if t_date is not in the range of i_date.

if (td %in% df$i_date) {
  r <- (seq(1:nrow(df)) - which(df$i_date == df$t_date)) %/% 12
} else { r <- rep(-1, nrow(df)) }
df$r_date <- as.POSIXct(ifelse(r<0, df$i_date,
                                    df$t_date + years(r)), origin = "1970-01-01")

My result below.

       i_date     t_date              r_date
1  2011-09-01 2012-02-01 2011-09-01 08:00:00
2  2011-10-01 2012-02-01 2011-10-01 08:00:00
3  2011-11-01 2012-02-01 2011-11-01 08:00:00
4  2011-12-01 2012-02-01 2011-12-01 08:00:00
5  2012-01-01 2012-02-01 2012-01-01 08:00:00
6  2012-02-01 2012-02-01 2012-02-01 08:00:00
7  2012-03-01 2012-02-01 2012-02-01 08:00:00
8  2012-04-01 2012-02-01 2012-02-01 08:00:00
9  2012-05-01 2012-02-01 2012-02-01 08:00:00
10 2012-06-01 2012-02-01 2012-02-01 08:00:00
11 2012-07-01 2012-02-01 2012-02-01 08:00:00
12 2012-08-01 2012-02-01 2012-02-01 08:00:00
13 2012-09-01 2012-02-01 2012-02-01 08:00:00
14 2012-10-01 2012-02-01 2012-02-01 08:00:00
15 2012-11-01 2012-02-01 2012-02-01 08:00:00
16 2012-12-01 2012-02-01 2012-02-01 08:00:00
17 2013-01-01 2012-02-01 2012-02-01 08:00:00
18 2013-02-01 2012-02-01 2013-02-01 08:00:00
19 2013-03-01 2012-02-01 2013-02-01 08:00:00
20 2013-04-01 2012-02-01 2013-02-01 08:00:00
21 2013-05-01 2012-02-01 2013-02-01 08:00:00
22 2013-06-01 2012-02-01 2013-02-01 08:00:00
23 2013-07-01 2012-02-01 2013-02-01 08:00:00
24 2013-08-01 2012-02-01 2013-02-01 08:00:00
25 2013-09-01 2012-02-01 2013-02-01 08:00:00
26 2013-10-01 2012-02-01 2013-02-01 08:00:00
27 2013-11-01 2012-02-01 2013-02-01 08:00:00
28 2013-12-01 2012-02-01 2013-02-01 08:00:00
29 2014-01-01 2012-02-01 2013-02-01 08:00:00
30 2014-02-01 2012-02-01 2014-02-01 08:00:00
31 2014-03-01 2012-02-01 2014-02-01 08:00:00
32 2014-04-01 2012-02-01 2014-02-01 08:00:00
Ricky
  • 4,616
  • 6
  • 42
  • 72
  • This is EXACTLY what I wanted, however, when I attempt to use your code, I get the following messages: Error: object 'r' not found after the line where you have r <- (r - which(df$i_date == df$t_date)) %/% 12 – user3743201 Jun 16 '14 at 15:13
  • I keep trying to initialize r in the above code with different values and I can't seem to get anything to cascade down in the manner than you have in your output. – user3743201 Jun 16 '14 at 15:59
  • Apologies I missed out one line. r starting value should be the sequence of number of rows: r <- seq(1:nrow(df)) I've edited my answer above, see if it works now. – Ricky Jun 17 '14 at 03:33
  • I will try it as soon as I get home thank you so much for your help I greatly appreciate it! – user3743201 Jun 17 '14 at 03:35
  • Running that code I am getting the last day of each month like: 2011-08-31 19:00:00 2011-09-30 19:00:00 etc – user3743201 Jun 17 '14 at 05:02
  • I am now using this code and it seems to be working: safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes)) library(lubridate) df <- data.frame(i_date=mdy("9/1/2011") + months(seq(0,31)), t_date=mdy("2/1/2012")) r <- seq(1:nrow(df)) r <- (r - which(df$i_date == df$t_date)) %/% 12 df$r_date <- as.Date(safe.ifelse(r<0, df$i_date, df$t_date + years(r)), origin = "1970-01-01") – user3743201 Jun 17 '14 at 05:17
  • One minor annoyance, when the i_rate never reaches the t_date, I am getting an error which makes sense because r is empty. Is there an easy way around that? – user3743201 Jun 17 '14 at 05:49
  • It was initially meant to be a a quick answer based on your sample data. Once you want to put error checking it gets a bit complicated. When it never reaches, it can be because `t_date` is smaller than smallest `i_date`, bigger than biggest `i_date`, in-between `i_date`. Which one to cover, or all? Also, can we assume that `t_date` is always the same, as in your example above? It will be different error-checking if we were to cater for differing `t_date`. On the other hand, if `t_date` is always the same, I'd think it better to keep `t_date` as a constant outside the table / dataframe. – Ricky Jun 17 '14 at 08:08
  • Based on my data t_date can never be smaller than the min of i_date but can be bigger than the max of i_date – user3743201 Jun 17 '14 at 11:31
  • I will assume `t_date` value is a constant then, that I will initially store in `td`. td <- mdy("2/1/2012") df <- data.frame(i_date=mdy("9/1/2011") + months(seq(0,31)), t_date=td) if (td %in% df$i_date) { r <- (seq(1:nrow(df)) - which(df$i_date == df$t_date)) %/% 12 } else { r <- rep(-1, nrow(df)) } df$r_date <- as.POSIXct(ifelse(r<0, df$i_date, df$t_date + years(r)), origin = "1970-01-01") I hope this is sufficient to close this question. – Ricky Jun 19 '14 at 06:38
  • Great! can you accept it as the answer for the question then? – Ricky Jun 19 '14 at 21:21