2

I want to find a first date greater than the given date in a column. eg:

Pnp, Date1 Date2  
A100,1/1/2020,1/1/2020  
A100,1/1/2020,1/7/2020  
A100,1/1/2020,1/1/2021  
A100,1/1/2020,1/7/2021  

Sample output:

Pnp,Date1,Date2,Date3,New Column  
A100,1/1/2020,1/1/2020, 1/7/2020  
A100,1/1/2020,1/7/2020,1/7/2020  
A100,1/1/2020,1/1/2021,1/72020  
A100,1/1/2020,1/7/2021,1/7/2020  

I mean Based on date in date1 which date is greater than Date1 in Date2 (First greater value in Date2) to be put in New column.

sample code is :


library(dplyr)
library(sqldf)
monthly_sequence_03<- data.frame('Pnp' = 'A100','Frequency' = 3,'Duration' = c('Month'),'Date1' =seq(as.Date('2020-01-01'), as.Date('2025-6-30'), by = '3 months'))
monthly_sequence_06<- data.frame('Pnp' = 'A100','Frequency' = 6,'Duration' = c('Month'),'Date2' =seq(as.Date('2020-01-01'), as.Date('2025-6-30'), by = '6 months'))

new_df <- sqldf("select a.*,b.Date2 from monthly_sequence_03 as a
                left join monthly_sequence_06 as b
                on a.pnp = b.pnp")


new_df <-new_df[
  order( new_df[,3], new_df[,4] ),
  ]

Any help is highly appreciated.

Werner Hertzog
  • 2,002
  • 3
  • 24
  • 36

3 Answers3

1

I would calculate leads of Date2 and join them back to your dataframe.

new_df %>% 
  left_join(new_df %>% transmute(Date2, Date3 = lead(Date2)) %>% distinct(), by = c("Date1" = "Date2"))
eastclintw00d
  • 2,250
  • 1
  • 9
  • 18
0

If you are trying to keep the date that is greater between the two dates in those columns, write a quick function for that and apply it over the columns in DF to create new column.

This could be written like this:

 new_df$Date = as.Date(sapply(1:nrow(new_df), function(x){
    Date1 = new_df$Date1[x]
    Date2 = new_df$Date2[x]
    if(Date1 > Date2){
    return(Date1)}else{
    return(Date2)}}), origin = "1970-01-01")
Pceam
  • 46
  • 5
0

Thanks eastclintwood and Pceam.I combined the logic and added one part of mine. It gave me the required result.

ppp <- filter(new_df,Date2 > Date1)

ere <- ppp %>% group_by(Pnp,Frequency,Duration,Date1) %>% mutate(new_Date_11 = first(Date2))

Thanks again.