3

my dataframe looks someway like this

df <- read.table(text="
                 id          start           end
    1            2      2018-10-01    2018-12-01
    2            3      2018-01-01    2018-04-01
", header=TRUE)

What I trying to achieve is get difference between start and day in months for each id and then generate new dataframe with each month for particular id. Result should be

result <- read.table(text="
                 id          date           
    1            2      2018-10-01
    2            2      2018-11-01
    3            2      2018-12-01     
    4            3      2018-01-01
    5            3      2018-02-01    
    6            3      2018-03-01    
    7            3      2018-04-01        
", header=TRUE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Sklenicka
  • 595
  • 2
  • 4
  • 16
  • Related: [Expand ranges defined by “from” and “to” columns](https://stackoverflow.com/questions/11494511/expand-ranges-defined-by-from-and-to-columns), but it doesn't cover the `by = month` aspect in `seq`. – Henrik Jan 02 '19 at 16:30

2 Answers2

3

Most straightforward way using base R functions is to create a sequence of monthly dates for each row, create a dataframe and rbind them together

do.call(rbind, with(df,lapply(1:nrow(df), function(i) 
   data.frame(id = id[i], date = seq(as.Date(start[i]), as.Date(end[i]), by = "month")))))

#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
3

We can do this easily with Map. Pass the Date converted 'start' and 'end' columnd of the dataset as arguments to Map, get the sequence of 'month's as a list and expand the 'id' based on the lengths of list as well as concatenate the list elements to create the expanded dataframe

lst1 <- Map(seq, MoreArgs = list(by = 'month'), as.Date(df$start), as.Date(df$end))
data.frame(id = rep(df$id, lengths(lst1)), date = do.call(c, lst1))
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

Or using tidyverse, we mutate the class of the 'start', 'end' columns to Date, using map2 (from purrr), get the sequence of dates from 'start' to 'end' by the 'month' and expand the data by unnesting the dataset

library(tidyverse)
df %>% 
  mutate_at(2:3, as.Date) %>% 
  transmute(id = id, date = map2(start, end,  ~ seq(.x, .y, by = 'month'))) %>% 
  unnest
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01
akrun
  • 874,273
  • 37
  • 540
  • 662