generate rows for each id by date sequence

Question

my dataframe looks someway like this

df <- read.table(text="
                 id          start           end
    1            2      2018-10-01    2018-12-01
    2            3      2018-01-01    2018-04-01
", header=TRUE)

What I trying to achieve is get difference between start and day in months for each id and then generate new dataframe with each month for particular id. Result should be

result <- read.table(text="
                 id          date           
    1            2      2018-10-01
    2            2      2018-11-01
    3            2      2018-12-01     
    4            3      2018-01-01
    5            3      2018-02-01    
    6            3      2018-03-01    
    7            3      2018-04-01        
", header=TRUE)

Related: [Expand ranges defined by “from” and “to” columns](https://stackoverflow.com/questions/11494511/expand-ranges-defined-by-from-and-to-columns), but it doesn't cover the `by = month` aspect in `seq`. — Henrik, Jan 02 '19 at 16:30

score 3 · Answer 1 · answered Jan 02 '19 at 16:21

Most straightforward way using base R functions is to create a sequence of monthly dates for each row, create a dataframe and rbind them together

do.call(rbind, with(df,lapply(1:nrow(df), function(i) 
   data.frame(id = id[i], date = seq(as.Date(start[i]), as.Date(end[i]), by = "month")))))

#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

akrun · Answer 2 · 2019-01-02T16:33:34.763

3

We can do this easily with Map. Pass the Date converted 'start' and 'end' columnd of the dataset as arguments to Map, get the sequence of 'month's as a list and expand the 'id' based on the lengths of list as well as concatenate the list elements to create the expanded dataframe

lst1 <- Map(seq, MoreArgs = list(by = 'month'), as.Date(df$start), as.Date(df$end))
data.frame(id = rep(df$id, lengths(lst1)), date = do.call(c, lst1))
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

Or using tidyverse, we mutate the class of the 'start', 'end' columns to Date, using map2 (from purrr), get the sequence of dates from 'start' to 'end' by the 'month' and expand the data by unnesting the dataset

library(tidyverse)
df %>% 
  mutate_at(2:3, as.Date) %>% 
  transmute(id = id, date = map2(start, end,  ~ seq(.x, .y, by = 'month'))) %>% 
  unnest
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

edited Jan 02 '19 at 16:33

answered Jan 02 '19 at 16:25

akrun

874,273
37
540
662

I can't reproduce the error with `purrr 0.2.5` and `dplyr 0.7.6` – akrun Jan 02 '19 at 16:35
Try it on a fresh session in case there are some function masking – akrun Jan 02 '19 at 16:45
Not sure what was wrong. Worked on Mac now but I had issue yesterday on Windows. – Ronak Shah Jan 03 '19 at 03:50
I am also using Mac. So, it could be some issue in windows – akrun Jan 03 '19 at 04:55

generate rows for each id by date sequence

2 Answers2