1

I am trying to create a dataset which is based on the difference in the days of start & end date. as an example

Name  Start_Date  End_Date
Alice 1-1-2017    3-1-2017
John  4-3-2017    5-3-2017
Peter 12-3-2017   12-3-2017

So, the final dataset will be inclusive of the start, end date and also the difference. And eventually it should look something like

Name  Date
Alice 1-1-2017
Alice 2-1-2017
Alice 3-1-2017
John  4-3-2017
John  5-3-2017
Peter 12-3-2017

Every help is Great Help. Thanks !

Manu Sharma
  • 1,593
  • 4
  • 25
  • 48
  • This answer should be helpful: http://stackoverflow.com/questions/3765668/have-lubridate-subtraction-return-only-a-numeric-value – Andrew Brēza Apr 05 '17 at 12:17

3 Answers3

0

We can use Map to get the sequence and melt the list to data.frame`

df1[-1] <- lapply(df1[-1], as.Date, format = "%d-%m-%Y")
lst <- setNames(Map(function(x, y) seq(x, y, by = "1 day"), 
             df1$Start_Date, df1$End_Date), df1$Name)
library(reshape2)
melt(lst)[2:1]

data

df1 <- structure(list(Name = c("Alice", "John", "Peter"), Start_Date = structure(c(17167, 
 17229, 17237), class = "Date"), End_Date = structure(c(17169, 
 17230, 17237), class = "Date")), .Names = c("Name", "Start_Date", 
 "End_Date"), row.names = c(NA, -3L), class = "data.frame")
akrun
  • 874,273
  • 37
  • 540
  • 662
0

This uses the expandRows function from the package splitstackshape:

df = df %>% 
  mutate(days_between = as.numeric(End_Date - Start_Date),
    id = row_number(Name)) %>%
  expandRows("days_between") %>%
  group_by(id) %>%
  mutate(Date = seq(first(Start_Date),
                  first(End_Date) - 1,
                  by = 1)) %>%
  ungroup()
onnhoJ
  • 56
  • 7
0

using a for loop:

library(data.table)
library(foreach)
library(lubridate)

setDT(df)
names = df[, unique(Name)]

l = foreach(i = 1:length(names)) %do% {

  # make a date sequence per name
  d = df[Name == names[i], ]
  s = seq(from = dmy(d$Start_Date), to = dmy(d$End_Date), by = "days")

  # bind the results in a data.table
  dx = data.table(name = rep(names[i], length(s)))
  dx = cbind(dx, date = s)

}

rbindlist(l)
Henk
  • 3,634
  • 5
  • 28
  • 54