3

Here is what my current dataframe looks like:

df <- data.frame(name = c("A", "A", "A", "B", "B")), 
  start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23")),
  end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13")), 
  value = c(8.1, 3.3, 9.1, 9.4, 15.3)))



 name start_date   end_date value
    A 2020-01-23 2020-05-15     8
    A 2019-10-15 2020-01-27     3
    A 2019-07-28 2019-10-17     9
    B 2020-03-15 2020-07-25     9
    B 2019-04-23 2020-02-13    15

The dates are in POSIXct, are not necessarily consecutive, and can overlap.

I would like my output dataframe to look something like this:

name date  value
A 2020-01-23 8.1
A 2020-01-24 8.1
A ...        8.1
A 2020-05-14 8.1
A 2020-05-15 8.1
A 2019-10-15 3.3
A 2019-10-16 3.3
A ...        3.3
A 2020-01-26 3.3
A 2020-01-27 3.3
A 2019-07-28 9.1
A 2019-07-29 9.1
A ...        9.1
A 2019-10-16 9.1
A 2019-10-17 9.1
B 2020-03-15 9.4
B 2020-03-16 9.4
B ...        9.4
B 2020-07-24 9.4
B 2020-07-25 9.4
B 2019-04-23 15.3
B 2019-04-24 15.3
B ...        15.3
B 2020-02-12 15.3
B 2020-02-13 15.3

Here is what I have been trying:

 library(data.table)
 setDT(df) [, .(date = seq(as.Date(start_date), as.Date(end_date), by = "day")), by = end_date]

But I have been getting the following error:

Error in seq.Date(as.Date(start_date), as.Date(end_date), by = "day") : 
  'from' must be of length 1

How should I do this? I am open to using other packages rather than data.table if they work better.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • [Expand ranges defined by “from” and “to” columns](https://stackoverflow.com/questions/11494511/expand-ranges-defined-by-from-and-to-columns) – Henrik Feb 28 '21 at 22:32

2 Answers2

2

Here, we may need to use by as sequence of rows

library(data.table)
setDT(df)[, .(date = seq(as.Date(start_date), as.Date(end_date),
  by = 'day')), .(rn = seq_len(nrow(df)), name, value)][, rn := NULL][]

Or create a list column by looping over corresponding elements of 'start_date', 'end_date' to create a sequence of dates in Map and then unnest the list

library(tidyr)
library(magrittr)
setDT(df)[, .(name, date = Map(seq, MoreArgs = list(by = '1 day'), 
      as.Date(start_date), as.Date(end_date)), value)] %>% 
   unnest(date)
# A tibble: 731 x 3
#   name  date       value
#   <chr> <date>     <dbl>
# 1 A     2020-01-23   8.1
# 2 A     2020-01-24   8.1
# 3 A     2020-01-25   8.1
# 4 A     2020-01-26   8.1
# 5 A     2020-01-27   8.1
# 6 A     2020-01-28   8.1
# 7 A     2020-01-29   8.1
# 8 A     2020-01-30   8.1
# 9 A     2020-01-31   8.1
#10 A     2020-02-01   8.1
# … with 721 more rows
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Another approach using purrr

df <- data.frame(name = c("A", "A", "A", "B", "B"), 
  start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23"),
  end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13"), 
  value = c(8.1, 3.3, 9.1, 9.4, 15.3))

library(dplyr)
library(purrr)

# function take in the name, start, end, value and generate a df fill as wanted
generate_fill <- function(name, start, end, value) {
  tibble(name = name, 
    date = seq(as.Date(start), as.Date(end), by = "1 day"), 
    value = value)
}

# Map the function to original df and combine the result 
bind_rows(
  pmap(list(df[["name"]], df[["start_date"]], df[["end_date"]], df[["value"]]),
  generate_fill))

Output

# A tibble: 731 x 3
   name  date       value
   <chr> <date>     <dbl>
 1 A     2020-01-23   8.1
 2 A     2020-01-24   8.1
 3 A     2020-01-25   8.1
 4 A     2020-01-26   8.1
 5 A     2020-01-27   8.1
 6 A     2020-01-28   8.1
 7 A     2020-01-29   8.1
 8 A     2020-01-30   8.1
 9 A     2020-01-31   8.1
10 A     2020-02-01   8.1
# … with 721 more rows
Sinh Nguyen
  • 4,277
  • 3
  • 18
  • 26