2

I have a dataframe with an id, start date, end date and the income and cost values.

table <- data.frame(id = c(1, 2, 3),
               start = c("2018-01-01", "2018-02-05", "2018-05-30"),
               end = c("2018-01-31", "2018-03-26", "2018-08-31"),
               income = c(100, 225, 399),
               costs = c(37, 98, 113))

table$start <- as.Date(table$start)
table$end <- as.Date(table$end)

Which results in:

  id      start        end income costs
  1 2018-01-01 2018-01-31    100    37
  2 2018-02-05 2018-03-26    225    98
  3 2018-05-30 2018-08-31    399   113

Just like this question, some of these time periods span n months and I would like to aggregate income and cost by month. For those amounts that relate to a period which spans two, three or more months I would like to linearly apportion them between the two, three or n months.

The problem is that I also would like to keep the id, and perform the operations over two variables (not one like the question asked before), which complicates the whole thing.

What I'm expecting to get is the following table:

  id   date      income      costs
  1   2018-01    100         37
  2   2018-02    108         47.04
  2   2018-03    117         50.96
  3   2018-05    8.489362    2.404255
  3   2018-06    127.340426  36.063830
  3   2018-07    131.585106  37.265957
  3   2018-08    131.585106  37.265957

I tried using rbindlist over a list of dataframes created by the id, and the following function:

explode <- function(start, end, income) {
              dates <- seq(start, end, "day")
              n <- length(dates)
              rowsum(rep(income, n) / n, format(dates, "%Y-%m"))                  
}

  Map(explode, table$start, table$end, table$income)

But of course it returns only the rowsum values inside and unnamed list.

Any help will be very appreciated. Thanks!

phariza
  • 53
  • 5
  • 1
    I don't understand the income, cost calculation for following months. How do you get income=108 and so on ? – YOLO Dec 08 '18 at 21:42
  • It allocates the proportional part of the original value to february. In other words, first you have to calculate the daily income between start and end dates, and then multiplies it by the number of days of each month. – phariza Dec 08 '18 at 22:46

2 Answers2

1

I'd go for data.table:

library(data.table)

table_aggregated <- setDT(table)[
  , .(id = id, income = income, costs = costs, day_var = seq(start, end, "day")), by = 1:nrow(table)][
    , `:=` (income_day = income / .N, 
            costs_day = costs / .N,
            date = format(day_var, "%Y-%m")), by = id][
              , .(income = sum(income_day),
                  costs = sum(costs_day)), by = .(id, date)]

Output:

   id    date     income     costs
1:  1 2018-01 100.000000 37.000000
2:  2 2018-02 108.000000 47.040000
3:  2 2018-03 117.000000 50.960000
4:  3 2018-05   8.489362  2.404255
5:  3 2018-06 127.340426 36.063830
6:  3 2018-07 131.585106 37.265957
7:  3 2018-08 131.585106 37.265957
arg0naut91
  • 14,574
  • 2
  • 17
  • 38
1

Your solution could have worked. Simply, add a new parameter to Map and extend your function with cbind to combine income and costs, then rbind the list generated from Map:

explode <- function(start, end, income, costs) {
  dates <- seq(start, end, "day")
  n <- length(dates)
  cbind.data.frame(
    date = format(start, "%Y-%m"),
    income = rowsum(rep(income, n) / n, format(dates, "%Y-%m")),
    costs = rowsum(rep(costs, n) / n, format(dates, "%Y-%m")) 
  )
}

data_list <- Map(explode, table$start, table$end, table$income, table$costs)
final_df <- do.call(rbind, data_list)

final_df    
#    date     income     costs
# 2018-01 100.000000 37.000000
# 2018-02 108.000000 47.040000
# 2018-03 117.000000 50.960000
# 2018-05   8.489362  2.404255
# 2018-06 127.340426 36.063830
# 2018-07 131.585106 37.265957
# 2018-08 131.585106 37.265957
Parfait
  • 104,375
  • 17
  • 94
  • 125