-1

I need help in taking an annual total (for each of many initiatives) and breaking that down to each month using a simple division formula. I need to do this for each distinct combination of a few columns while copying down the columns that are broken from annual to each monthly total. The loop will apply the formula to two columns and loop through each distinct group in a vector. I tried to explain in an example below as it's somewhat complex.

What I have :

| Init | Name | Date |Total Savings|Total Costs| 
|  A   | John | 2015 |    TotalD   |   TotalD  |
|  A   | Mike | 2015 |    TotalE   |   TotalE  |
|  A   |  Rob | 2015 |    TotalF   |   TotalF  |
|  B   | John | 2015 |    TotalG   |   TotalG  |
|  B   | Mike | 2015 |    TotalH   |   TotalH  |
......
| Init | Name | Date |Total Savings|Total Costs| 
|  A   | John | 2016 |    TotalI   |   TotalI  |
|  A   | Mike | 2016 |    TotalJ   |   TotalJ  |
|  A   |  Rob | 2016 |    TotalK   |   TotalK  |
|  B   | John | 2016 |    TotalL   |   TotalL  |
|  B   | Mike | 2016 |    TotalM   |   TotalM  |

I'm going to loop a function for the first row to take the "Total Savings" and "Total Costs" and divide by 12 where Date = 2015 and 9 where Date = 2016 (YTD to Sept) and create an individual row for each. I'm essentially breaking out an annual total in a row and creating a row for each month of the year. I need help in running that loop to copy also columns "Init", "Name", until "Init", "Name" combination are not distinct. Also, note the formula for the division based on the year will be different as well. I suppose I could separate the datasets for 2015 and 2016 and use two different functions and merge if that would be easier. Below should be the output:

| Init | Name | Date       |Monthly Savings|Monthly Costs| 
|  A   | John | 01-01-2015 |   TotalD/12*  |   MonthD    |
|  A   | John | 02-01-2015 |    MonthD     |   MonthD    |
|  A   | John | 03-01-2015 |    MonthD     |   MonthD    |
...
|  A   | Mike | 01-01-2016 |    TotalE/9*  |   TotalE    |
|  A   | Mike | 02-01-2016 |    TotalE     |   TotalE    |
|  A   | Mike | 03-01-2016 |    TotalE     |   TotalE    |
...
|  B   | John | 01-01-2015 |   TotalG/12*  |   MonthD    |
|  B   | John | 02-01-2015 |    MonthG     |   MonthD    |
|  B   | John | 03-01-2015 |    MonthG     |   MonthD    |

TotalD/12* = MonthD - this is the formula for 2015
TotalE/9* = MonthE - this is the formula for 2016

Any help would be appreciated...

lmo
  • 37,904
  • 9
  • 56
  • 69
CGermain
  • 369
  • 7
  • 18

1 Answers1

0

As a start, here are some reproducible data, with the columns described:

myData <-
  data.frame(
    Init = rep(LETTERS[1:3], each = 4)
    , Name = rep(c("John", "Mike"), each = 2)
    , Date = 2015:2016
    , Savings = (1:12)*1200
    , Cost = (1:12)*2400
  )

Next, set the divisor to be used for each year:

toDivide <-
  c("2015" = 12, "2016" = 9)

Then, I am using the magrittr pipe as I split the data up into single rows, then looping through them with lapply to expand each row into the appropriate number of rows (9 or 12) with the savings and costs divided by the number of months. Finally, dplyr's bind_rows stitches the rows back together.

myData %>%
  split(1:nrow(.)) %>%
  lapply(function(x){
    temp <- data.frame(
      Init = x$Init
      , Name = x$Name
      , Date = as.Date(paste(x$Date
                           , formatC(1:toDivide[as.character(x$Date)]
                                     , width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Savings = x$Savings / toDivide[as.character(x$Date)]
      , Cost = x$Cost / toDivide[as.character(x$Date)]
    )
  }) %>%
  bind_rows()

The head of this looks like:

    Init Name       Date  Savings      Cost
1      A John 2015-01-01 100.0000  200.0000
2      A John 2015-02-01 100.0000  200.0000
3      A John 2015-03-01 100.0000  200.0000
4      A John 2015-04-01 100.0000  200.0000
5      A John 2015-05-01 100.0000  200.0000
6      A John 2015-06-01 100.0000  200.0000

with similar entries for each expanded row.

Mark Peterson
  • 9,370
  • 2
  • 25
  • 48
  • Thanks @Mark. Unfortunately the example simplified the view of my actual data frame. Since I have over 40 distinct "init" values and differing "name" values (not always repeating the pattern shown), I'm hoping to restart the loop each time there is a distinct "Init" == "Name" combination. – CGermain Nov 01 '16 at 19:33
  • What do you mean by "restart the loop"? Are you trying to break up each row (an annual summary) into 9/12 rows (monthly breakdown) or is there an additional piece that I am missing? This splits each row, regardless of the init/name values. – Mark Peterson Nov 01 '16 at 19:57
  • I'm trying to break up the annual summary to 9/12 rows for each annual summary where "Init" and "Name" are the same combination as these change into various combinations. As I mentioned there are about 40 distinct "Init" values (these are project initiative names) and 10 or so "Name" values (these are dept names). Each "Init" will have a few "Name" (depts) reporting savings, so I need to break out to monthly savings and costs for all distinct initiative and dept combinations while preserving those in each monthly savings/costs row. Does this make more sense? – CGermain Nov 01 '16 at 21:56
  • If there is one row for each annual summary per init/name pair in the data, what I have here should work. If not, I am missing something in my understanding. – Mark Peterson Nov 01 '16 at 22:12