1

I have a set of data where I have annual totals for specific values stored in one row (observation). I'd like to tidy the data in R so that this total row is broken out for each month using a simple equation (total/12) that stores the annual total divided by 12 in each of 12 rows as a monthly total. I'm trying to do this in R but am very beginner and not quite sure where to start. Example is below:

Date | Total
2015 | 12,000 

Some R function to convert to:

   Date    | Total
01-01-2015 | 1,000
02-01-2015 | 1,000
03-01-2015 | 1,000
CGermain
  • 369
  • 7
  • 18

1 Answers1

0

Here is an example dataset with multiple years, which I presume is your desired use case:

myData <-
  data.frame(
    Date = 2011:2015
    , Total = (1:5)*1200
  )

Then, I split the data.frame up by the Date column (this assumes the years are unique), looped through with lapply to modify to the monthly values you wanted, then stitched them back together with bind_rows from dplyr (also need to load dplyr or magrittr for the pipes (%>%) to work).

myData %>%
  split(.$Date) %>%
  lapply(function(x){
    temp <- data.frame(
      Date = as.Date(paste(x$Date
                           , formatC(1:12, width = 2, flag = "0")
                           , "01"
                           , sep = "-"))
      , Total = x$Total / 12
    )
  }) %>%
  bind_rows()

Gives (note, using ... to indicate skipped rows)

         Date Total
1  2011-01-01   100
2  2011-02-01   100
3  2011-03-01   100
...
16 2012-04-01   200
17 2012-05-01   200
18 2012-06-01   200
...
29 2013-05-01   300
30 2013-06-01   300
31 2013-07-01   300
...
43 2014-07-01   400
44 2014-08-01   400
45 2014-09-01   400
...
58 2015-10-01   500
59 2015-11-01   500
60 2015-12-01   500

If, as suggested in the comments, you need to divide by a different number of months each year, I would create a vector of the divisors named for the years. Here, I am using dplyr for n_distinct and the magrittr pipe to ease naming of the common case. Then, add/change the entry for the different year:

toDivide <-
  rep(12, n_distinct(myData$Date)) %>%
  setNames(sort(unique(myData$Date)))

toDivide["2016"] <- 9

Then, instead of 12 as above, you can use this vector. So, replace this:

Total = x$Total / 12

within the lapply loop, with this:

Total = x$Total / toDivide[as.character(x$Date)]

and it should divide by the entry for that year, instead of always dividing by the same thing.

Mark Peterson
  • 9,370
  • 2
  • 25
  • 48
  • Thank you very much Mark! I'll give this a try and let you know how it works out. – CGermain Oct 31 '16 at 18:46
  • Mark, this worked well. I added a group by clause as I had multiple rows all with totals that needed to be grouped by additional dimensions. I did have an additional questions. I have 2015 which I need to divide by 12 and then 2016 which I need to divide by 9 (through sept.) where would I put an "if" statement to say if Date = 2016 then divide by 9? – CGermain Oct 31 '16 at 19:27
  • Glad it worked for you. See edit for an example that should work for what you need. – Mark Peterson Nov 01 '16 at 11:39
  • Thanks @Mark Peterson. So I suppose what I'm trying to do is more complicated than I originally thought. I'm also trying to copy the data in the remaining columns of the dataset (there are many more rows and columns where each row has the total and year). Since the above function breaks out a single row to 12(2015)+9(2016) total rows, how would I preserve the other columns data? Also, i need to loop the formula for each time it reaches a non-distinct set of three columns. I'll add an edit above to show what I mean. – CGermain Nov 01 '16 at 16:49
  • I think you are approaching new question territory. You may have better success using this as a starting point for a new question that incorporates the added complexity. – Mark Peterson Nov 01 '16 at 16:52
  • If this addressed the question asked, you may want to mark the answer as accepted. For others, the follow up is [here](http://stackoverflow.com/questions/40365384/copy-column-data-when-function-unaggregates-a-single-row-into-multiple-in-r/) – Mark Peterson Nov 01 '16 at 18:25