0

I have this data that is split by week and would like to change this into monthly and/or quarterly data. Is there a condition that I can use that will sum column data based on dates or weeks? I am unsure how to do it based on weeks given I have data from 1962-2016 and do not want to combine all weeks. Below is how my data frame is set up, any suggestions greatly appreciated.

enter image description here

Okay so this is filtered by week, and I have years with 52 or 53 weeks. The suggested solution does seem to fix these even though some dates are the ending dates and some fall in the previous year. If I can create a new column by Year and Month and sum column values in columns 7-13 (by column not row)

enter image description here

ct957
  • 37
  • 1
  • 2
  • 8
  • Could you show what your output would look like for let's say January and February 1962 for the variable pideaths for example? – Frostic Apr 03 '18 at 19:34
  • So lets say quarterly it would look something like 1962 1 1962-01-06 1 MA Boston (sum of pideaths week 1-13) etc. and then again for 1962 2 (for quarter) and so on – ct957 Apr 03 '18 at 19:36
  • I think I still need an example to see how you deal with overlapping weeks. – Frostic Apr 03 '18 at 19:39
  • Yea sorry it added comment before I was done. I can make a spreadsheet if needed to better show what I am trying to do – ct957 Apr 03 '18 at 19:40
  • if you provide output of `dput(data,10)` then we may be able to help you. And answer @MaxFt question. – M-- Apr 03 '18 at 19:48
  • 1
    Let's say you aggregate monthly. Consider the 5th line of your dataset : year 1962 week 5 end date 1962-02-03 (begin date 1962-01-28). Does this line counts for January or February? A proportion for both? – Frostic Apr 03 '18 at 20:11
  • The dates are the end dates so yes youre correct. – ct957 Apr 03 '18 at 20:20

2 Answers2

1
library(dplyr)
# First create new variable for just the month and year
df$Month_Yr <- format(as.Date(df$Date), "%Y-%m")
# Then group by (dplyr) the months and sum over intended variable
df %>% group_by(Month_Yr) %>% summarise(sum_pideaths = sum(pideaths, na.rm = TRUE))

Note: this will only sum the column pideaths. If you want to sum all of the columns, first select the numeric columns and then use summarise_all():

df %>% select(Month_Yr,**numeric_cols**) %>% group_by(Month_Yr) %>% summarise_all(funs(sum = sum(na.rm = TRUE)))
conv3d
  • 2,668
  • 6
  • 25
  • 45
  • I have created the new variable by month and year but the next line I am confused by since it does not sum. Does having characters and date in the other columns affect this function? – ct957 Apr 03 '18 at 19:48
  • 1
    This worked great thank you! I also used the code below to make the data quarterly and repeated the process to sum data. df$Yr_Qt <- as.yearqtr(df$Month_Yr, format = "%Y-%m") – ct957 Apr 03 '18 at 23:17
0

The as.POSIXlt function delivers a list that can be extracted to give numeric month values that are C-like in that they begin with 0 for January. That should not be a problem for splitting. See the Details section for options, but I can tell you there is not a quarterly option.

Here is the code for delivering weeks:

 as.POSIXlt( Sys.Date()+1:60 )$yday %/% 7
 [1] 13 13 13 13 13 14 14 14 14 14 14 14 15 15 15 15 15 15 15 16 16 16 16 16 16 16 17 17 17 17
[31] 17 17 17 18 18 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 21 21 21 21 21 21

But do notice that this will not necessarily be aligned with your choice of the weeks' beginning. You might need to subtract the numeric weekday of the first of the year to get it to align properly. (Sunday is the 0 weekday.)

as.POSIXlt( Sys.Date()+1:60 - as.POSIXlt( as.Date( paste0( format(Sys.Date(), "%Y"),"-01-01")))$wday )$yday %/% 7
 [1] 13 13 13 13 13 13 14 14 14 14 14 14 14 15 15 15 15 15 15 15 16 16 16 16 16 16 16 17 17 17
[31] 17 17 17 17 18 18 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 21 21 21 21 21

Another option that delivers a printable label, zoo::as.yearmon.

as.POSIXlt( Sys.Date()+1:60 )$mon
 [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[46] 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5


zoo::as.yearmon( Sys.Date()+1:60 )
 [1] "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018"
 [9] "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018"
[17] "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018" "Apr 2018"
[25] "Apr 2018" "Apr 2018" "Apr 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018"
[33] "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018"
[41] "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018"
[49] "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018" "May 2018"
[57] "May 2018" "May 2018" "Jun 2018" "Jun 2018"

The zoo package also has an as.yearqtr function:

 zoo::as.yearqtr( Sys.Date()+seq(0, 180, by=30) )
#[1] "2018 Q2" "2018 Q2" "2018 Q2" "2018 Q3" "2018 Q3" "2018 Q3" "2018 Q3"

"Under the hood" (or "bonnet" as the case may be) this function is actually delivering numeric values as well but with a class that has a special print method:

 unclass( zoo::as.yearqtr( Sys.Date()+seq(0, 180, by=30) ) )
[1] 2018.25 2018.25 2018.25 2018.50 2018.50 2018.50 2018.50
IRTFM
  • 258,963
  • 21
  • 364
  • 487