0

I would like to compute a Cumulative Sum according to Center Disease Control definition. I have weekly data from 25 District since 2008 to now.

          Site1   Site2    Site3  Site4  Site5   Site6   Site7   Site8   Site9
2011-45 1   1   1   1   2   2   1   1   1
2011-46 1   1   1   1   1   2   1   2   3
2011-47 2   1   1   1   1   3   1   2   1
2011-48 1   1   1   1   2   2   3   2   1
2011-49 1   1   1   1   1   2   1   3   2
2011-50 1   1   1   1   1   4   1   1   1
2011-51 1   1   1   1   1   1   1   1   1
2011-52 1   1   1   1   1   2   2   2   2
2012-01 1   1   1   3   2   4   1   1   1
2012-02 1   1   2   1   1   2   1   3   3
2012-03 1   2   1   1   1   2   1   1   2
2012-04 3   1   2   1   1   3   1   2   1
2012-05 1   1   2   1   1   2   1   2   2
2012-06 3   1   2   1   1   5   1   2   2
2012-07 1   1   1   1   1   2   NA  2   1
2012-08 1   2   6   1   1   4   1   4   2
2012-09 3   1   2   1   1   1   1   4   2
…   …   …   …   …   …   …   …   …   …
…   …   …   …   …   …   …   …   …   …
…   …   …   …   …   …   …   …   …   …
2012-45 7   2   1   NA  1   22  NA  17  1
2012-46 4   1   NA  1   1   13  NA  7   1
2012-47 2   1   1   1   1   39  1   12  1
2012-48 4   1   2   NA  1   24  1   8   1
2012-49 1   1   1   1   1   19  1   5   1
2012-50 2   1   2   1   2   17  2   1   2
2012-51 7   1   1   1   1   21  1   3   1
2012-52 1   1   2   1   1   5   1   11  1
2012-53 NA  NA  NA  NA  NA  NA  1   NA  NA
2013-00 6   1   1   1   1   11  1   6   1
2013-01 7   2   3   1   1   58  1   18  2
2013-02 3   2   2   1   3   53  1   27  2
2013-03 1   2   2   1   1   21  1   9   1
2013-04 2   1   1   1   1   14  1   6   3
2013-05 3   1   1   1   1   10  1   3   1
2013-06 NA  2   3   1   1   16  1   8   1
2013-07 16  1   4   1   1   22  1   8   3
2013-08 10  2   6   1   1   19  1   11  1
2013-09 13  1   3   1   2   31  1   7   1
…   …   …   …   …   …   …   …   …   …
…   …   …   …   …   …   …   …   …   …
…   …   …   …   …   …   …   …   …   …
2013-45 2   1   1   1   1   15  NA  6   1
2013-46 2   2   1   1   1   5   1   2   1
2013-47 3   1   1   1   2   12  1   2   1
2013-48 3   1   1   1   1   2   2   1   1
2013-49 1   2   1   1   1   7   1   NA  2
2013-50 2   2   1   1   1   1   1   1   1
2013-51 NA  1   1   1   1   8   1   2   1
2013-52 NA  2   1   1   1   4   1   2   1
2014-00 NA  1   1   1   1   4   1   1   1
2014-01 NA  2   3   1   1   3   1   1   1
2014-02 NA  3   1   1   2   6   1   1   2
2014-03 NA  1   1   1   1   3   1   1   2
2014-04 NA  1   1   1   1   3   1   1   2
2014-05 1   2   1   1   1   1   1   1   1
2014-06 1   2   1   1   1   1   1   1   1
2014-07 1   2   1   1   1   1   1   1   1
2014-08 1   2   1   1   1   1   1   1   1
2014-09 1   2   1   1   1   1   1   1   1

To compute a Cum-sum; I would like to sum :

[(2014-07 + 2014-08 + 2014-09)+(2013-07 + 2013-08 + 2013-09)+...for X year] /weeks number                                 
                                        +

[(2014-08 + 2014-09 + 2014-10)+(2013-08 + 2013-09 + 2013-10)+...for X year] /week number                                  
                                        +

[(2014-09 + 2014-10 + 2014-11)+(2013-09 + 2013-10 + 2013-11)+...for X year] /weeks number 

                                        +

[(2014-10 + 2014-11 + 2014-12)+(2013-10 + 2013-11 + 2013-12)+...for X year] /weeks number

Thanks!

smci
  • 32,567
  • 20
  • 113
  • 146
user3355655
  • 463
  • 3
  • 15
  • Use `dput` to dump us a reproducible example. Truncate to a sane length if necessary. It all looks ok to me. From your dput we will be able to see how you're representing date (as string? factor? POSIXct?POSIXlt? something else?) Since you know you need to perform <, > comparisons on dates, pick a suitable type. – smci Apr 29 '14 at 05:35
  • ^ Since we pretty much know you're going to need a data.frame by the end, please convert to data.frame and dump that, not multiple separate xts series. – smci Apr 29 '14 at 05:45

1 Answers1

0
  • You want to perform a cumsum for selected months across multiple years. We don't need to know the gory details of where the data was culled from. Just give us a dataframe.

  • First you need to separate out Year, Month into new separate columns (keep Date too, if you like). You can either use substr() since you have a fixed date-string format, or else use a function to compute month from your date-type (just search SO). That's up to you.

  • The strsplit approach is:

    df$Year <- df$Month <- NA // for pessimism, in case we had some missing months in string df$Year <- substr(df$Date,1,4) df$Week <- substr(df$Date,6,7) // You can leave Year, Week as integer, or convert to factor, whatever makes your subsequent work tabulating and plotting easier. If you want Month represented as '00','01', either use factor with string levels, or string. Or use format/printf whenever you print it.

  • (PS your data has weeknumbers 00..53, seems you have partial weeks at begin/end of year. Double-check this entry, maybe you want to drop segment 2012-53:

    2012-53 NA NA NA NA NA NA 1 NA NA 2013-00 6 1 1 1 1 11 1 6 1

  • Then use logical indexing, subset(), dplyr::filter (or plyr::ddply) the months-of-interest. Your logical expression will be Month %in% c(7,8,9) or else Month>=7 & Month <=9

  • When you say 'for X year' did you mean 'over all years' or 'for some parameterizable subset of years'? Please edit to clarify.

  • An expression like Month>=7 & Month <=9 will require you to use a date type which supports numerical comparison, not just dumb '=='. (I recommend that anyway since plotting behaves much nicer when it understands the datetime type customarily on the x-axis)

  • Then do columnwise cumsum. Either dplyr::summarize, or lapply, or colwise(cumsum)(col-of-interest)

  • I presume you want all the cumsums in a new row 'Cumsum' at the bottom of this dataframe? Or if it isn't a dataframe, make your life easy and make it a dataframe so you can add a bottom row with a rowname 'Cumsum'.
  • Done. I recommend dplyr (the most active-development and most performant library right now, your code will be beautifully extensible should you progress to harder stuff), but it might be overkill for this fairly simple task, it's up to you.
smci
  • 32,567
  • 20
  • 113
  • 146