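I am trying to sum a variable in a data.frame up to every Friday, separately for each ID: each Friday should get the sum of all rows since the previous Friday, including the Friday itself.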

Random data frame

mydf = data.frame(
    "ID"   = c( rep( "A" , 6) , rep( "B" , 5 ) ),
    "Date" = c( "2017-09-08","2017-09-10","2017-09-13","2017-09-15","2017-09-20","2017-09-22",
                "2017-08-03","2017-08-04","2017-08-10","2017-08-11","2017-08-12" ),
    "Var"  = c( 1,2,3,4,5,6,7,8,NA,10,11)
    )

mydf$Date = as.Date( mydf$Date )

mydf = cbind( mydf , "WeekDay" = weekdays( mydf$Date ) )

What I want to get

df_ToGet = 
data.frame( 
    "ID"   = c( rep( "A" , 3) , rep( "B" , 2 ) ),
    "Date" = c( "2017-09-08","2017-09-15","2017-09-22","2017-08-04","2017-08-11"  ),
    "Var_Sum"  = c( 1 , 9 , 11 , 15, 10 )
    )

What I tried

I have considered using dplyr::summarize and aggregate but I do not know how to set the by condition properly.

mydf %>% group_by( ID ) %>% summarize( Var_Sum = aggregate( Var , sum ,  by=list ( (mydf$Weekday)=="Friday") )  )

I have seen a few similar questions being solved using the cut function but that seems to be setting the condition to a standard week? I'm not too familiar with it yet.

mathnoob

1 Answer

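We need to create a grouping variable that closes a group at each Friday. `cumsum(WeekDay == "Friday")` gives a running count of Fridays, and `lag()` shifts it down by one row so that each Friday is grouped with the rows leading up to it rather than with the rows that follow it.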
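To see why the shifted running count works, here is a small base-R sketch on a toy vector. The values mirror the six rows of ID A; the names `is_friday`, `running`, `grp` and `sums` are only illustrative, and `c(0, head(running, -1))` stands in for `lag(..., default = 0)`.

```r
var       <- c(1, 2, 3, 4, 5, 6)
is_friday <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Running count of Fridays seen so far (the Friday row itself is counted)
running <- cumsum(is_friday)          # 1 1 1 2 2 3

# Shift down by one row so each Friday closes its group instead of opening one
grp <- c(0, head(running, -1))        # 0 1 1 1 2 2

# Sum within each group
sums <- tapply(var, grp, sum)         # 1 9 11
```

The dplyr pipeline applies the same idea to the whole data frame.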

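library(dplyr)

mydf %>%
    # drop the rows after the last Friday in the data
    slice(seq_len(tail(which(WeekDay == "Friday"), 1))) %>%
    # rows share a group up to and including each Friday
    group_by(ID, grp = lag(cumsum(WeekDay == "Friday"), default = 0)) %>%
    summarise(Date = Date[WeekDay == "Friday"], Var = sum(Var, na.rm = TRUE)) %>%
    ungroup() %>%
    select(-grp)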
# A tibble: 5 x 3
#     ID       Date   Var
#   <fctr>     <date> <dbl>
#1      A 2017-09-08     1
#2      A 2017-09-15     9
#3      A 2017-09-22    11
#4      B 2017-08-04    15
#5      B 2017-08-11    10
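The pipeline above drops only the rows after the last Friday of the whole data frame, and `weekdays()` returns names in the current locale. As a possible variant (a sketch, not the answer's own method), the group counter can be computed within each ID and groups that never reach a Friday can be filtered out. It assumes dplyr 1.0 or later for `.groups`; `is_fri`, `grp` and `res` are illustrative names.

```r
library(dplyr)

mydf <- data.frame(
  ID   = c(rep("A", 6), rep("B", 5)),
  Date = as.Date(c("2017-09-08", "2017-09-10", "2017-09-13", "2017-09-15",
                   "2017-09-20", "2017-09-22", "2017-08-03", "2017-08-04",
                   "2017-08-10", "2017-08-11", "2017-08-12")),
  Var  = c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11)
)

res <- mydf %>%
  arrange(ID, Date) %>%
  # wday is 0 (Sunday) to 6 (Saturday), so 5 is Friday in any locale
  mutate(is_fri = as.POSIXlt(Date)$wday == 5) %>%
  group_by(ID) %>%
  # number of Fridays strictly before each row, counted within the ID
  mutate(grp = cumsum(is_fri) - is_fri) %>%
  group_by(ID, grp) %>%
  # drop trailing rows that are not followed by a Friday
  filter(any(is_fri)) %>%
  summarise(Date = Date[is_fri],
            Var_Sum = sum(Var, na.rm = TRUE),
            .groups = "drop") %>%
  select(-grp)
```

On the example data this should give the same five rows as the desired output, with the trailing 2017-08-12 row for ID B left out.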
akrun
  • Looks like I was correct to ask this instead of figuring it out myself. Thanks. Is it possible to improve this to allow for multiple variables though? For example having 100 variables to summarize. If this requires asking a new question, let me know and I'll do that instead. – mathnoob Oct 21 '17 at 05:25
  • To summarize more variables use `summarize_all` or `summarize_each` instead of `summarize`. – danh Oct 21 '17 at 05:38
  • Okay, got it. Cheers @danh – mathnoob Oct 21 '17 at 06:41
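For the multi-variable case raised in the comments, one option in current dplyr (1.0 or later) is `across()`, which supersedes `summarize_all` and `summarize_each`. The sketch below is an assumption-laden illustration: `Var2` is a hypothetical second column added only for the example, and `mydf2`, `is_fri`, `grp` and `res` are illustrative names.

```r
library(dplyr)

# Same layout as the question's data, plus a hypothetical second variable Var2
mydf2 <- data.frame(
  ID   = c(rep("A", 6), rep("B", 5)),
  Date = as.Date(c("2017-09-08", "2017-09-10", "2017-09-13", "2017-09-15",
                   "2017-09-20", "2017-09-22", "2017-08-03", "2017-08-04",
                   "2017-08-10", "2017-08-11", "2017-08-12")),
  Var  = c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11),
  Var2 = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110)
)

res <- mydf2 %>%
  mutate(is_fri = as.POSIXlt(Date)$wday == 5) %>%   # 5 is Friday
  group_by(ID) %>%
  mutate(grp = cumsum(is_fri) - is_fri) %>%         # Fridays before each row
  group_by(ID, grp) %>%
  filter(any(is_fri)) %>%                           # keep groups ending on a Friday
  summarise(Date = Date[is_fri],
            # sum every column whose name starts with "Var"
            across(starts_with("Var"), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  select(-grp)
```

With 100 variables, the selection inside `across()` is the only part that would need to change, for example to `where(is.numeric)` if all numeric columns should be summed.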