1

I would like to create a function that returns a vector - the vector needs to contain the sum of values in another vector (monthly_amount), but only sums the values in 'monthly_amount' for the corresponding 'year_month' (a third vector) to the index at which the function is returning its value.

I promise this will make (a little) more sense after you see the data...

Here is my data:

I have a dataframe that houses 89 nested dataframes (stored in the third column using tidyr) and it looks like this:

 head(df)

    # A tibble: 6 x 5
  industry location data                
     <dbl>    <dbl> <list>              
1       8.       9. <tibble [627 × 5]>  
2       1.       4. <tibble [5,879 × 5]>
3       9.       2. <tibble [1,271 × 5]>
4       1.       2. <tibble [8,939 × 5]>
5       1.       5. <tibble [6,093 × 5]>
6      10.       5. <tibble [315 × 5]> 

The contents of the third column ('data') are structured identically to one another, they look like this:

> df$data[[1]]

    # A tibble: 627 x 5
   date       monthly_amount year  month year_month
   <date>              <dbl> <chr> <chr> <chr>     
 1 2013-01-01        753851. 2013  01    2013_01   
 2 2013-01-01        154426. 2013  01    2013_01   
 3 2013-01-01        499604. 2013  01    2013_01   
 4 2013-01-01        379321. 2013  01    2013_01   
 5 2013-01-01        213702. 2013  01    2013_01   
 6 2013-01-01        274118. 2013  01    2013_01   
 7 2013-01-01        282391. 2013  01    2013_01   
 8 2013-01-01        236070. 2013  01    2013_01   
 9 2013-01-01        182512. 2013  01    2013_01   
10 2013-01-01        428778. 2013  01    2013_01   

Here is a dummy example of what i'm trying to achieve:

# A tibble: 627 x 5
   date       monthly_amount year  month year_month  desired_outcome
   <date>              <dbl> <chr> <chr> <chr>                <dbl> 
 1 2013-01-01        753851. 2013  01    2013_01 "sum of all monthly_amount     
 2 2013-01-01        154426. 2013  01    2013_01 where year_month==2013_01"
 3 2013-01-02        499604. 2013  02    2013_02 "sum of all monthly_amount  
 4 2013-01-02        379321. 2013  02    2013_02 where year_month==2013_02"  
 5 2013-01-02        213702. 2013  02    2013_02   
 6 2013-01-03        274118. 2013  03    2013_03   etc...
 7 2013-01-03        282391. 2013  03    2013_03   
 8 2013-01-04        236070. 2013  04    2013_04   
 9 2013-01-04        182512. 2013  04    2013_04   
10 2013-01-04        428778. 2013  04    2013_04 

Ideally I will map this function to all 89 dataframes so it needs to be quite general rather than a more precisely worded for-loop.

Any advice on how I might better phrase this question (or approach this problem!) is greatly appreciated.

Davide Lorino
  • 875
  • 1
  • 9
  • 27

1 Answers1

1

We can loop through the data with map, then grouped by 'year_month', create the 'desired_outcome' as the sum of 'monthly_amount'

df %>%
    mutate(data = map(data, ~ .x %>%
                               group_by(year_month) %>%
                               mutate(desired_outcome = sum( monthly_amount))))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This looks great! I'm having trouble with the closing parenthesis for the map function, clearly im no expert but im not sure where to close that parenthesis. – Davide Lorino May 18 '18 at 17:15
  • @DavideLorino |thanks for the comment. I forgot to close the parenthese. Try now\ – akrun May 18 '18 at 17:17
  • 1
    This is an amazingly elegant solution - if I had the requisite experience to be 'up-voting' your answer I would do it 100%! – Davide Lorino May 18 '18 at 17:22