2

Here is a small example of the kind of data I have:

transactions <- tibble(id = seq(1:7),
                       day = paste(rep("day", each = 7), seq(1:7), sep = ""),
                       sent_to = c(NA, "Garden Cinema", "Pasta House", NA, "Blue Superstore", "Jane", "Joe"),
                       received_from = c("ATM", NA, NA, "Sarah", NA, NA, NA),
                       reference = c("add_cash", "cinema_tickets", "meal", "gift", "shopping", "reimbursed", "reimbursed"),
                       decrease = c(NA, 10.8, 12.5, NA, 15.25, NA, NA),
                       increase = c(50, NA, NA, 30, NA, 5.40, 7.25))

# # A tibble: 7 × 7
#      id   day   sent_to         received_from reference      decrease  increase
#    <int>  <chr> <chr>           <chr>         <chr>          <dbl>     <dbl>   
# 1     1   day1  NA              ATM           add_cash       NA        50      
# 2     2   day2  Garden Cinema   NA            cinema_tickets 10.8      NA      
# 3     3   day3  Pasta House     NA            meal           12.5      NA      
# 4     4   day4  NA              Sarah         gift           NA        30      
# 5     5   day5  Blue Superstore NA            shopping       15.2      NA      
# 6     6   day6  Jane            NA            reimbursed     NA        5.4     
# 7     7   day7  Joe             NA            reimbursed     NA        7.25    

I would like to add a "balance" column to this dataset where:

  • Row 1: starts with 50
  • Row 2: has previous balance amount + increase - decrease
  • Row 3, etc.: same as row 2 formula

I've been struggling to do this myself as I don't know if there are any existing functions which help with this types of data manipulation. The only function that comes to mind is the dplyr::lag() but I'm not sure how to use it.

Any help is appreciated :)

kiwi
  • 565
  • 3
  • 11

2 Answers2

3

You could first create a column of the change and second use purrr::accumulate to create your balance column:

library(dplyr, warn = FALSE)
library(purrr)

transactions |> 
  mutate(change = coalesce(increase, -decrease),
         balance = accumulate(change, ~ .x + .y))
#> # A tibble: 7 × 9
#>      id day   sent_to         received_…¹ refer…² decre…³ incre…⁴ change balance
#>   <int> <chr> <chr>           <chr>       <chr>     <dbl>   <dbl>  <dbl>   <dbl>
#> 1     1 day1  <NA>            ATM         add_ca…    NA     50     50       50  
#> 2     2 day2  Garden Cinema   <NA>        cinema…    10.8   NA    -10.8     39.2
#> 3     3 day3  Pasta House     <NA>        meal       12.5   NA    -12.5     26.7
#> 4     4 day4  <NA>            Sarah       gift       NA     30     30       56.7
#> 5     5 day5  Blue Superstore <NA>        shoppi…    15.2   NA    -15.2     41.4
#> 6     6 day6  Jane            <NA>        reimbu…    NA      5.4    5.4     46.8
#> 7     7 day7  Joe             <NA>        reimbu…    NA      7.25   7.25    54.1
#> # … with abbreviated variable names ¹​received_from, ²​reference, ³​decrease,
#> #   ⁴​increase
stefan
  • 90,330
  • 6
  • 25
  • 51
  • thanks, great answer :) Just wondering, is there a particular reason why you've used `|>` instead the magrittr pipe `%>%` ? – kiwi Nov 23 '22 at 01:16
  • for a similar question on the same dataset: https://stackoverflow.com/questions/74550067/how-to-conditionally-select-a-column-and-subtract-values-in-those-rows-from-row – kiwi Nov 23 '22 at 16:32
2

You could first make a column change that has negative values in case of a decrease and positive values in case of an increase. In turn, you could use the cumsum function to create a cumulative total for the balance column.

transactions <- transactions %>%
  mutate(
    change = case_when( 
      !is.na(decrease) ~ -1*decrease, #make  values negative if decrease 
      !is.na(increase) ~ increase),
    balance = cumsum(change))

Output:

> transactions
# A tibble: 7 × 9
     id day   sent_to         received_from reference      decrease increase change balance
  <int> <chr> <chr>           <chr>         <chr>             <dbl>    <dbl>  <dbl>   <dbl>
1     1 day1  NA              ATM           add_cash           NA      50     50       50  
2     2 day2  Garden Cinema   NA            cinema_tickets     10.8    NA    -10.8     39.2
3     3 day3  Pasta House     NA            meal               12.5    NA    -12.5     26.7
4     4 day4  NA              Sarah         gift               NA      30     30       56.7
5     5 day5  Blue Superstore NA            shopping           15.2    NA    -15.2     41.4
6     6 day6  Jane            NA            reimbursed         NA       5.4    5.4     46.8
7     7 day7  Joe             NA            reimbursed         NA       7.25   7.25    54.1
Flap
  • 106
  • 4