1

I am performing simple column-wise math operations on data frame rows that also involve accessing adjacent, previous data frame rows. Although the below code works, it's cumbersome (at least with respect to my liberal use of cbind() and subset() functions) and I wonder if there's a clean way to get the same results using an apply() or other super duper R function. In base R if possible.

I'm adding and subtracting column values in each data frame row (referring to the below columns, "plus1" + "plus 2" - "minus" = "total"), and if the id number is the same as you move down from one row to the next, adding in the plus1 from the prior row. See below illustration:

  id   plus1 plus2 minus total [total explained]
1  1     3     5    10    -2
2  2     4     5     9     0
3  3     8     5     8     5   [8 + 5 - 8 = 5, ignoring "plus1" in row 2 since "id" changed between rows 2 and 3]
4  3     1     4     7     6   [1 + 4 - 7, + 8 from "plus1" col in row 3 since "id" is same in rows 3 and 4, = 6]
5  3     2     5     6     2   [2 + 5 - 6, + 1 from "plus1" col in row 4 since "id" is same in rows 4 and 5, = 2]
6  5     3     6     5     4   [3 + 6 - 5 = 4, ignoring "plus1" in row 5 since "id" changed between rows 5 and 6]

Here is the code I used to generate the above:

data <- data.frame(id=c(1,2,3,3,3,5), 
                   plus1=c(3,4,8,1,2,3), 
                   plus2=c(5,5,5,4,5,6),
                   minus = c(10,9,8,7,6,5))

data <- cbind(data,
              tmp1=(data[ ,"plus1"] +
                    data[ ,"plus2"] -
                    data[ ,"minus"]
                   )
              )

grp <- with(rle(data$id), rep(seq_along(values), lengths))
data$tmp2 <- with(data,ave(plus1, grp, FUN = function(x) c(0, x[-length(x)])))

data <- cbind(data, total = round((data[ ,"tmp1"] + data[ ,"tmp2"]),2))
data <- subset(data, select = -c(tmp1,tmp2) )
data

I'm pursuing simplicity in the world of apply() because I'll be using many derivations of this sort thing in my current project. Looks like I'm mimicking Excel in R, which I am.

user438383
  • 5,716
  • 8
  • 28
  • 43

2 Answers2

1

I think a simple way to do this is using the lag function from dplyr package. I used case_when to check if the id changed. If it didn't change, you add the extra term, otherwise you add 0.

library(dplyr)

data2<-data %>%
  mutate(extra=case_when(
    id==lag(id) ~ lag(plus1), 
    TRUE ~ 0
  )) %>%
  mutate(computed_total=plus1+plus2-minus+extra)
Joe Erinjeri
  • 1,200
  • 1
  • 7
  • 15
  • Yes, I'm beginning to see the light with dplyr. The code sure looks intuitive, for both dplyr answers from Peter and Joe. Though the base R example posted by Peter looks fairly straightforward too. I've been wary of relying on too many packages but it seems dplyr is used by everyone experienced with R. – Curious Jorge - user9788072 Dec 11 '21 at 17:47
  • In the larger program this will be used in, there are many more variables to manipulate than in the OP. This solution is very clean when dealing with many variables. – Curious Jorge - user9788072 Dec 12 '21 at 08:34
1

Here's a base R solution, but not with apply, so may not be acceptable, although it seems less complex than the OP's code.

Not sure that the apply family (excuse the pun) applies in this case as my understanding is that typically functions are applied to all the columns, or rows, or list elements, whereas in this case a new variable is being created. However, I am not that familiar with using apply and friends, so this may not be correct.

By way of comparison have added a dplyr solution although I am aware OP asked specifically for base R.


data$lag_id <- c(0, data$id[-nrow(data)])
data$lag_plus1 <- c(NA, data$plus1[-nrow(data)])

data$total <- with(data, ifelse(id == lag_id, plus1 + plus2 - minus + lag_plus1, plus1 + plus2 - minus))

data[ , -c(5:6)]
#>   id plus1 plus2 minus total
#> 1  1     3     5    10    -2
#> 2  2     4     5     9     0
#> 3  3     8     5     8     5
#> 4  3     1     4     7     6
#> 5  3     2     5     6     2
#> 6  5     3     6     5     4



library(dplyr)

data %>% 
  mutate(total = case_when(id == lag(id) ~ plus1 + plus2 - minus + lag(plus1),
                          TRUE ~ plus1 + plus2 - minus))
#>   id plus1 plus2 minus total
#> 1  1     3     5    10    -2
#> 2  2     4     5     9     0
#> 3  3     8     5     8     5
#> 4  3     1     4     7     6
#> 5  3     2     5     6     2
#> 6  5     3     6     5     4

Created on 2021-12-11 by the reprex package (v2.0.1)

Peter
  • 11,500
  • 5
  • 21
  • 31