Sum and subtract values in a dataframe

Question

I have a dataframe of inputs and another with a set of combinations that I'd like to aggregate those inputs using.

It's relatively straightforward to use the function in the example below if all the combinations (in "teams") were addition. However, if I want to subtract a result (ie. team1 is the sum of Mary and Frank minus John) this becomes seemingly much more problematic.

I can't think of a straightforward way to accomplish this. Does anyone have a suggestion on what I might try?

## Inputs ##

input = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Frank", "Mary", "John"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(55, 76, 14)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))

## Teams ##

teams = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Mary + Frank - John","Mary + John - Frank", "John + Frank - Mary")), class = "data.frame", row.names = c(NA, -3L))

## Group the inputs into one ##

all_objects = ls()
input_objects = grep("^input", all_objects, value = T)
input_test = as.data.frame(input_objects)

## Function to add (NB: this doesn't not give the desired output)##

result = function(input, teams) {
  data = teams %>%
    separate_rows(V2) %>%
    left_join(input, by = c("V2" = "V2")) %>%
    replace_na(list(V4 = 0)) %>%
    group_by(V1.x) %>% fill(V1.y, V3) %>%
    summarize(V1.y = first(V1.y),
              V2 = paste(V2, collapse = " + "),
              V3 = first(V3),
              V4 = sum(V4))
  return(data)
}
## Output ##

desired_output = structure(list(V1.x = c("team1", "team2", "team3"), V1.y = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Mary + Frank - John", "Mary + John - Frank", "John + Frank - Mary"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(117, 35, -7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))

score 2 · Accepted Answer · answered Jan 25 '23 at 05:00

I propose a different approach. Consider the variables in teams as formulas to evaluate. Then create an empty environment and evaluate them. Inspired by this.

# Inputs ##
input <- structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), 
                       V2 = c("Frank", "Mary", "John"), 
                       V3 = c("Sydney", "Sydney", "Sydney"), 
                       V4 = c(55, 76, 14)),
                  row.names = c(NA, -3L), class = c("data.table", "data.frame"))

## Teams ##
teams <- structure(list(
  V1 = c("team1", "team2", "team3"), 
  V2 = c("Mary + Frank - John","Mary + John - Frank", "John + Frank - Mary")), 
  class = "data.frame", row.names = c(NA, -3L))

# Evaluation environment with only + and -
eval_env <- new.env(parent = emptyenv())
eval_env$`+` = `+`
eval_env$`-` = `-`

# assign names as variables in the environment
for(i in 1:nrow(input)) assign(input$V2[i], input$V4[i], envir = eval_env)

#Evaluate formulas
V4 <- sapply(teams$V2, \(x) eval(str2lang(x), eval_env))

V4
#Mary + Frank - John Mary + John - Frank John + Frank - Mary 
#                117                  35                  -7

Problem: only work if names are valid variable names oherwise you must quote them with `` — Ric, Jan 25 '23 at 05:03

score 0 · Answer 2 · answered Jan 25 '23 at 04:23

You can accomplish this by first creating a new column in the input dataframe that indicates whether each row should be added or subtracted from the final result. You can do this by splitting the teams$V2 column into separate rows, then using the separate function to split the V2 column into separate columns for the variable names and the addition/subtraction operators. Then you can use ifelse() function to check the operator, if it's "+" then return the value of V4 else return the negative of V4.

Then you can join the input dataframe and the modified teams dataframe on the variable names, and use the group_by() and summarize() functions to aggregate the values based on the team name.

Check the following sample code

# Create new column in input indicating whether to add or subtract each row
input$operator = ifelse(input$V2 %in% teams$V2 & grepl("+", teams$V2), "+", "-")

# Split teams$V2 into separate rows
teams = teams %>% separate_rows(V2)

# Split teams$V2 into separate columns for variable names and operators
teams = teams %>% separate(V2, c("V2", "operator"), "+-")

# Join input and teams on variable names and operator
data = teams %>% left_join(input, by = c("V2" = "V2", "operator" = "operator"))

# Replace NA values with 0
data = data %>% 
    replace_na(list(V4 = 0))

# Group by team name and summarize values
result = data %>% 
       group_by(V1.x) %>% 
          summarize(V1.y = first(V1.y),
          V2 = paste(V2, collapse = " + "),
          V3 = first(V3),
          V4 = sum(V4))

I don't think this works, when I run # Split teams$V2 into separate columns... I get "Error in gregexpr(pattern, x, perl = TRUE) : invalid regular expression '+-' In addition: Warning message: In gregexpr(pattern, x, perl = TRUE) : PCRE pattern compilation error 'quantifier does not follow a repeatable item' at '+-'" — nicshah, Jan 25 '23 at 04:31

Sum and subtract values in a dataframe

2 Answers2