Avoid repetition in summarise() using across() or tidyverse

Question

I'm trying to use across() or another tidyverse function to simplify a set of lines that all follow the same pattern: output.partyA = sum(x.partyA) - sum(y.partyA).

For example, I have already collapsed a similar set of lines into a single call: across(matches("^Seats_P_...$"), ~ .x / Dist_Mag, .names = "{str_replace({.col}, '_P_', '_P_Pct_')}")

Here's what I am trying to simplify:

df %>%
  group_by(MMD) %>%
  summarise(
    N = n(),
    # taken out for clarity
    DFP_RCV_DEM = sum(Seats_RCV_Pct_DEM) - sum(SVP_DEM),
    DFP_RCV_REP = sum(Seats_RCV_Pct_REP) - sum(SVP_REP),
    DFP_RCV_SCT = sum(Seats_RCV_Pct_SCT) - sum(SVP_SCT),
    DFP_RCV_WRT = sum(Seats_RCV_Pct_WRT) - sum(SVP_WRT),
    DFP_RCV_LBT = sum(Seats_RCV_Pct_LBT) - sum(SVP_LBT),
    DFP_RCV_UND = sum(Seats_RCV_Pct_UND) - sum(SVP_UND)
  ) %>%
  ungroup()

As @jdobres writes, we really need to see your data. One thing that is suggested by your variable names, however, is that your data frame is not [tidy](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html). I say this because your column names appear to contain information - the party name - that is relevant to your summary. By its very nature, the tidyverse is designed to work with tidy data. Give it untidy data and your life is, not unreasonably, more complicated than it need be. — Limey, Jan 30 '22 at 18:05
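To illustrate the tidy-data point in the comment above, here is a minimal sketch of reshaping to one row per party before summarising. The data frame toy is made up for illustration, since the question does not show its data; only the column naming pattern (Seats_RCV_Pct_<party>, SVP_<party>) is taken from the question. It assumes tidyr is installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two districts (MMD), two parties (DEM, REP).
toy <- data.frame(
  MMD               = c("A", "A", "B"),
  Seats_RCV_Pct_DEM = c(0.50, 0.25, 0.60),
  Seats_RCV_Pct_REP = c(0.50, 0.75, 0.40),
  SVP_DEM           = c(0.40, 0.30, 0.55),
  SVP_REP           = c(0.60, 0.70, 0.45)
)

tidy_result <- toy %>%
  # Move the party suffix out of the column names into a "party" column;
  # ".value" keeps Seats_RCV_Pct and SVP as two separate value columns.
  pivot_longer(
    cols          = matches("^(Seats_RCV_Pct|SVP)_"),
    names_to      = c(".value", "party"),
    names_pattern = "^(Seats_RCV_Pct|SVP)_(.+)$"
  ) %>%
  group_by(MMD, party) %>%
  # One expression now covers every party.
  summarise(
    DFP_RCV = sum(Seats_RCV_Pct) - sum(SVP),
    .groups = "drop"
  )

tidy_result
```

In this long layout the party is data rather than part of a column name, so adding a party needs no change to the summary code. If the wide layout is needed afterwards, pivot_wider() can restore it.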

score 2 · Answer 1 · answered Jan 30 '22 at 18:05

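We can loop across() over the 'Seats' columns. For each one, take its sum, build the name of the matching 'SVP' column by replacing the prefix in the current column name (cur_column()), fetch that column with get(), take its sum, and return the difference. The .names argument then builds the output column names.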

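library(dplyr)
library(stringr)

df %>%
  group_by(MMD) %>%
  summarise(
    N = n(),
    across(
      # only the RCV seat-share columns; a bare starts_with("Seats_") would also match e.g. Seats_P_*
      starts_with("Seats_RCV_Pct_"),
      # look up the matching SVP_* column by name, then take the difference of the sums
      ~ sum(.x) - sum(get(str_replace(cur_column(), "Seats_RCV_Pct_", "SVP_"))),
      # Seats_RCV_Pct_DEM -> DFP_RCV_DEM
      .names = "DFP_{str_remove_all(.col, 'Seats_|Pct_')}"
    )
  )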
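
Since the question does not include data, here is a small self-contained check of the approach above. The data frame toy is hypothetical and only mimics the column naming pattern from the question; the sketch compares the across() version against the hand-written lines for two parties.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: two districts (MMD), two parties (DEM, REP).
toy <- data.frame(
  MMD               = c("A", "A", "B"),
  Seats_RCV_Pct_DEM = c(0.50, 0.25, 0.60),
  Seats_RCV_Pct_REP = c(0.50, 0.75, 0.40),
  SVP_DEM           = c(0.40, 0.30, 0.55),
  SVP_REP           = c(0.60, 0.70, 0.45)
)

# across() version: one expression for all parties.
via_across <- toy %>%
  group_by(MMD) %>%
  summarise(
    N = n(),
    across(
      starts_with("Seats_RCV_Pct_"),
      ~ sum(.x) - sum(get(str_replace(cur_column(), "Seats_RCV_Pct_", "SVP_"))),
      .names = "DFP_{str_remove_all(.col, 'Seats_|Pct_')}"
    )
  )

# Hand-written version, one line per party, as in the question.
by_hand <- toy %>%
  group_by(MMD) %>%
  summarise(
    N = n(),
    DFP_RCV_DEM = sum(Seats_RCV_Pct_DEM) - sum(SVP_DEM),
    DFP_RCV_REP = sum(Seats_RCV_Pct_REP) - sum(SVP_REP)
  )

# The two results should agree in names and values.
all.equal(as.data.frame(via_across), as.data.frame(by_hand))
```

The lookup relies on every Seats_RCV_Pct_* column having an SVP_* partner with the same party suffix; if one is missing, get() will fail with an "object not found" error rather than silently skipping the party.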
