Calculate lag string in group

Question

I have a toy electoral db and need to calculate incumbency but cannot using grouped values and dplyr::lag

race <- data.frame(city=rep(1,6),
                   date=c(3,3,2,2,1,1),
                   candidate=c("A","B","A","C","D","E"),
                   winner=rep(c(1,0),3))

I made a convoluted attempt that is not ideal (as I have to merge in non-winners:

race %>%
  group_by(city,date) %>% 
  mutate(win_candidate=candidate[winner==1]) %>% 
  filter(winner==1) %>% 
  ungroup() %>%
  group_by(city) %>% 
  mutate(incumbent=lead(win_candidate, n=1, default = NA_character_),
         incumbent=ifelse(candidate==incumbent,1,0)) %>%
  select(-win_candidate)

Can you explain the logic of to be used here? when should `incumbent` be 0 or 1 ? — Ronak Shah, Mar 18 '21 at 12:02
Please see the attempt solution. The logic is if a person (candidate) in the same location (city) wins the election in the last occasion (date), that person is the incumbent. — MCS, Mar 18 '21 at 12:12

score 1 · Accepted Answer · answered Mar 18 '21 at 12:09

How about this:

r <- race %>%
  group_by(city,date) %>% 
  summarise(win_candidate = candidate[which(winner== 1)]) %>% 
  ungroup %>% 
  group_by(city) %>% 
  arrange(date) %>% 
  mutate(prev_win_candidate = lag(win_candidate)) %>% 
  left_join(race, .) %>%
  mutate(incumbent = as.numeric(candidate == prev_win_candidate), 
         incumbent = case_when(
           is.na(incumbent) ~ 0, 
           TRUE ~ incumbent)) %>% 
  select(-c(win_candidate, prev_win_candidate))
  
#   city date candidate winner incumbent
# 1    1    3         A      1         1
# 2    1    3         B      0         0
# 3    1    2         A      1         0
# 4    1    2         C      0         0
# 5    1    1         D      1         0
# 6    1    1         E      0         0

Ronak Shah · Answer 2 · 2021-03-19T01:04:13.720

1

Arrange the data by date, for each candidate in a city assign 1 if the previous winner value is 1 and if the difference between consecutive dates is 1.

library(dplyr)

race %>%
  arrange(city, candidate, date) %>%
  group_by(city, candidate) %>%
  mutate(incumbent = +(lag(winner, default = 0) == 1 & date - lag(date) == 1))%>%
  ungroup

#   city  date candidate winner incumbent
#  <dbl> <dbl> <chr>      <dbl>     <int>
#1     1     2 A              1         0
#2     1     3 A              1         1
#3     1     3 B              0         0
#4     1     2 C              0         0
#5     1     1 D              1         0
#6     1     1 E              0         0

edited Mar 19 '21 at 01:04

answered Mar 18 '21 at 12:19

Ronak Shah

377,200
20
156
213

Despite being an elegant solution, the code fails by attributing the status of incumbent to a candidate who won more than one election ago, see example race <- data.frame(city=rep(1,6), date=c(3,3,2,2,1,1), candidate=c("A","B","C","D","A","C"), winner=rep(c(1,0),3)) – MCS Mar 18 '21 at 15:39
Then maybe the update is what you want @MCS – Ronak Shah Mar 19 '21 at 01:05

Calculate lag string in group

2 Answers2