How to get anti_join to work properly in data frame

Question

I have data that looks like this

conflict_ID  country_code  SideA
1            1             1
1            2             1
1            3             0
2            4             1
2            5             0

I used the following code, which I got with help from this forum:

library(dplyr)
library(tidyr)

mydf %>%
  group_by(conflict_ID) %>%
  summarise(country_code = combn(country_code, 2, sort, simplify = FALSE),
            .groups = 'drop') %>%
  unnest_wider(country_code, names_sep = '_') %>%
  anti_join(filter(mydf, SideA == 1),
            by = c("conflict_ID", "country_code_2" = "country_code"))

# # A tibble: 3 × 3
#   conflict_ID country_code_1 country_code_2
#         <int>          <int>          <int>
# 1           1              1              3
# 2           1              2              3
# 3           2              4              5

This produces the result shown above. However, with the actual data, not every conflict ends up in the resulting data frame. A dyad only appears if, in the original table, the SideA country (value 1) is listed first and the other party in the next row has a 0 (meaning it is not on SideA). If it is the other way around, the dyad simply doesn't show up in the result. Any ideas why that might be? I know the problem is in the anti_join step, but I don't know what the problem actually is.

EDIT: To be more precise, here is an example that works (the countries end up in the resulting table) and one where it doesn't:


# with this input, it works
dispnum  ISO3  sidea
4414     AZE   0
4414     ARM   1

# with this input, it does not work
dispnum  ISO3  sidea
4613     ARG   0
4613     GHA   1

I think the first part of the code does something to the data that anti_join then picks up in an unexpected way. Maybe it is because the code goes through the data alphabetically: the first case works because ARM, which has sidea = 1, comes before AZE, and the second case fails because GHA (which has sidea = 1) comes after ARG?
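
A minimal sketch to check this guess. It assumes the two example disputes are combined into one data frame called mydf with the columns dispnum, ISO3 and sidea (the real data may be shaped differently). Because combn() is called with sort, each pair is stored alphabetically, and the anti_join only compares the second country of each pair against the sidea == 1 rows:

```r
library(dplyr)
library(tidyr)

# Assumed input: the two example disputes from the edit, in one data frame
mydf <- data.frame(
  dispnum = c(4414, 4414, 4613, 4613),
  ISO3    = c("AZE", "ARM", "ARG", "GHA"),
  sidea   = c(0, 1, 0, 1)
)

# Build the pairs; `sort` puts each pair in alphabetical order,
# regardless of which country is on side A
pairs <- mydf %>%
  group_by(dispnum) %>%
  summarise(ISO3 = combn(ISO3, 2, sort, simplify = FALSE),
            .groups = "drop") %>%
  unnest_wider(ISO3, names_sep = "_")

# Drop every pair whose SECOND country is a side-A country
result <- pairs %>%
  anti_join(filter(mydf, sidea == 1),
            by = c("dispnum", "ISO3_2" = "ISO3"))

# Only dispute 4414 (ARM, AZE) remains; 4613 (ARG, GHA) is removed
# because GHA sorts into the second position and has sidea == 1
print(result)
```

With this input the pair for 4613 disappears, which would be consistent with the alphabetical-order explanation.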

Rather than just showing code, could you explain the goal? You start with a data frame of conflicts, the countries involved, and what side they were on. Your goal is to have a data frame of ... what? Is it the same goal as the previous question? (If so, please state it so this question is self-contained.) Or is it a slightly different goal? — Gregor Thomas, Oct 17 '22 at 15:37
The goal is the same; however, this question focuses on how the anti_join call can be modified to make it work. I still want to end up with state pairs that engage in conflict with each other, but the anti_join call described above only works in some cases and I don't understand why. — craszer, Oct 17 '22 at 15:42
The `combn`/`anti_join` approach seems overcomplicated. I've added a simpler answer at your original question. — Gregor Thomas, Oct 17 '22 at 16:18
If you want more help here, I'd strongly suggest modifying the sample data to include a case where it doesn't work. — Gregor Thomas, Oct 17 '22 at 16:19
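
For illustration only, here is one way to build the dyads that does not depend on row or alphabetical order. It is a sketch under assumptions: that the goal is every pairing of a SideA country with a non-SideA country in the same conflict, and that the data looks like the small sample at the top of the question. The names side_a, side_b, country_a, country_b and dyads are made up for this example, and this is not necessarily the simpler answer mentioned in the comments.

```r
library(dplyr)

# Sample data from the top of the question
mydf <- data.frame(
  conflict_ID  = c(1, 1, 1, 2, 2),
  country_code = c(1, 2, 3, 4, 5),
  SideA        = c(1, 1, 0, 1, 0)
)

# Countries on side A, one row per conflict/country
side_a <- mydf %>%
  filter(SideA == 1) %>%
  select(conflict_ID, country_a = country_code)

# Countries on the other side
side_b <- mydf %>%
  filter(SideA == 0) %>%
  select(conflict_ID, country_b = country_code)

# Pair every side-A country with every opposing country in the same conflict
dyads <- inner_join(side_a, side_b, by = "conflict_ID")

print(dyads)
```

On the sample data this gives the same three pairs as the output shown in the question (1 and 3, 2 and 3, 4 and 5). In conflicts with several countries on both sides, newer dplyr versions may warn about a many-to-many join, which is expected for this kind of pairing.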
