I have a dataset with several binary variables (x1-x5; values: 1, 2, NA). I want to find out which pairs of variables have zero or very low cell counts in their cross-tabulation, ignoring missing values. To do that, I would like to compute the cross-tab for each pair of binary variables, extract the smallest cell count from each table, and collect those minima in a matrix. The result would look like a correlation matrix, except that each entry is the lowest cell count for that pair of variables rather than a correlation coefficient. Below is a toy dataset for anyone willing to help.
library(tidyverse)
set.seed(123)  # fix the seed so the toy data are reproducible

# Five binary variables, recoded from 0/1 to factors with levels "1"/"2"
df <- data.frame(x1 = rbinom(n = 1000, size = 1, prob = 0.5),
                 x2 = rbinom(n = 1000, size = 1, prob = 0.3),
                 x3 = rbinom(n = 1000, size = 1, prob = 0.4),
                 x4 = rbinom(n = 1000, size = 1, prob = 0.2),
                 x5 = rbinom(n = 1000, size = 1, prob = 0.05)) |>
  mutate(across(everything(), ~ as.factor(.x))) |>
  mutate(across(everything(), ~ recode(.x, "1" = "2", "0" = "1")))

# Set roughly 15% of the values in each column to NA
df1 <- as.data.frame(lapply(df, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]
}))
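
To make the target output concrete, here is a rough base R sketch of the kind of matrix I have in mind. `min_cell_matrix` is a hypothetical helper name of my own, not a function from any package, and the small `toy` data frame is made up purely for illustration. It assumes every variable takes only the values 1 and 2 (plus NA), and I am not sure it is the most efficient or idiomatic way to do this.

```r
# Hypothetical helper (not from any package): for every pair of columns,
# build the 2x2 cross-tab on pairwise-complete cases and keep its smallest cell.
min_cell_matrix <- function(data, levels = c("1", "2")) {
  vars <- names(data)
  out <- matrix(NA_integer_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      if (i == j) next  # leave the diagonal as NA
      # table() drops NA by default (useNA = "no"); fixing the factor levels
      # keeps empty cells in the table so they are counted as 0
      tab <- table(factor(data[[i]], levels = levels),
                   factor(data[[j]], levels = levels))
      out[i, j] <- min(tab)
    }
  }
  out
}

# Made-up illustration data (assumption, not the real dataset)
toy <- data.frame(
  a = factor(c(1, 1, 2, 2, NA, 1), levels = 1:2),
  b = factor(c(1, 2, 1, 2, 1, NA), levels = 1:2),
  c = factor(c(1, 1, 1, 1, 2, 2), levels = 1:2)
)

res <- min_cell_matrix(toy)
print(res)
```

On the toy data above, the same call would be `min_cell_matrix(df1)`. Fixing the factor levels explicitly matters here: without it, a level that never occurs in a column after dropping NAs would vanish from the table, and the empty cell would not show up as a 0.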