I am trying to run a combinations analysis and show the results in a plot. I have a data frame with 9 fabric columns; each column contains percentages, or NA if that fabric was not present in the sample.
The example code I have used for this can be found here: https://epirhandbook.com/en/combinations-analysis.html
The issue is that one line changes the 1s to 0s and vice versa. The line is:
data <- data %>%
  mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
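A minimal reproduction of the flip on a single vector, in base R. The vector `x` is made up for illustration; I am assuming it mirrors the state my columns are in just before this line (1 where a fabric is present, NA where it is absent):

```r
# Hypothetical vector: 1 where the fabric is present, NA where it is absent.
x <- c(1, NA, 1, NA)

# 1 is not in c("yes", NA), so it becomes 0; NA matches the NA, so it becomes 1.
flipped <- as.integer(x %in% c("yes", NA))
print(flipped)
```

If my columns really are in that state at this point, that would explain why the counts end up swapped.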
The full code that I have used is:
library(tidyverse)
library(UpSetR)
library(ggupset)
data <- META_new[c("lengthpergram", "countpergram", "acrylrel",
                   "cottonrel", "polyestrel", "polyamiderel",
                   "elastaanrel", "lyocellrel", "viscoserel",
                   "nylonrel", "wolrel")]
columns <- c("acrylrel", "cottonrel", "polyestrel", "polyamiderel",
             "elastaanrel", "lyocellrel", "viscoserel", "nylonrel", "wolrel")
# Mark fabrics that are present as "yes" and fabrics at 0 as NA
for (col in columns) {
  data[[col]][data[[col]] > 0] <- "yes"
  data[[col]][data[[col]] == 0] <- NA
}
data <- data %>%
  mutate(acrylrel = ifelse(acrylrel == "yes", 1, 0),
         cottonrel = ifelse(cottonrel == "yes", 1, 0),
         polyestrel = ifelse(polyestrel == "yes", 1, 0),
         polyamiderel = ifelse(polyamiderel == "yes", 1, 0),
         elastaanrel = ifelse(elastaanrel == "yes", 1, 0),
         lyocellrel = ifelse(lyocellrel == "yes", 1, 0),
         viscoserel = ifelse(viscoserel == "yes", 1, 0),
         nylonrel = ifelse(nylonrel == "yes", 1, 0),
         wolrel = ifelse(wolrel == "yes", 1, 0))
# The line in question
data <- data %>%
  mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
data %>%
  UpSetR::upset(
    sets = columns,
    order.by = "freq",
    sets.bar.color = c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "salmon"),
    empty.intersections = "on",
    number.angles = 0,
    point.size = 2,
    line.size = 1,
    mainbar.y.label = "Fabric combinations by frequency",
    sets.x.label = "Types of fabric present in samples")
The code produces a plot, but it assigns the wrong column names to the values. For example, polyestrel should appear as the most frequent, but lyocellrel is shown in its place, even though lyocellrel is the least frequent.
Unfortunately I cannot share the full data frame because it is too big, but I hope someone has suggestions on how to fix this (if this line is even the problem).
I changed part of the original code from the website. The original line was:
mutate(across(c(fever, chills, cough, aches, vomit), .fns = ~+(.x == "yes")))
I changed it because the original version gave me this error:
Error in start_col:end_col : argument of length 0
The first 5 rows of the data:
data <- data.frame(
  acrylrel = c(0.00000, 0.00000, 0.00000, 36.61972, 0.00000),
  cottonrel = c(9.089974, 65.000000, 0.000000, 19.014085, 8.500000),
  polyestrel = c(83.72237, 35.00000, 42.81081, 44.36620, 15.00000),
  polyamiderel = c(5.583548, 0.000000, 53.594595, 0.000000, 40.000000),
  elastaanrel = c(1.604113, 0.000000, 3.594595, 0.000000, 1.500000),
  lyocellrel = c(0, 0, 0, 0, 0),
  viscoserel = c(0, 0, 0, 0, 0),
  nylonrel = c(0, 0, 0, 0, 0),
  wolrel = c(0, 0, 0, 0, 0)
)
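For clarity, the 0/1 table I am trying to end up with from these rows can be sketched in base R as below. This is only a sketch under two assumptions of mine: a fabric counts as present when its percentage is greater than 0, and NA means absent. The names `sample_rows`, `fabric_cols` and `presence_counts` are made up for this example, and only three of the columns are included to keep it short.

```r
# Hypothetical three-column excerpt of the sample rows above.
sample_rows <- data.frame(
  acrylrel   = c(0, 0, 0, 36.61972, 0),
  polyestrel = c(83.72237, 35, 42.81081, 44.36620, 15),
  lyocellrel = c(0, 0, 0, 0, 0)
)
fabric_cols <- c("acrylrel", "polyestrel", "lyocellrel")

# Assumption: present means a percentage greater than 0; NA counts as absent.
sample_rows[fabric_cols] <- lapply(
  sample_rows[fabric_cols],
  function(x) as.integer(!is.na(x) & x > 0)
)

# Number of samples in which each fabric is present.
presence_counts <- colSums(sample_rows[fabric_cols])
print(presence_counts)
```

With a table like this, polyestrel would be the most frequent fabric and lyocellrel the least, which is what I expect the plot to show.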