I am trying to create a new column using existing column names within my dataset. Specifically, in this dataset, one row corresponds to one respiratory specimen and has ~10 virus test results, each with a corresponding column; a value of 0 indicates testing negative for this virus, while a value of 1 indicates testing positive for this virus. A simplified version of the dataset would look something like this:
Specimen ID | FLUA | FLUB | RSV | HMPV |
---|---|---|---|---|
1 | 1 | 0 | 1 | 0 |
2 | 0 | 1 | 1 | 0 |
3 | 0 | 1 | 1 | 1 |
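For reproducibility, the simplified table above could be built in R roughly as follows. This is only a sketch: the data frame name `df` and the column name `Specimen_ID` (with an underscore instead of the space) are assumptions made for illustration, not the names in the real dataset.

```r
# Toy data mirroring the simplified table above.
# `df` and `Specimen_ID` are illustrative names, not the real ones.
df <- data.frame(
  Specimen_ID = 1:3,
  FLUA = c(1, 0, 0),
  FLUB = c(0, 1, 1),
  RSV  = c(1, 1, 1),
  HMPV = c(0, 0, 1)
)
```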
I would like to create a new column called "virus_combinations" that lists the viruses for which each specimen tested positive. The viruses within each combination should be listed in alphabetical order so that R treats identical combinations as the same value (e.g., FLUA + RSV is the same combination as RSV + FLUA, so both should appear as "FLUA, RSV"). The desired output would look like this:
Specimen ID | FLUA | FLUB | RSV | HMPV | virus_combinations |
---|---|---|---|---|---|
1 | 1 | 0 | 1 | 0 | FLUA, RSV |
2 | 0 | 1 | 1 | 0 | FLUB, RSV |
3 | 0 | 1 | 1 | 1 | FLUB, HMPV, RSV |
As I mentioned, there are ~10 viruses I'm interested in, and the number of viruses for which a specimen tested positive ranges from 1 to 5. With that many possible combinations, it would be impractical to code each combination by hand from the individual test results.
I saw a related post on Stack Exchange, in which the solution involved the following code:

    df1 <- df %>%
      group_by(ID) %>%
      summarise(output = paste(name[value == 1], collapse = ','))

I have tried adapting this code by removing the group_by() and summarise() commands and instead using it with mutate(), but I'm not sure if that's something I can actually do. I also don't understand how to specify which columns I want the new 'virus_combinations' variable to pull from.

    df1 <- df %>%
      mutate(virus_combinations = paste(names[value == 1], collapse = ','))

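One direction that might work on wide data like this is to go row by row and paste together the names of the columns whose value is 1. The following is a minimal base-R sketch under stated assumptions, not a definitive solution: `df`, `Specimen_ID`, and the `virus_cols` vector are illustrative names, and it uses `apply()` rather than the `group_by()`/`summarise()` approach from the related post (which appears to expect long-format data with `name` and `value` columns).

```r
# Toy data mirroring the simplified table in the question
# (`df` and `Specimen_ID` are illustrative names).
df <- data.frame(
  Specimen_ID = 1:3,
  FLUA = c(1, 0, 0),
  FLUB = c(0, 1, 1),
  RSV  = c(1, 1, 1),
  HMPV = c(0, 0, 1)
)

# Columns to pull from, sorted once so every combination
# comes out in the same alphabetical order.
virus_cols <- sort(c("FLUA", "FLUB", "RSV", "HMPV"))

# For each row, keep the names of the columns equal to 1 and
# collapse them into one string. which() drops NA test results.
df$virus_combinations <- apply(
  df[virus_cols], 1,
  function(row) paste(virus_cols[which(row == 1)], collapse = ", ")
)

print(df)
```

With the real dataset, `virus_cols` would be replaced by the names of the ~10 virus columns. A specimen with no positive results would get an empty string under this sketch.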
Any relevant commands or packages that you can think of would be much appreciated. Thank you in advance!