
I am using the function "ps_venn" to compare the taxa present in different samples of a phyloseq object. It returns a named list with one element per sample (or combination of samples), each holding the taxa found there.

Because this is metabarcoding data, the taxa are identified by long hash strings rather than readable names. I have a data frame that maps each of these IDs to its taxonomic lineage.

I would like to rename the taxa in the list based on the name in the "class" column.

I have no experience with lists in R, so I'd appreciate any guidance. Thanks!

Edit: Here is the function I used on my phyloseq object:

cvenn.met <- ps_venn(combined_c, group = "method", weight = FALSE, plot = FALSE)

Here is the beginning of the output list:

list(OBB = c("329966c334544d14f9985b98b813f40f", 
"8e1c87829579917f8f77f7fe7a30156a"
), OES = "2f86f2cb2e0879ebd39e60982959c8bd", QBB = c("9be3f72560b678f6bbd584632672818a", 
"3a7f78f620f4733bf2344867beae26aa", "ca57149144a6a6dfdb6e14465d3e2123", 
"8612ebe6094b2f7fd25985e5c0c36226", "5a3ed6459016f5c9398eab8f051940a0", 
"c3f2f11de98c6f64740ea772e202bcbc"), QPP = "8d7b15445ca448bec893311b47510e00", 
    QPS = c("407465934116d64a8d61c12cee90b0b0", "768ed20a18290c921ed30f24e458e25a", 
    "b7a099fb2ea20e4a13fa7c52820eeb6c"), MIC__OBB__OES__OMT__QBB__QBT__QPP__QPS = c("74bda332d0a3174634f9b496b1da8d0c", 
    "cd9cac265a41b06843d41aaa1893efd5", "72e5af8afb3fdd3323fede4d49e97bda", 
    "d785682c0a83be9275e095d76cabbe36", "fd736d603728e963c8c47487a8e48755", 
    "875f4582d8e1dc661efbda0d8bb11c22", "b8615ae8b54a17ee118afe8718d7ee11"
    ))

Here is the beginning of my taxonomy dataframe:

structure(list(X = c("0021706b1ca315556a24b6d5df927e5b", "0038f2eedf8cc7893a7a9a4330aa477c", 
"003ba56d29607b45d8599085b8b69afa", "004610d70fb6092436394ca4b09bf6fb", 
"004af7f8f83f24fb7b51d8335583e14a", "0053fa60aebebf5f5e6008c70425230c"
), domain = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota", 
"Eukaryota", "Eukaryota"), supergroup = c("Haptista", "TSAR", 
"TSAR", "TSAR", "Obazoa", "Obazoa"), division = c("Haptophyta", 
"Alveolata", "Rhizaria", "Alveolata", "Opisthokonta", "Opisthokonta"
), subdivision = c("Haptophyta_X", NA, "Radiolaria", "Dinoflagellata", 
NA, "Metazoa"), class = c("Prymnesiophyceae", NA, "RAD-B", "Syndiniales", 
NA, "Arthropoda"), order = c("Prymnesiales", NA, "RAD-B_X", "Dino-Group-II", 
NA, "Crustacea"), family = c("Chrysochromulinaceae", NA, "RAD-B_X_Group-IVd", 
"Dino-Group-II-Clade-2", NA, "Maxillopoda"), genus = c("Chrysochromulina", 
NA, "RAD-B_X_Group-IVd_X", "Dino-Group-II-Clade-2_X", NA, NA), 
    species = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Consensus = c(0.625, 
    0.75, 0.714, 0.714, 1, 0.6)), row.names = c(NA, 6L), class = "data.frame")

I would like to rename the strings of text in my list with their corresponding taxonomy in the "class" column.

Ashley
  • Could you give a reproducible example of your data? – stefan_aus_hannover Jun 15 '23 at 16:06
  • Can you provide a minimal example so we can see the data structure? It's not clear from these images what you want to do and how your data is structured. Please make a minimal subset of your data frames and use `dput` to output them as text which you can paste into the question. It would also help if you could show an example of your desired output – divibisan Jun 15 '23 at 16:07
  • Thanks, I tried to add more info in the post. Hopefully this helps! – Ashley Jun 15 '23 at 16:26
  • @Ashley It's better to give your data frames names, otherwise each answer has to come up with its own, which makes them hard to reproduce consistently. Also, some `tax$X` entries have `NA` as their *class*; what should happen then? – Andre Wildberg Jun 15 '23 at 17:42
  • @AndreWildberg thanks, I'll do that in the future. I should have chosen a better subset of my taxonomy for this example. Phyloseq has a sort of under-the-hood way of filtering taxonomy so in this case, my actual venn diagram output doesn't include any taxa without class designations. Your answer below worked perfectly. Thank you! – Ashley Jun 15 '23 at 19:50

1 Answer


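Assuming your output list is called `out` and the taxonomy data frame `tax`, you can use `lapply` to look up each element's IDs in `tax$X` and return the matching `class` values, with `setdiff` keeping any IDs that have no match. The example below first modifies the data so that one ID actually matches. Note that the `\(x)` lambda shorthand needs R 4.1 or later.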

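# For demonstration only: swap one ID in QBB for one that exists in tax
out[[3]][2] <- "003ba56d29607b45d8599085b8b69afa"
# Matching IDs become their class; setdiff() keeps IDs that are not in tax
lapply(out, \(x) c(tax$class[tax$X %in% x], setdiff(x, tax$X)))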
$OBB
[1] "329966c334544d14f9985b98b813f40f" "8e1c87829579917f8f77f7fe7a30156a"

$OES
[1] "2f86f2cb2e0879ebd39e60982959c8bd"

$QBB
[1] "RAD-B"                            "9be3f72560b678f6bbd584632672818a"
[3] "ca57149144a6a6dfdb6e14465d3e2123" "8612ebe6094b2f7fd25985e5c0c36226"
[5] "5a3ed6459016f5c9398eab8f051940a0" "c3f2f11de98c6f64740ea772e202bcbc"

$QPP
[1] "8d7b15445ca448bec893311b47510e00"

$QPS
[1] "407465934116d64a8d61c12cee90b0b0" "768ed20a18290c921ed30f24e458e25a"
[3] "b7a099fb2ea20e4a13fa7c52820eeb6c"

$MIC__OBB__OES__OMT__QBB__QBT__QPP__QPS
[1] "74bda332d0a3174634f9b496b1da8d0c" "cd9cac265a41b06843d41aaa1893efd5"
[3] "72e5af8afb3fdd3323fede4d49e97bda" "d785682c0a83be9275e095d76cabbe36"
[5] "fd736d603728e963c8c47487a8e48755" "875f4582d8e1dc661efbda0d8bb11c22"
[7] "b8615ae8b54a17ee118afe8718d7ee11"
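
As a variation on the above, here is a minimal sketch that keeps each ID in its original position, assuming (as in the answer) that the list is `out` and the data frame is `tax`. It uses `match` to look up the class for every ID and falls back to the ID itself when there is no row in `tax` or the class is `NA`. The helper name `rename_by_class` and the small stand-in data are made up for illustration.

```r
# Small stand-ins for the objects in the question (contents are illustrative)
out <- list(
  OBB = c("329966c334544d14f9985b98b813f40f", "0021706b1ca315556a24b6d5df927e5b"),
  QBB = c("9be3f72560b678f6bbd584632672818a", "003ba56d29607b45d8599085b8b69afa",
          "0038f2eedf8cc7893a7a9a4330aa477c")
)
tax <- data.frame(
  X = c("0021706b1ca315556a24b6d5df927e5b", "0038f2eedf8cc7893a7a9a4330aa477c",
        "003ba56d29607b45d8599085b8b69afa"),
  class = c("Prymnesiophyceae", NA, "RAD-B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each ID with its class, keeping the ID itself
# when it has no row in `tax` or when its class is NA
rename_by_class <- function(ids, tax) {
  cls <- tax$class[match(ids, tax$X)]  # NA where the ID is not found in tax$X
  ifelse(is.na(cls), ids, cls)
}

# Apply the helper to every element of the list; element names are preserved
renamed <- lapply(out, rename_by_class, tax = tax)
renamed
```

Because `match` returns one position per input ID, the result has the same length and order as the original vector, which may be easier to cross-check against the Venn output than the matches-first ordering produced by `%in%` plus `setdiff`.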
Andre Wildberg