I need to group this data by column ESVId, keeping each unique value in column match and every value in column Form that was associated with each value in column match (there may be duplicates!).

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000005", "ESV_000005", "ESV_000005", "ESV_000005", 
"ESV_000005", "ESV_000005", "ESV_000005", "ESV_000006", "ESV_000006", 
"ESV_000006", "ESV_000007"), MT_species = c(1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus", "Achillea millefolium", "Artemisia absinthium", 
"Artemisia ludoviciana", "Artemisia frigida", "Artemisia campestris", 
"Artemisia tridentata", "Artemisia tilesii", "Rubus arcticus", 
"Fragaria vesca", "Rosa acicularis", "Fragaria virginiana", "Rosa woodsii", 
"Rosa arkansana", "Rubus ursinus", "Poa pratensis", "Vahlodea atropurpurea", 
"Alopecurus magellanicus", "Prunus virginiana"), Form = c("Conifer", 
NA, "Forb", NA, "Forb", "Sub-Shrub", "Forb", "Shrub", NA, NA, 
"Forb", "Shrub", "Forb", "Shrub", NA, NA, "Graminoid", NA, NA, 
"Shrub")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

When I tried

MTTaxa_funct <- funct_esvs %>%
  group_by(ESVId) %>%
  summarise_all(funs(paste(unique(match, Form), collapse= " OR ")))%>%
  dplyr::select(ESVId, match, Form) %>% 
  ungroup()

it fills column Form identically to match, which is not at all what I want. I also need to keep any NA values in column Form. Ideally this would come out looking like this:

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000005", 
"ESV_000006", "ESV_000007"), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus OR Achillea millefolium OR Artemisia absinthium OR Artemisia ludoviciana OR Artemisia frigida OR Artemisia campestris OR Artemisia tridentata OR Artemisia tilesii", 
"Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fragaria virginiana OR Rosa woodsii OR Rosa arkansana OR Rubus ursinus", 
"Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magellanicus", 
"Prunus virginiana"
), Form = c("Conifer", "NA OR Forb OR NA OR Forb OR Sub-Shrub OR Forb OR Shrub OR NA", 
"NA OR Forb OR Shrub OR Forb OR Shrub OR NA OR NA", 
"Graminoid OR NA OR NA", 
"Shrub"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
sscoresby
  • 1
    (1) What does your expected output _look_ like? (2) The data you show has no `Form` column. – Martin Gal Dec 22 '22 at 21:59
  • whoops, thanks for pointing that out! put it back in. – sscoresby Dec 22 '22 at 23:17
  • I still don't understand what your `Form` column should look like after the summarising. – Martin Gal Dec 22 '22 at 23:20
  • Added desired output – sscoresby Dec 22 '22 at 23:24
  • 2
    Perhaps: `funct_esvs %>% group_by(ESVId) %>% summarise(match = paste(unique(match), collapse = " OR "), Form = paste(Form, collapse = " OR "), .groups = "drop")`? – Martin Gal Dec 22 '22 at 23:32
  • Can you expand on *"all values in column `Form` that were associated with each value in column `match`"*? If `match` must be all unique values (`paste(unique(match), collapse=" OR ")`), I don't understand if `Form` should be `paste(Form, collapse=" OR ")` or something contingent on the uniqueness of `match`. Also, where do `"Prunus pensylvanica"` and `"Prunus emarginata"` come from? – r2evans Dec 23 '22 at 00:48
  • I'm not sure how I should have phrased the question, but the answer you provided is exactly what I wanted. Given that, do you have a suggestion on a clearer phrasing I should have used? Oh, and those last 2 species were just me getting carried away when I was making my wishlist of how things should look (obviously my actual dataset is much larger than 20 rows). – sscoresby Dec 23 '22 at 03:15
  • 1
    Sorry @MartinGal, I honestly do not recall seeing your comment when I posted my nearly-identical answer. I didn't intend to hijack it from you. – r2evans Dec 23 '22 at 04:11
  • 1
    @r2evans I'm totally fine with your answer. :) – Martin Gal Dec 23 '22 at 08:04

1 Answer

I'm not sure if this is what you need, since your expected output has elements not present in your source data, but perhaps this?

quux %>%
  group_by(ESVId) %>%
  summarize(
    match = paste(unique(match), collapse = " OR "), 
    Form = paste(Form, collapse = " OR ")
  )
# # A tibble: 5 × 3
#   ESVId      match                                                         Form 
#   <chr>      <chr>                                                         <chr>
# 1 ESV_000001 Pseudotsuga menziesii                                         Coni…
# 2 ESV_000004 Artemisia dracunculus OR Achillea millefolium OR Artemisia a… NA O…
# 3 ESV_000005 Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fraga… NA O…
# 4 ESV_000006 Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magella… Gram…
# 5 ESV_000007 Prunus virginiana                                             Shrub
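One caveat, in case `match` can repeat within an `ESVId`: `unique(match)` would then drop the repeats while `Form` keeps one entry per row, so the two pasted strings would no longer line up. A minimal sketch of one way to keep them aligned, assuming that dropping repeated `ESVId`/`match` pairs before summarising is acceptable (the `toy` data below is made up for illustration and is not the asker's data):

```r
library(dplyr)

# Hypothetical toy data: "y" appears twice within ESVId "A"
toy <- tibble(
  ESVId = c("A", "A", "A", "B"),
  match = c("x", "y", "y", "z"),
  Form  = c(NA, "Forb", "Forb", "Shrub")
)

aligned <- toy %>%
  # keep the first row for each ESVId/match pair, along with its Form
  distinct(ESVId, match, .keep_all = TRUE) %>%
  group_by(ESVId) %>%
  summarize(
    match = paste(match, collapse = " OR "),
    # paste() turns NA into the string "NA", so missing values are kept
    Form  = paste(Form, collapse = " OR "),
    .groups = "drop"
  )

aligned$match  # "x OR y"     "z"
aligned$Form   # "NA OR Forb" "Shrub"
```

With the sample data in the question every `match` value is already unique within its `ESVId`, so this gives the same result as the code above.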
r2evans
  • This works, as does Martin Gal's answer below, which is almost the same except for the `.groups = "drop"` at the end. What does that do, and what are the advantages/disadvantages of including it? – sscoresby Dec 23 '22 at 03:11
  • 1
    (Honestly I didn't see @MartinGal's comment, I should have waited.) Depending on what work is done, sometimes the grouping remains for all or some of the group variables. `?summarize` is a good reference for it. – r2evans Dec 23 '22 at 04:09
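
To illustrate the `.groups` point, here is a small sketch on made-up data (the `toy` tibble and the result names are hypothetical). As I understand `?summarize`, when every group collapses to one row the default is to drop only the last grouping variable. With a single grouping variable the result is therefore already ungrouped, while with two it stays grouped by the first unless `.groups = "drop"` is given:

```r
library(dplyr)

# Hypothetical toy data with two candidate grouping columns
toy <- tibble(
  ESVId      = c("A", "A", "B", "B"),
  MT_species = c(1, 2, 1, 1),
  match      = c("w", "x", "y", "z")
)

# One grouping variable: the default already returns an ungrouped tibble
one_var <- toy %>%
  group_by(ESVId) %>%
  summarize(match = paste(match, collapse = " OR "))

# Two grouping variables, default: still grouped by ESVId
# (dplyr also prints a message about the grouped output)
two_default <- toy %>%
  group_by(ESVId, MT_species) %>%
  summarize(match = paste(match, collapse = " OR "))

# Two grouping variables with .groups = "drop": fully ungrouped
two_dropped <- toy %>%
  group_by(ESVId, MT_species) %>%
  summarize(match = paste(match, collapse = " OR "), .groups = "drop")

group_vars(one_var)      # character(0)
group_vars(two_default)  # "ESVId"
group_vars(two_dropped)  # character(0)
```

So for the single `group_by(ESVId)` in this question, `.groups = "drop"` mainly makes the intent explicit. It matters more once you group by several columns and do not want later verbs to keep operating per group.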