
I have a dataframe which lists species observations across multiple survey plots (the data is here). I'm trying to use tidyr's pivot_wider to spread that abundance data across several columns, with the new columns being each of the observed species. Here's the line of code I'm trying to use to do that:

data %>% pivot_wider(names_from = Species, values_from = Total.Abundance, values_fill = 0)

However, this gives me two error messages:

Error: Can't convert <double> to <list>.
Values are not uniquely identified; output will contain list-cols.

I'm not sure what the issue is, because this has worked fine for several other dataframes that are (seemingly) identical to this one. I've tried googling the first error message and have not been able to find what conditions cause it—I don't know what double R is trying to convert to a list, nor why it's trying to convert to a list. The Total.Abundance column should be integers, but I wonder if somehow it's a double data type? From what I've been able to find, the second error message appears when there are identical rows in the dataframe. However, the error persists when I modify my statement to

unique(data) %>% pivot_wider(names_from = Species, values_from = Total.Abundance, values_fill = 0)

I would have thought that would remove duplicate rows. Any help would be much appreciated!

Kronimiciad
  • That error suggests that there is duplication in the rows that you are trying to key off of. You can use a `values_fn` to have it somehow summarize these rows if that makes sense. But I would focus on the "Values are not uniquely identified" part of the error message. You probably want a `group_by()` and then a `summarize()` before the `pivot_wider()` to fix this. –  Aug 11 '20 at 20:06
  • Can't access your file. You need to share the "public link" of it (click "Share" in dropbox and then "Share a link instead"). – Ben Toh Aug 11 '20 at 20:22
  • Also, `unique(data)` can't solve this problem. For example, `Site A, Species 1, 1000` and `Site A, Species 1, 5` won't get "eliminated" by `unique()` but will cause problems when you use `pivot_wider()`; you need to reconcile them using something like `group_by(Site, Species) %>% summarise(abundance = sum(abundance))` – Ben Toh Aug 11 '20 at 20:27
  • @BenToh I think I fixed the hyperlink issue, thank you for pointing it out. – Kronimiciad Aug 11 '20 at 20:43
  • Thank you both for the group_by() and summarise() suggestions; I'll try those. – Kronimiciad Aug 11 '20 at 20:43

1 Answer


Expanding on my comment, your data contains duplicated Plot.ID/Species combinations that cannot be removed by unique() or dplyr's distinct(), because those rows differ in another column:

dat %>%
  distinct() %>%
  group_by(Plot.ID, Species) %>%
  count()
#   Plot.ID Species                  n
#     <dbl> <chr>                <int>
# 1       1 Calliopius               1
# 2       1 Idotea                   2
# 3       1 Lacuna vincta            2
# 4       1 Mitrella lunata          2
# 5       1 Podoceropsis nitida      1
# 6       1 Unk. Amphipod            1
# 7       1 Unk. Bivalve             1
# 8       2 Calliopius               1
# 9       2 Caprella penantis        1
#10       2 Corophium insidiosum     1
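
To list only the offending combinations, the count above can be filtered down to pairs that occur more than once. This is a minimal sketch on made-up data: `toy` and its values are hypothetical stand-ins for the real `dat`, with column names copied from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for `dat`; the values are made up.
toy <- data.frame(
  Plot.ID = c(1, 1, 1, 2),
  Species = c("Idotea", "Idotea", "Calliopius", "Calliopius"),
  Total.Abundance = c(1000, 5, 3, 7)
)

# distinct() keeps both Idotea rows because their abundances differ,
# so the Plot.ID/Species pair is still duplicated afterwards.
dupes <- toy %>%
  distinct() %>%
  count(Plot.ID, Species) %>%
  filter(n > 1)

# Idotea in plot 1 is the only pair listed, with n = 2
dupes
```

Any row returned here is a cell that pivot_wider() would have to turn into a list-column.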

You need to find out why these duplicates exist and reconcile them, for example by summing them. If they come from a bug in the data-wrangling code, summing is not necessarily appropriate. If instead the same plot was sampled twice, you may want the mean rather than the sum to normalize for sampling effort, or an extra column recording sampling effort. In any case, this runs without the error:

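dat %>%
  # Collapse duplicated Plot.ID/Species rows into a single total
  group_by(Plot.ID, Species) %>%
  # .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped result
  summarise(abundance = sum(Total.Abundance), .groups = "drop") %>%
  # One column per species; species absent from a plot are filled with 0
  tidyr::pivot_wider(names_from = Species, values_from = abundance,
                     values_fill = 0)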
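
The `values_fn` route mentioned in the comments should give the same result in a single call, by telling pivot_wider() how to collapse cells that hold more than one value. This is only a sketch on made-up data (`toy` is hypothetical), and it assumes summing is the right reconciliation; swap `sum` for `mean` if the duplicates are repeat samples of the same plot.

```r
library(tidyr)

# Hypothetical toy data standing in for `dat`; the values are made up.
toy <- data.frame(
  Plot.ID = c(1, 1, 1, 2),
  Species = c("Idotea", "Idotea", "Calliopius", "Calliopius"),
  Total.Abundance = c(1000, 5, 3, 7)
)

wide <- pivot_wider(
  toy,
  names_from = Species,
  values_from = Total.Abundance,
  # Sum the values wherever a Plot.ID/Species cell has more than one row
  values_fn = list(Total.Abundance = sum),
  # Species not observed in a plot get 0 instead of NA
  values_fill = 0
)

wide
```

The explicit group_by()/summarise() version keeps the aggregation step visible, which may be easier to audit if the duplicates turn out to be a data-entry problem.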
Ben Toh