I have two datasets, one consisting of data I have collected personally for individual specimens and the other consisting of mean data from previous studies reported in the literature. What I want to do is re-average the data combining the individual measurements and mean measurements. For example, if I had 10 individual specimens and a reported mean of 10 individuals of the same species from a different study, I would want to produce a mean value of 20 specimens. Attached is a sample dataset. There aren't any overlapping taxa between df and df2, but in the actual dataset there are.

df<-data.frame(taxon=c("Abrocoma_bennettii","Abrocoma_bennettii","Abrocoma_bennettii",
                   "Sylvisorex_johnstoni","Abrocoma_bennettii","Abrocoma_bennettii",
                   "Abrocoma_bennettii","Blarina_carolinensis","Abrocoma_cinerea",
                   "Sorex_hoyi","Abrocoma_cinerea","Sorex_cinereus",
                   "Cryptotis_parva","Sorex_cinereus","Sorex_nanus",
                   "Sorex_nanus","Sorex_vagrans","Peromyscus_leucopus",
                   "Sorex_cinereus","Sorex_nanus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Cryptotis_parva",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_nanus","Sorex_nanus","Sorex_vagrans",
                   "Sorex_cinereus","Sorex_nanus","Sorex_nanus",
                   "Sorex_arcticus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_fumeus",
                   "Sorex_haydeni","Sorex_haydeni","Sorex_nanus",
                   "Blarina_brevicauda","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Abrothrix_longipilis","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_monticolus","Sorex_monticolus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_haydeni","Sorex_haydeni",
                   "Sorex_haydeni","Sorex_hoyi","Sorex_hoyi",
                   "Sorex_nanus","Sorex_nanus","Cryptotis_parva",
                   "Cryptotis_parva","Cryptotis_parva","Cryptotis_parva",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus"),
           x=c(159.0,221.0,184.0,55.0,163.0,214.0,232.0,67.0,198.0,55.0,150.0,55.0,57.0,56.5,56.0,55.0,61.0,67.0,55.0,56.0,62.0,58.0,58.0,55.0,57.0,55.0,55.0,57.5,55.0,55.0,55.0,61.0,60.0,64.0,55.0,56.0,56.0,55.5,58.0,56.0,61.0,63.0,60.0,58.5,55.0,56.0,60.0,55.0,70.0,55.0,55.0,59.0,70.0,65.0,88.0,56.0,63.0,55.0,55.0,56.0,55.0,58.0,57.0,65.0,55.0,55.0,59.0,55.0,60.0,57.0,66.0,65.0,60.0,60.0,62.0,56.5,58.0,58.0,56.0,57.0,55.0,55.0,57.0,63.0,58.0,57.0,59.0,55.0,55.0,56.0,57.0,58.0,60.0,55.0,59.0,55.5,55.0,68.0,66.0,64.0),
y=c(115.00, 286.00, 222.00,   1.00, 109.00, 224.00, 317.00,   1.40, 144.00,   1.75,
105.00,   1.85,   1.90,   2.00,   2.00,   2.00,   2.00,   2.10,   2.10,   2.20,
2.30,   2.30,   2.40,   2.50,   2.50,   2.50,   2.50,   2.50,   2.50,   2.50,
2.50,   2.50,   2.50,   2.60,   2.60,   2.60,   2.70,   2.70,   2.70,   2.70,
2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.80,   2.80,
2.80,   2.80,   2.80,   2.80, 222.00,   2.80,   2.80,   2.80,   2.80,   2.80,
2.80,   2.86,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,
2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,
2.90,   2.90,   2.90,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,
3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00))
df2<-data.frame(taxon=c("Eulemur_collaris","Leopardus_colocolo",
"Leopardus_colocolo","Vicugna_vicugna","Vicugna_vicugna","Equus_quagga",
"Equus_quagga","Priodontes_maximus","Priodontes_maximus","Crocuta_crocuta"),
N=c(11,10,2,50,50,13,8,9 ,9,5),
x=c(461.0,565.0,505.0,1107.0,963.0,2046.0,2050.0,929.1,926.9,1236.0),
y=c(2150,3900,4000,36200,33200,247830,219050,31680,34690,47400))
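
To make the intended calculation concrete, here is a minimal sketch of the re-averaging in base R. The numbers and variable names are made up for illustration and do not come from df or df2; the idea is simply that each individual specimen gets weight 1 and each reported mean gets a weight equal to its sample size.

```r
# Hypothetical values, not taken from the sample dataset
individual_x  <- c(50, 60, 70)  # specimens measured directly (weight 1 each)
reported_mean <- 80             # mean reported in a published study
reported_n    <- 3              # number of specimens behind that reported mean

# Stack the individual values and the reported mean, with matching weights
values  <- c(individual_x, reported_mean)
weights <- c(rep(1, length(individual_x)), reported_n)

# Weighted mean is equivalent to averaging all 6 underlying specimens
combined_mean <- weighted.mean(values, weights)
combined_n    <- sum(weights)
```

Here combined_mean is the same value one would get by averaging all six underlying specimens, assuming the reported mean is an ordinary arithmetic mean of its specimens.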

Previously, I had been doing this using the following code, following the answer to one of my previous questions.

library(dplyr)

df3<-df%>%
  mutate(N = 1) %>%
  bind_rows(df2) %>%
  group_by(taxon) %>%
  summarise(across(c(x,y), weighted.mean, N), 
            N = sum(N))

This worked for quite some time. However, when I recently tried to re-run the code, I got the following error.

Error: Problem with `summarise()` input `..1`.
x object 'N' not found
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i The error occurred in group 1: taxon = "Abrocoma_bennettii".

I have been unable to figure out what produces this error. I did not change anything in the code used to run the data. I went back and loaded an older version of the database on which the code had previously run successfully, and I got the same error where I did not before. As you can see from the dataset, the error is reproducible with just this small subsample of my data. The fact that R returns this error even on datasets on which the code previously worked, as well as on highly simplified datasets, makes me wonder if this is a bug in dplyr rather than something to do with the structure of the data frame. But I do not know precisely what the error is or how to rectify it.

I ran each line of code individually and it turns out it is the line summarise(across(c(x,y), weighted.mean, N), N = sum(N)) that is causing the error, but I still am unable to figure out what specifically is wrong.

user2352714
  • You have an additional `%>%` in the code. Apart from that this code works for me without any error for the data you have shared. What is your `packageVersion('dplyr')` ? I am on `‘1.0.3’` – Ronak Shah Mar 03 '21 at 03:11
  • It seems to be a bug that got introduced in v1.0.4. It seems that the next release will fix it: https://dplyr.tidyverse.org/news/index.html "Fixed bugs introduced in across() in previous version". – Phil Mar 03 '21 at 03:48
  • In the meantime, `summarise(across(c(x,y), ~ weighted.mean(., N)))` works. – Phil Mar 03 '21 at 03:49
  • @RonakShah I am on `'1.0.4'`. Also I fixed the issue with the extra `%>%`. – user2352714 Mar 03 '21 at 23:44
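
The lambda form suggested in the comments can be sketched on a small toy example. The data frames `individuals` and `reported` below are hypothetical stand-ins for df and df2, and this is only an illustration of the workaround, not a statement about how dplyr fixed the issue internally. Writing the weight inside the lambda means `N` is looked up in the grouped data rather than passed through the `...` argument of `across()`.

```r
library(dplyr)

# Hypothetical stand-ins for df (individual specimens) and df2 (reported means)
individuals <- data.frame(taxon = c("A", "A", "B"),
                          x = c(10, 20, 5),
                          y = c(1, 3, 2))
reported <- data.frame(taxon = c("A", "C"),
                       N = c(2, 4),
                       x = c(30, 7),
                       y = c(4, 9))

combined <- individuals %>%
  mutate(N = 1) %>%                    # each individual specimen has weight 1
  bind_rows(reported) %>%
  group_by(taxon) %>%
  # N = sum(N) must come after across(), so that N is still the per-row
  # weight vector when the weighted means are computed
  summarise(across(c(x, y), ~ weighted.mean(.x, N)),
            N = sum(N))
```

For taxon "A" this averages two individuals (weight 1 each) with a reported mean of two specimens (weight 2), giving a combined N of 4.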

1 Answer

Try:

df3<-df %>%
  mutate(N = 1) %>%
  bind_rows(df2) %>%
  group_by(taxon) %>%
  dplyr::summarise(across(c(x,y), weighted.mean, N), 
            N = sum(N))
TarJae