I have two datasets. One contains measurements I collected myself from individual specimens, and the other contains mean values from previous studies reported in the literature. I want to re-average the data by combining the individual measurements with the literature means. For example, if I had 10 individual specimens and a reported mean based on 10 individuals of the same species from a different study, I would want a mean value based on all 20 specimens. A sample dataset is below. There are no overlapping taxa between df and df2 in this sample, but there are in the actual dataset.
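The re-averaging described above is a sample-size-weighted mean. Each personally measured specimen counts as a "mean" of N = 1, and each literature mean is weighted by its reported N. A sketch for one taxon, assuming n individual measurements x_i and literature means \bar{x}_j based on N_j specimens each:

```latex
\bar{x}_{\text{pooled}}
  = \frac{\sum_{i=1}^{n} x_i + \sum_{j} N_j \,\bar{x}_j}{n + \sum_{j} N_j}
% Example from the question: n = 10 individuals plus one study with N_1 = 10
% \bar{x}_{\text{pooled}} = \frac{\sum_{i=1}^{10} x_i + 10\,\bar{x}_1}{20}
```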
df<-data.frame(taxon=c("Abrocoma_bennettii","Abrocoma_bennettii","Abrocoma_bennettii",
"Sylvisorex_johnstoni","Abrocoma_bennettii","Abrocoma_bennettii",
"Abrocoma_bennettii","Blarina_carolinensis","Abrocoma_cinerea",
"Sorex_hoyi","Abrocoma_cinerea","Sorex_cinereus",
"Cryptotis_parva","Sorex_cinereus","Sorex_nanus",
"Sorex_nanus","Sorex_vagrans","Peromyscus_leucopus",
"Sorex_cinereus","Sorex_nanus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Cryptotis_parva",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_nanus","Sorex_nanus","Sorex_vagrans",
"Sorex_cinereus","Sorex_nanus","Sorex_nanus",
"Sorex_arcticus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_fumeus",
"Sorex_haydeni","Sorex_haydeni","Sorex_nanus",
"Blarina_brevicauda","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Abrothrix_longipilis","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_monticolus","Sorex_monticolus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_haydeni","Sorex_haydeni",
"Sorex_haydeni","Sorex_hoyi","Sorex_hoyi",
"Sorex_nanus","Sorex_nanus","Cryptotis_parva",
"Cryptotis_parva","Cryptotis_parva","Cryptotis_parva",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
"Sorex_cinereus"),
x=c(159.0,221.0,184.0,55.0,163.0,214.0,232.0,67.0,198.0,55.0,150.0,55.0,57.0,56.5,56.0,55.0,61.0,67.0,55.0,56.0,62.0,58.0,58.0,55.0,57.0,55.0,55.0,57.5,55.0,55.0,55.0,61.0,60.0,64.0,55.0,56.0,56.0,55.5,58.0,56.0,61.0,63.0,60.0,58.5,55.0,56.0,60.0,55.0,70.0,55.0,55.0,59.0,70.0,65.0,88.0,56.0,63.0,55.0,55.0,56.0,55.0,58.0,57.0,65.0,55.0,55.0,59.0,55.0,60.0,57.0,66.0,65.0,60.0,60.0,62.0,56.5,58.0,58.0,56.0,57.0,55.0,55.0,57.0,63.0,58.0,57.0,59.0,55.0,55.0,56.0,57.0,58.0,60.0,55.0,59.0,55.5,55.0,68.0,66.0,64.0),
y=c(115.00, 286.00, 222.00, 1.00, 109.00, 224.00, 317.00, 1.40, 144.00, 1.75,
105.00, 1.85, 1.90, 2.00, 2.00, 2.00, 2.00, 2.10, 2.10, 2.20,
2.30, 2.30, 2.40, 2.50, 2.50, 2.50, 2.50, 2.50, 2.50, 2.50,
2.50, 2.50, 2.50, 2.60, 2.60, 2.60, 2.70, 2.70, 2.70, 2.70,
2.70, 2.70, 2.70, 2.70, 2.70, 2.70, 2.70, 2.70, 2.80, 2.80,
2.80, 2.80, 2.80, 2.80, 222.00, 2.80, 2.80, 2.80, 2.80, 2.80,
2.80, 2.86, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90,
2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90, 2.90,
2.90, 2.90, 2.90, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00,
3.00, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00, 3.00))
df2<-data.frame(taxon=c("Eulemur_collaris","Leopardus_colocolo",
"Leopardus_colocolo","Vicugna_vicugna","Vicugna_vicugna","Equus_quagga",
"Equus_quagga","Priodontes_maximus","Priodontes_maximus","Crocuta_crocuta"),
N=c(11,10,2,50,50,13,8,9 ,9,5),
x=c(461.0,565.0,505.0,1107.0,963.0,2046.0,2050.0,929.1,926.9,1236.0),
y=c(2150,3900,4000,36200,33200,247830,219050,31680,34690,47400))
Previously I did this with the following code, taken from an answer to one of my earlier questions:
library(dplyr)

df3 <- df %>%
  mutate(N = 1) %>%
  bind_rows(df2) %>%
  group_by(taxon) %>%
  summarise(across(c(x, y), weighted.mean, N),
            N = sum(N))
This worked for quite some time. However, when I recently re-ran the code, I got the following error:
Error: Problem with `summarise()` input `..1`.
x object 'N' not found
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i The error occurred in group 1: taxon = "Abrocoma_bennettii".
I have not been able to work out what causes this error. I did not change anything in the code. I went back to an older version of the dataset that I knew had previously run successfully with this code, and it now gives the same error. As the sample above shows, the error is reproducible with just this small subset of my data. Because the code now fails both on data it used to handle and on a highly simplified dataset, I wonder whether this is a bug in dplyr rather than a problem with the structure of my data frames, but I do not know precisely what the error means or how to fix it.
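One way to check whether the problem lies in dplyr rather than in the data is to compute the pooled means without dplyr at all. The sketch below uses base R's rowsum(); the function name pool_means() and the toy data frames indiv and lit are hypothetical stand-ins for df and df2 (same columns: taxon, x, y, plus N in the literature table).

```r
# Hypothetical helper: pool individual measurements (weight 1 each) with
# literature means (weighted by their reported N), per taxon, in base R.
pool_means <- function(indiv, lit, vars = c("x", "y")) {
  indiv$N <- 1  # each personally measured specimen counts as N = 1
  all_rows <- rbind(indiv[, c("taxon", "N", vars)],
                    lit[, c("taxon", "N", vars)])
  # Per-taxon sum of N * value for each variable (groups sorted alphabetically)
  wsum <- rowsum(all_rows[vars] * all_rows$N, all_rows$taxon)
  # Per-taxon total sample size, in the same sorted group order
  n <- as.vector(rowsum(all_rows$N, all_rows$taxon))
  out <- data.frame(taxon = rownames(wsum), wsum / n, N = n)
  rownames(out) <- NULL
  out
}

# Toy data (hypothetical) in the same shape as df and df2
indiv <- data.frame(taxon = c("A", "A", "B"), x = c(10, 20, 5), y = c(1, 3, 2))
lit   <- data.frame(taxon = c("A", "C"), N = c(2, 4), x = c(30, 7), y = c(2, 9))

res <- pool_means(indiv, lit)
res
```

For taxon A this gives x = (10 + 20 + 2 × 30) / 4 = 22.5 with N = 4. With the real data, pool_means(df, df2) should give the same means the dplyr pipeline used to produce.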
Running the pipeline one step at a time shows that the error comes from the line summarise(across(c(x,y), weighted.mean, N), N = sum(N)), but I still cannot work out what specifically is wrong.
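A commonly suggested fix, sketched here on hypothetical toy data (indiv and lit stand in for df and df2), is to stop passing N through the `...` of across() and instead wrap weighted.mean() in an anonymous function, so that N is looked up as a column within each group. Passing extra arguments through `...` in across() is deprecated as of dplyr 1.1.0; a change in how dplyr evaluates those arguments is a plausible, but unconfirmed, explanation for why code that used to run now fails.

```r
library(dplyr)

# Toy data (hypothetical) in the same shape as df and df2
indiv <- data.frame(taxon = c("A", "A", "B"), x = c(10, 20, 5), y = c(1, 3, 2))
lit   <- data.frame(taxon = c("A", "C"), N = c(2, 4), x = c(30, 7), y = c(2, 9))

pooled <- indiv %>%
  mutate(N = 1) %>%           # each individual specimen has weight 1
  bind_rows(lit) %>%          # append the literature means with their N
  group_by(taxon) %>%
  summarise(
    # The lambda looks up N inside each group instead of passing it via `...`
    across(c(x, y), ~ weighted.mean(.x, w = N)),
    N = sum(N)                # total sample size per taxon
  )
pooled
```

The order inside summarise() matters: if N = sum(N) came before across(), later expressions would see the single summed value instead of the per-row weights.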