Ive noticed tidyr(0.4.0)
sorts value columns when using spread
, where as tidyr(0.3.1)
returns the value columns in the order they were prior to a gather
.
Reproducible example 1:
library(dplyr)
library(tidyr)
# tidyr 0.3.1
dat<-data.frame(name=rep(c("A","B"),5),sam.id=rep(c(1,2),5),
frac=sample(c(0.05,0.1,0.2),10,replace=TRUE),
Aspecies=rnorm(10),Bspecies=rnorm(10),Zspecies=rnorm(10))
I aggregate the species values by both sam.id
and frac
(the proportion of the sample measured) i.e a multiple gather
.
dt.agg.0.3.1 <- gather(dat,key,value,-name,-sam.id) %>%
group_by(name,key) %>%
summarise(Total=sum(value)) %>% spread(key,Total) %>%
mutate(all=rowSums(.[,3:5]))
The last part of the pipe computes a simple total of all species using mutate
. So that:
head(dt.agg.0.3.1)
Source: local data frame [2 x 6]
name frac Aspecies Bspecies Zspecies all
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 A 0.85 -2.675137 -0.03287276 1.016791 -1.858010
2 B 0.40 4.194904 1.50561762 -2.738543 6.100522
Reproducible example 2:
library(tidyr)
# 0.4.0
dt.agg.0.4.0 <- gather(dat,key,value,-name,-sam.id) %>%
group_by(name,key) %>%
summarise(Total=sum(value)) %>% spread(key,Total)
head(dt.agg.0.4.0)
Source: local data frame [2 x 5]
Groups: name [2]
name Aspecies Bspecies frac Zspecies
(fctr) (dbl) (dbl) (dbl) (dbl)
1 A -2.675137 -0.03287276 0.85 1.016791
2 B 4.194904 1.50561762 0.40 -2.738543
One can see how the ordering of the value columns has been changed (alphabetically), which makes additional data-pipeline steps using mutate
for example troublesome.
dt.agg.0.4.0.mutated <- gather(dat,key,value,-name,-sam.id) %>%
group_by(name,key) %>% summarise(Total=sum(value)) %>%
spread(key,Total) %>% mutate(all=rowSums(.[,2:5]))
Throws the error;
Error: incompatible size (2), expecting 1 (the group size) or 1
Is there a way for tidyr(0.4.0)
to spread
back out in the order of the gather
?
Or will have to gather
(and summarise
) twice -- once for each key-value pair?