Implicit sorting in tidyr::spread and dplyr::summarise

Question

My data are ordered observations and I want to keep the ordering as much as possible while doing manipulations.

Take the answer for this question, I put "B" ahead of "A" in the dataframe. The resulting wide data are sorted by the column "name", i.e., "A" first, then "B".

df = data.frame(name=c("B","B","A","A"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7))

gather(df, Var, Val, V1:V2) %>% 
unite(VarG, Var, group) %>% 
spread(VarG, Val)

  name V1_g1 V1_g2 V2_g1 V2_g2
1    A    20    30     1     7
2    B    10    40     6     3

Is there a way to keep the original ordering? like this:

  name V1_g1 V1_g2 V2_g1 V2_g2
1    B    10    40     6     3
2    A    20    30     1     7

04/02 edit: I've just found the dplyr::summarise does sorting as well. arrange(name, df$name) still works to restore the order. But I wonder if the extra sorting is necessary from the design of the packages?

df %>% 
  group_by(name) %>% 
  summarise(n()) %>% 

  name n()
1    A   2
2    B   2

Interesting. It seems like during the spread step, even the factor levels for the "name" variable gets changed.... — A5C1D2H2I1M1N2O1R2T1, Apr 01 '15 at 04:09

bergant · Accepted Answer · 2015-04-01T06:11:20.390

11

You can sort by name based on the order in the original data frame:

gather(df, Var, Val, V1:V2) %>% 
  unite(VarG, Var, group) %>% 
  spread(VarG, Val) %>%
  arrange( order(match(name, df$name)))

#   name V1_g1 V1_g2 V2_g1 V2_g2
# 1    B    10    40     6     3
# 2    A    20    30     1     7

edited Apr 01 '15 at 06:11

answered Apr 01 '15 at 06:02

bergant

7,122
1
20
24

2

Thanks. `arrange(match(name, df$name) ` also works. But how to deal with multiple grouping levels, say "name", "name1", "name2". It does not make sense to me that `spread` includes sorting by default. – Dong Apr 01 '15 at 16:13
You are right about not needing order. As for levels - you can use multiple arguments in `arrange( a, b, c, ...)` and can sort as you like (maybe I just can't see the problem). But I understand your pain as you had everything already sorted... – bergant Apr 01 '15 at 17:16
@bergant, to me, the bigger problem other than sorting is that the factor levels are being changed. – A5C1D2H2I1M1N2O1R2T1 Apr 04 '15 at 04:00
@AnandaMahto I don't see factor levels changed - levels are in both cases "A", "B". But the sorting does happen. From 2, 2, 1, 1 from original to 1, 2 in the result. I have tidyr version 0.2.0 – bergant Apr 04 '15 at 11:03
Hmm. You're right. So `arrange` would refactor? That seems like strange behavior to me. – A5C1D2H2I1M1N2O1R2T1 Apr 04 '15 at 11:41
@bergant This question is old but I still do not see the solution to Dong's problem in the first comment here, how to preserve order with >1 grouping variable? Can anyone help? – qdread Feb 04 '19 at 20:53
mutate(name = factor(name, levels=unique(df$name) ) ) %>% arrange(name) @qdread – Ferroao Feb 25 '20 at 18:23

score 10 · Answer 2 · answered Jul 06 '16 at 12:28

The order is taken from the order of the factor levels.

str(df)
'data.frame':   4 obs. of  4 variables:
 $ name : Factor w/ 2 levels "A","B": 2 2 1 1
 $ group: Factor w/ 2 levels "g1","g2": 1 2 1 2
 $ V1   : num  10 40 20 30
 $ V2   : num  6 3 1 7

See that the levels are "A","B".

So if you set the order of the levels to the order they are shown in it will work:

df = data.frame(name=c("B","B","A","A"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7))

df %>% 
    mutate(name = factor(name,levels=unique(name))) %>% 
    mutate(group = factor(group,levels=unique(group))) %>% 
    gather(Var, Val, V1:V2) %>% 
    unite(VarG, Var, group) %>% 
    spread(VarG, Val)

Results in:

  name V1_g1 V1_g2 V2_g1 V2_g2
1    B    10    40     6     3
2    A    20    30     1     7

I never realized/appreciated that unique() return in the same order it was given- this will be useful for other applications as well! — emudrak, May 11 '18 at 13:29

score 2 · Answer 3 · edited Oct 03 '21 at 08:52

tidyr::pivot_wider(), the recommended replacement of tidyr::spread() since tidyr 1.0.0, keeps the rows in order so you can do :

library(tidyr)

df = data.frame(name=c("B","B","A","A"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7))

pivot_wider(df, names_from = "group", values_from = c("V1", "V2"))
#> # A tibble: 2 x 5
#>   name  V1_g1 V1_g2 V2_g1 V2_g2
#>   <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 B        10    40     6     3
#> 2 A        20    30     1     7

^{Created on 2019-09-14 by the reprex package (v0.3.0)}

Implicit sorting in tidyr::spread and dplyr::summarise

3 Answers3