I have what I think is a very simple question, but I can't figure it out or find this exact problem online. I want to order my dataset by id and time (1 to 4) so that, within each id, time cycles through 1,2,3,4 rather than repeating as 1,1,2,2,3,3,4,4. See the example:

    dff <- data.frame(id = c(1,1,1,1,1,1,1,1,2,2,2,3),
                      time = c(1,1,2,2,3,3,4,4,1,1,2,1))
    R>dff
       id time
    1   1    1
    2   1    1
    3   1    2
    4   1    2
    5   1    3
    6   1    3
    7   1    4
    8   1    4
    9   2    1
    10  2    1
    11  2    2
    12  3    1

I want the resulting dataset to be ordered as follows:

    R>dff
       id time
    1   1    1
    2   1    2
    3   1    3
    4   1    4
    5   1    1
    6   1    2
    7   1    3
    8   1    4
    9   2    1
    10  2    2
    11  2    1
    12  3    1

I would prefer to use the arrange function in dplyr, but I will take any solution. I believe I should create a vector v <- c(1,2,3,4) and order with it using %in%, but I'm not sure how. I think something like that would just order the rows as 1,1,1, which is not what I want. Any help appreciated, thanks.

user63230

2 Answers

We can create a sequence column 'ind' that numbers the rows within each combination of 'id' and 'time', arrange by 'id' and 'ind', and then remove the helper column with select:

library(dplyr)
dff %>%
    group_by(id, time) %>% 
    mutate(ind = row_number()) %>%
    arrange(id, ind) %>%
    select(-ind)
#     id  time
#   <dbl> <dbl>
#1      1     1
#2      1     2
#3      1     3
#4      1     4
#5      1     1
#6      1     2
#7      1     3
#8      1     4
#9      2     1
#10     2     2
#11     2     1
#12     3     1

If we are using base R, the following one-liner serves the purpose:

dff[order(dff$id, with(dff, ave(time, id, time, FUN = seq_along))),]
#   id time
#1   1    1
#3   1    2
#5   1    3
#7   1    4
#2   1    1
#4   1    2
#6   1    3
#8   1    4
#9   2    1
#11  2    2
#10  2    1
#12  3    1
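
To make it easier to see why the one-liner works, here is a small sketch that splits it into two steps. The names `ind` and `sorted` are only illustrative and not part of the original answer; the data is the `dff` from the question.

```r
dff <- data.frame(id = c(1,1,1,1,1,1,1,1,2,2,2,3),
                  time = c(1,1,2,2,3,3,4,4,1,1,2,1))

# Occurrence counter: 1 for the first row of each (id, time) pair,
# 2 for the second row of that pair, and so on
ind <- with(dff, ave(time, id, time, FUN = seq_along))
ind
# [1] 1 2 1 2 1 2 1 2 1 2 1 1

# Sort by id first, then by the occurrence counter.
# The result has to be assigned, otherwise dff itself stays unchanged.
sorted <- dff[order(dff$id, ind), ]
rownames(sorted) <- NULL   # optional: renumber the rows 1..n
sorted
```

Sorting on the counter puts all first occurrences of each time value ahead of the second occurrences within an id, which gives the 1,2,3,4,1,2,3,4 pattern.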
akrun
  • When I run your code it gives me the same output as the original dff dataframe – Arun kumar mahesh Jul 18 '16 at 11:57
  • @akrun perfect answer. However, can you think of a reason why it won't run on my PC? I restarted RStudio and reinstalled dplyr, and the resulting dataframe is still the same. – user63230 Jul 18 '16 at 11:58
  • @Arunkumarmahesh I am getting the expected output as is showed – akrun Jul 18 '16 at 12:02
  • @user63230 I am running it on R console 3.3.0 and it works well for me. You have to assign the output to a new object or to the same object to reflect the changes, i.e. `dff <- dff[order(..` – akrun Jul 18 '16 at 12:05
  • @Arunkumarmahesh First I thought that it might be some issue with the dplyr version (I am using `dplyr_0.5.0`), but then I am also getting the same output with `base R`. So something is wrong on your side. Also, have you checked with `dplyr::mutate(ind = row_number())` in case you have also loaded `plyr` – akrun Jul 18 '16 at 12:07
  • @akrun it works fine with base R, but I had an issue with dplyr 0.4.3; I must use ungroup as shown below – Arun kumar mahesh Jul 18 '16 at 12:31
A slight build on @akrun's answer. Using dplyr version 0.4.3, I think ungroup() needs to be called before arranging, since the data is grouped by id and time. It seems that the data is sorted by the grouping columns first and only then by the columns specified in arrange.

library(dplyr)
dff %>%
    group_by(id, time) %>% 
    mutate(ind = row_number()) %>%
    ungroup() %>%
    arrange(id, ind) %>%
    select(-ind)
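
Since the comments report different results on different machines, a quick way to see how the installed dplyr behaves is to run both variants and compare them. This is only a sketch; the names `with_index`, `grouped_sort` and `ungrouped_sort` are made up for illustration, and `dff` is the data from the question.

```r
library(dplyr)

dff <- data.frame(id = c(1,1,1,1,1,1,1,1,2,2,2,3),
                  time = c(1,1,2,2,3,3,4,4,1,1,2,1))

# Helper column: occurrence number within each (id, time) pair
with_index <- dff %>%
    group_by(id, time) %>%
    mutate(ind = row_number())

# Variant 1: arrange while the data is still grouped
grouped_sort <- with_index %>%
    arrange(id, ind) %>%
    ungroup() %>%
    select(-ind)

# Variant 2: drop the grouping before arranging
ungrouped_sort <- with_index %>%
    ungroup() %>%
    arrange(id, ind) %>%
    select(-ind)

# TRUE means arrange() ignored the grouping in the installed version
isTRUE(all.equal(as.data.frame(grouped_sort), as.data.frame(ungrouped_sort)))
packageVersion("dplyr")
```

If the comparison is not TRUE, the ungrouped variant is the one to keep. Recent dplyr releases also give arrange() a `.by_group` argument that controls whether the grouping columns are sorted first.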
Krupa Kapadia