1

I am using R and have a data frame which has four columns. One of them is numeric, and another one has lists in each row:

column_1    column_2    column_3    column_4
a           x           1           c("334", "538", "645", "763")
b           y           2           c("762", "838")
c           z           3           c("78", "85", "529", "797", "859")
d           p           3           c("8", "75", "242", "766")
e           q           4           c("85", "447", "529", "797", "859")

I want to order by data frame by column_3, and in case of a tie, in lexicographcal ascending order of column_4.

The example above is how my data frame looks after I sort my data frame using column_3 and column_4 as follows:

df <- df[order(df$column_3, df$column_4),]

However, my expected output is that row 2 should appear after row 3 since 78 is smaller than 762. Is it possible to do that?

Arun
  • 116,683
  • 26
  • 284
  • 387
Abhinav Sood
  • 799
  • 6
  • 23
  • In the column_4, you have character values as a `list`. Also, I think you need `lapply` as it is a `list` though it is not clear about the order for column_4 as it has multiple elements. – akrun Sep 27 '15 at 03:54
  • You should provide the input dataset instead of the modified one. Perhaps `df[order(df$column_3, as.numeric(sapply(df$column_4, '[', 1))),]` – akrun Sep 27 '15 at 04:12
  • 1
    Post the result of `dput(df)` in your question. – jlhoward Sep 27 '15 at 04:32
  • IMHO: rows 3 and 4 should be changed. Not 2 and 3, als 3 and 4 do have the same `column_3==3` but in `culumn_4[1]` 8 < 78 – Rentrop Sep 27 '15 at 07:47
  • @akrun: This only works if `column_4[1]` has unique vales. Isn't a lexicographical order using `column_4[2]` for duplicates in `column_4[1]`, `column_4[3]` for duplicates in `column_4[2]` and so on? – Rentrop Sep 27 '15 at 07:52
  • 1
    @Floo0 The OP didn't provide the input dataset. So, I was just guessing. – akrun Sep 27 '15 at 08:04

1 Answers1

2

Assuming you want to have numeric-values instead of characters in column_4 you can do the following: (with data.table)

Your Data (assumption as there is no dput)

dat <- read.table(header = TRUE, text = "
column_1    column_2    column_3    
a           x           1           
b           y           2           
c           z           3           
d           p           3           
e           q           4")           

column4 <- list(
  c("334", "538", "645", "763"),
  c("762", "838"),
  c("78", "85", "529", "797", "859"),
  c("8", "75", "242", "766"),
  c("85", "447", "529", "797", "859")
)

Combining the two data-steams

require(data.table)

column4 <- lapply(column4, as.numeric)
dat <- as.data.table(c(dat, column_4 = list(column4)))

Gives you

   column_1 column_2 column_3            column_4
1:        a        x        1     334,538,645,763
2:        b        y        2             762,838
3:        c        z        3  78, 85,529,797,859
4:        d        p        3       8, 75,242,766
5:        e        q        4  85,447,529,797,859

So this is what i assume is your data-set. To do the ordering not based in column_3 follow by column_4 in a lexicographical ascending order works as follows

t.data.table <- function(x) {as.data.table(t(x))}
lex_order <- rbindlist(lapply(dat[,column_4], t.data.table), fill=TRUE)
order_base <- c(dat[,list(column_3)], lex_order)
dat[do.call(order, order_base)]

Have a look at order_base to see what happened there.

The result looks as follows:

   column_1 column_2 column_3            column_4
1:        a        x        1     334,538,645,763
2:        b        y        2             762,838
3:        d        p        3       8, 75,242,766
4:        c        z        3  78, 85,529,797,859
5:        e        q        4  85,447,529,797,859

This is the same approach/logic as i used for this question: How to sort a list by byte-order for AWS-Calls

Community
  • 1
  • 1
Rentrop
  • 20,979
  • 10
  • 72
  • 100