
I have several data frames, which I bound into a final data frame containing two variables: "Label" and "Mean".

The labels are in this format:

                                           Label     Mean
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10) 18.97021
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11) 16.40476
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12) 24.79132
4 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (13) 20.95391
5 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (14) 19.67626
6 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (15) 28.93776

I would like to organize the data according to the number in parentheses in Label, to get something like this:

                                          Label     Mean
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (1) 18.97021
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (2) 16.40476
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (3) 24.79132
4 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (4) 20.95391
5 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (5) 19.67626
6 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (6) 28.93776

Is there any advice on how to accomplish this? Thank you.

Jaap
    you should probably start by extracting the numbers in the brackets using regular expressions. Afterwards, `order` would be the way to go. – loki May 08 '18 at 14:27
    And when you break the problem into chunks, you often find that each chunk already has a solution. For example, [extracting a number between brackets](https://stackoverflow.com/questions/12735503/extract-numbers-between-brackets-within-a-string) – AkselA May 08 '18 at 14:33
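A minimal base-R sketch of the two-step approach suggested in the comments (extract the number with a regular expression, then use `order`). It assumes every label ends in a number in parentheses such as "(12)"; the small data frame and its `Mean` values are hypothetical, not the asker's data.

```r
# Hypothetical example data: labels deliberately out of numeric order
df <- data.frame(
  Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (9)",
            "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)"),
  Mean = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Step 1: capture the digits inside the trailing "(...)" and convert them to numbers
label_num <- as.numeric(sub(".*\\(([0-9]+)\\)$", "\\1", df$Label))

# Step 2: order the rows by that numeric key (9, 10, 12)
sorted <- df[order(label_num), ]
print(sorted)
```

Converting to numeric before ordering is the important part: ordering the extracted strings directly would still put "10" and "12" before "9".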

4 Answers


Using `mixedorder` from the gtools package, which orders strings with embedded numbers so that the numbers sort numerically:

df[gtools::mixedorder(df$Label),]
Jaap
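Why a "natural" ordering such as `gtools::mixedorder` is needed here: `Label` is a character column, so plain `order()` compares the labels character by character and puts "(10)" before "(2)" and "(9)". The base-R sketch below (no gtools required; the labels are hypothetical) contrasts that with ordering on the extracted number.

```r
label_vec <- c("MDAMB231 (10)", "MDAMB231 (9)", "MDAMB231 (2)")

# Plain character ordering: "(1..." sorts before "(2" and "(9", so 10 comes first
lexical <- label_vec[order(label_vec)]

# Ordering on the number inside the parentheses gives the intended sequence
num <- as.numeric(sub(".*\\(([0-9]+)\\)$", "\\1", label_vec))
numeric_sorted <- label_vec[order(num)]

print(lexical)
print(numeric_sorted)
```

With gtools installed, `label_vec[gtools::mixedorder(label_vec)]` should give the same result as `numeric_sorted` for labels like these.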

Here is a solution that extracts the number inside "()" using strsplit:

Example input data:

df<-data.frame(Label=c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
                        "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)",
                        "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)"),
                Mean=c(1,2,3))

df
                                           Label Mean
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)    1
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)    2
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)    3

Ordering:

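# Take the text after "(", drop the ")", convert to numeric, then order the rows
df[order(as.numeric(unlist(strsplit(unlist(lapply(strsplit(as.character(df$Label), split = "(", fixed = TRUE), "[", 2)), split = ")", fixed = TRUE)))), ]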
                                           Label Mean
3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)    3
2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)    2
1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)    1
Terru_theTerror
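The nested `strsplit` one-liner can be unpacked into named steps. This sketch keeps the same idea but uses `vapply` so each step returns a plain character vector. It assumes each label contains exactly one "(" followed by the number and ")"; the intermediate variable names are just for illustration.

```r
df <- data.frame(Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)"),
                 Mean = c(1, 2, 3))

# Split each label at "(" and keep the second piece, e.g. "12)"
after_paren <- vapply(strsplit(as.character(df$Label), "(", fixed = TRUE),
                      function(parts) parts[2], character(1))

# Split that piece at ")" and keep the first piece, e.g. "12"
digits <- vapply(strsplit(after_paren, ")", fixed = TRUE),
                 function(parts) parts[1], character(1))

# Convert to numbers and reorder the rows
df_sorted <- df[order(as.numeric(digits)), ]
print(df_sorted)
```

`vapply` with `character(1)` also fails loudly if a step does not return exactly one string per label, which the `unlist` version would not catch.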

I first create a new variable holding the digits that follow the first opening parenthesis (excluding the parenthesis itself), then I order the data frame by it:

library(stringr)

# Digits that directly follow "(" (the lookbehind keeps the "(" out of the match)
df$label_id = as.numeric(str_extract(df$Label, '(?<=\\()\\d+'))
df = df[order(df$label_id),]
Felipe Alvarenga
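If you would rather not depend on stringr, the same lookbehind pattern works in base R through `regexpr(..., perl = TRUE)` and `regmatches()`. This is a sketch assuming every label contains a number right after "("; `regmatches` silently drops labels with no match, so the length check guards against misaligned results. The example labels and `Mean` values are hypothetical.

```r
df <- data.frame(Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (14)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (2)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (13)"),
                 Mean = c(1, 2, 3))

labels_chr <- as.character(df$Label)
# Match the digits that directly follow "(" (lookbehind requires perl = TRUE)
hits <- regmatches(labels_chr, regexpr("(?<=\\()[0-9]+", labels_chr, perl = TRUE))
# regmatches() drops non-matching labels, so check that nothing was lost
stopifnot(length(hits) == nrow(df))

df$label_id <- as.numeric(hits)
df <- df[order(df$label_id), ]
print(df)
```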

Here's a dplyr approach that orders the rows by the number in Label and then renumbers Label:

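library(magrittr)
# Sort rows by the number inside the trailing "(...)",
# then replace that number with the new row position
ans <- df %>%
        dplyr::arrange(as.numeric(gsub(".*\\((\\d+)\\)$", "\\1", Label))) %>%
        dplyr::mutate(Label = paste0(gsub("(.*)\\(\\d+\\)$", "\\1", Label), "(", dplyr::row_number(), ")"))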

                                          # Label     Mean
# 1 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (1) 18.97021
# 2 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (2) 16.40476
# 3 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (3) 24.79132
# 4 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (4) 20.95391
# 5 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (5) 19.67626
# 6 C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (6) 28.93776

Data

df <- read.table(text="Label,Mean
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10),18.97021 
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11),16.40476
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12),24.79132
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (13),20.95391
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (14),19.67626
C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (15),28.93776", header=TRUE, sep=",", stringsAsFactors=FALSE)
CPak
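For comparison, a base-R sketch of the same two steps as the dplyr answer: order by the number in parentheses, then rewrite that number as the row position 1, 2, 3, and so on. It assumes every label ends in "(number)"; as in the dplyr version, the original numbers are overwritten, so keep a copy if you still need them. The example labels and `Mean` values are hypothetical.

```r
df <- data.frame(Label = c("C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (12)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (10)",
                           "C2-Concatenated Stacks-1:c:2/3 - MDAMB231 (11)"),
                 Mean = c(1, 2, 3), stringsAsFactors = FALSE)

# Order rows by the number captured inside the trailing "(...)"
num <- as.numeric(sub(".*\\(([0-9]+)\\)$", "\\1", df$Label))
ans <- df[order(num), ]

# Replace the trailing "(number)" with the new row position
ans$Label <- paste0(sub("\\([0-9]+\\)$", "", ans$Label), "(", seq_len(nrow(ans)), ")")
rownames(ans) <- NULL
print(ans)
```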