29

Let's say I have a string s = "bcabca".

What is the simplest way to get "aabbcc" out of it, i.e., sort the letters in s?

Orion
Leo

4 Answers

33

Maybe not the simplest answer, but this will work:

paste(sort(unlist(strsplit(s, ""))), collapse = "")
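To make the one-liner easier to follow, here is the same pipeline broken into steps, using the string from the question. The intermediate variable names are only for illustration.

```r
s <- "bcabca"

# strsplit() returns a list; unlist() flattens it to a character vector
chars <- unlist(strsplit(s, ""))   # "b" "c" "a" "b" "c" "a"

# sort the individual characters
sorted <- sort(chars)              # "a" "a" "b" "b" "c" "c"

# glue the characters back into a single string
result <- paste(sorted, collapse = "")
result
# [1] "aabbcc"
```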

Or modify the strReverse function that is defined in the help page for ?strsplit to suit our needs. We'll call it strSort:

strSort <- function(x)
        sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")
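A small sketch of how the two approaches differ on a vector of strings (the input `words` is made up for illustration): the one-liner pools the characters of every element into one string, while `strSort` sorts each element separately.

```r
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

words <- c("bcabca", "cba")   # hypothetical input vector

# The one-liner merges all elements into a single sorted string
pooled <- paste(sort(unlist(strsplit(words, ""))), collapse = "")
pooled
# [1] "aaabbbccc"

# strSort() returns one sorted string per element
each <- strSort(words)
each
# [1] "aabbcc" "abc"
```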
MichaelChirico
Chase
  • 1
    A variation using `stri_flatten` instead of `paste`: `stri_flatten(sort(unlist(strsplit(s,""))))` – kdauria Feb 16 '16 at 00:54
  • 1
    Of course, the first answer fails on character _vectors_, and I suspect the second will be slower than `sapply(strsplit(x, NULL), function(x) paste(sort(x), collapse = ''))` (which is already slow) – MichaelChirico May 05 '18 at 00:55
18

Here's a variant of Chase's solution that handles a vector of strings and keeps the original strings as names. ...and I get a chance to promote the use of vapply over sapply :-)

> x=c('hello', 'world', NA, 'a whole sentence')
> vapply(x, function(xi) paste(sort(strsplit(xi, NULL)[[1]]), collapse=''), '')
             hello              world               <NA>   a whole sentence 
           "ehllo"            "dlorw"                 "" "  aceeeehlnnostw" 
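One reason to prefer vapply, sketched here on an empty input: sapply has nothing to simplify and falls back to a list, while vapply always returns the type declared by its third argument. The helper name `sort_chars` is only for illustration.

```r
# Same per-element function as above, given a name for reuse
sort_chars <- function(xi) paste(sort(strsplit(xi, NULL)[[1]]), collapse = "")

empty <- character(0)

res_sapply <- sapply(empty, sort_chars)
res_vapply <- vapply(empty, sort_chars, "")

is.list(res_sapply)        # TRUE: sapply returns an empty list here
is.character(res_vapply)   # TRUE: vapply keeps the declared character type

# USE.NAMES = FALSE drops the names if the original strings are not wanted
plain <- vapply(c("hello", "world"), sort_chars, "", USE.NAMES = FALSE)
plain
# [1] "ehllo" "dlorw"
```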
Tommy
  • Yes, never use sapply when you can use vapply! – hadley May 06 '11 at 02:33
  • Reading this again two years later, there's a very slight fix to this to make it work for vectors, see my edited answer (only after submitting the edit did I read your response again and see it's all but the same! convergent evolution...) – MichaelChirico May 12 '17 at 19:13
8

It might be good to mention the stringi package for this problem. Its stri_order and stri_sort functions are very efficient, running in well under half the time of the base R method mentioned above in the timings below.

library(stringi)
## generate 10k random strings of 100 characters each
str <- stri_rand_strings(1e4, 100)
## helper function for vapply()
striHelper <- function(x) stri_c(x[stri_order(x)], collapse = "")
## timings
system.time({
  v1 <- vapply(stri_split_boundaries(str, type = "character"), striHelper, "")
})
#    user  system elapsed 
#   0.747   0.000   0.743 

system.time({
  v2 <- sapply(lapply(strsplit(str, NULL), sort), paste, collapse="")
})
#    user  system elapsed 
#   2.077   0.000   2.068 

identical(v1, v2)
# [1] TRUE
Rich Scriven
4

Revisiting this, my old answer wasn't so good. Here's a better version with base functions:

vapply(strsplit(x, NULL), function(x) paste(sort(x), collapse = ''), '')

Based off this test vector:

NN = 1000000L
starts = seq(1L, NN, by = 100L)
name = 
  substring(paste(sample(letters, size = NN, replace = TRUE), collapse = ""),
            starts, starts + 99L)
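As a rough sanity check rather than a benchmark, the vapply version can be run on a scaled-down copy of this test vector. The seed and the smaller `NN` are assumptions chosen only to keep the sketch quick and reproducible.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
NN <- 1000L                      # much smaller than the original 1000000L
starts <- seq(1L, NN, by = 100L)
name <- substring(paste(sample(letters, size = NN, replace = TRUE), collapse = ""),
                  starts, starts + 99L)

sorted <- vapply(strsplit(name, NULL),
                 function(x) paste(sort(x), collapse = ""), "")

n_strings <- length(sorted)                 # one result per input string
same_width <- all(nchar(sorted) == 100L)    # sorting keeps every character
# each result should have its characters in non-decreasing order
in_order <- all(vapply(strsplit(sorted, NULL),
                       function(ch) !is.unsorted(ch), logical(1)))
```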
MichaelChirico