I have data like this:

df <- structure(list(Sex = structure(c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), 
    Age = c(19L, 16L, 16L, 13L, 16L, 30L, 16L, 30L, 16L, 30L, 
    30L, 16L, 19L, 1L, 30L), I = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 
    0, 0, 0, 1, 0, 1), E = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 
    1, 0, 1, 0), S = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 
    0, 1), N = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0), 
    F = c(1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1), T = c(0, 
    1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0), C = c(1, 1, 1, 
    0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1), D = c(0, 0, 0, 1, 0, 
    1, 0, 1, 0, 1, 1, 1, 1, 0, 0), type = c("CIFS", "CITN", "CESF", 
    "DEFS", "CIFN", "DETS", "CITS", "DEFS", "CIFN", "DEFN", "DETS", 
    "DETS", "DINF", "CENT", "CIFS"), PO = runif(15, -3, 3), AO = runif(15, -3, 3)), .Names = c("Sex", 
"Age", "I", "E", "S", "N", "F", "T", "C", "D", "type", "PO", 
"AO"), class = c("tbl_dt", "tbl", "data.table", "data.frame"), row.names = c(NA, 
-15L))

I want to sort the characters within each value of the column type (not the rows of the column) and keep the same structure afterwards. For example, CIFS should become CFIS. I tried this:

df <- within(df, {
    type <- apply(sapply(strsplit(df[, type], split=''), sort), 2, 
        function(x) paste0(x, collapse = ''))
})

Is there a simpler solution that I have missed?

TheRimalaya
  • I think you've probably got the canonical base-R method. [This question](http://stackoverflow.com/questions/13612967/how-to-reverse-a-string-in-r) gives some other alternatives. – Ben Bolker Apr 08 '16 at 12:27
  • I don't understand why you brought up data.frame either. Your question is about sorting the characters of a string, so why not simplify it? – RInatM Apr 08 '16 at 12:30

2 Answers

Since you are using data.table, I would suggest

df[, type := paste(sort(unlist(strsplit(type, ""))), collapse = ""), by = type]

as described in How to sort letters in a string?

RInatM
  • I think there is a problem: since type can be the same for different rows, this will reduce the dimension. I think we need to use `*apply`, like `df[, type := sapply(type, function(x) paste(sort(unlist(strsplit(x, ''))), collapse = ''))]` – TheRimalaya Apr 08 '16 at 13:07
  • No, data.table takes care of this. The function is called once for each unique value of type, and the result is assigned back to every row in that group. – RInatM Apr 08 '16 at 13:36
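
To make the point in the comments above concrete, here is a minimal sketch, assuming the data.table package is installed. The toy table `dt` and its `id` column are made up for illustration and are not the question's data. It checks that `:=` with `by = type` leaves the number of rows unchanged even when `type` has repeated values.

```r
library(data.table)

# Hypothetical toy table; "CIFS" appears twice on purpose
dt <- data.table(id = 1:4, type = c("CIFS", "CITN", "CIFS", "DETS"))
n_before <- nrow(dt)

# Within each group, `type` is the single group value, so the
# expression runs once per unique value and := fills the group's rows
dt[, type := paste(sort(unlist(strsplit(type, ""))), collapse = ""), by = type]

# The row count is preserved; only the values changed
stopifnot(nrow(dt) == n_before)
print(dt$type)
```

Because `:=` updates by reference instead of aggregating, the duplicated groups are not collapsed into a single row.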

This should work for both data.frame and data.table (base R only):

df$type <- vapply(strsplit(df$type, split = ''), FUN = function(x) paste(sort(x), collapse = ''), FUN.VALUE = character(1))

Result:

> df
   Sex Age I E S N F T C D type         PO         AO
1    F  19 1 0 1 0 1 0 1 0 CFIS  2.9750666  2.0308410
2    F  16 1 0 0 1 0 1 1 0 CINT  0.7902187  2.0891158
3    M  16 0 1 1 0 1 0 1 0 CEFS -1.7173785  2.4774140
4    F  13 0 1 1 0 1 0 0 1 DEFS  1.5352127 -1.9272470
5    M  16 1 0 0 1 1 0 1 0 CFIN -0.2160741  1.7359897
6    M  30 0 1 1 0 0 1 0 1 DEST  2.6314981 -0.6252466
7    F  16 1 0 1 0 0 1 1 0 CIST -1.6032894 -1.9938226
8    M  30 0 1 1 0 1 0 0 1 DEFS  0.7748583 -2.0935737
9    F  16 1 0 0 1 1 0 1 0 CFIN -2.9368356  0.3363364
10   F  30 0 1 0 1 1 0 0 1 DEFN -0.6506217  2.6681535
11   F  30 0 1 1 0 0 1 0 1 DEST -0.4432578  0.4627441
12   F  16 0 1 1 0 0 1 0 1 DEST  2.0236760  2.7684298
13   F  19 1 0 0 1 1 0 0 1 DFIN -1.1774931  2.6546726
14   F   1 0 1 0 1 0 1 1 0 CENT -2.2365388  2.7902646
15   F  30 1 0 1 0 1 0 1 0 CFIS -1.6139238 -2.4982620
digEmAll
  • Depending on the data, I guess, it might be beneficial to `sort/paste` only the `unique` values of `df$type` and fill in the result by subsetting: `lvs = unique(df$type); vapply(strsplit(lvs, "", fixed = TRUE), function(x) paste(sort(x), collapse = ""), character(1))[match(df$type, lvs)]`. (And if `df$type` is already a "factor", it could save the `match` and `unique` calls by using `as.integer(df$type)` and `levels(df$type)` respectively.) – alexis_laz Apr 08 '16 at 13:25
  • @alexis_laz: yes, you're right, but it depends how many rows we're talking about... – digEmAll Apr 08 '16 at 14:57
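
The unique-values idea from the comments can be sketched in base R as follows. The helper name `sort_chars` and the toy vector `type` are assumptions for illustration, not part of the original answers. The sketch compares the direct approach with the `unique`/`match` variant and the factor variant.

```r
# Hypothetical toy input with repeated values
type <- c("CIFS", "CITN", "CIFS", "DETS", "CITN")

# Illustrative helper: sort the characters inside each string
sort_chars <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE),
         function(ch) paste(sort(ch), collapse = ""),
         character(1))
}

# Direct: process every element
direct <- sort_chars(type)

# Process only the unique values, then expand back with match()
lvs <- unique(type)
via_unique <- sort_chars(lvs)[match(type, lvs)]

# Factor variant: levels() replaces unique(), as.integer() replaces match()
f <- factor(type)
via_factor <- sort_chars(levels(f))[as.integer(f)]

# All three approaches give the same result
stopifnot(identical(direct, via_unique), identical(direct, via_factor))
```

Whether the extra `unique`/`match` step pays off depends on how many rows there are and how many distinct values they contain, as noted in the comments.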