Some of my factor levels are coded as a-b and some as b-a. Both mean the same thing for my use case. How do I change such factors to a-b so that they are consistent?

I could do this with an if-else statement, but I am wondering if there is a more efficient way.

From

 Id        Col1
 101       a-b-c-d
 102       a-c-d
 103       a-b
 104       a-b
 105       b-a
 106       b-a
 107       a-c-b

Expected Results

 Id        Col1
 101       a-b-c-d
 102       a-c-d
 103       a-b
 104       a-b
 105       a-b
 106       a-b
 107       a-c-b
bison2178

3 Answers

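We can use separate_rows to split 'Col1' into one row per element, then sort the elements within each 'Id' and paste them back together. Note that the resulting 'Col1' is a character column, not a factor.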

library(dplyr)
library(tidyr)
df1 %>% 
   separate_rows(Col1) %>% 
   group_by(Id) %>% 
   summarise(Col1 = paste(sort(Col1), collapse='-'))
akrun

This code may be a little less readable, but it uses only base functions:

df1$Col2 <- factor(sapply(as.character(df1$Col1), function(x) paste(sort(unlist(strsplit(x, "-"))), collapse = "-")))

> df1
   Id    Col1    Col2
1 101 a-b-c-d a-b-c-d
2 102   a-c-d   a-c-d
3 103     a-b     a-b
4 104     a-b     a-b
5 105     b-a     a-b
6 106     b-a     a-b
7 107   a-c-b   a-b-c
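
The one-liner above can be unpacked into a small helper, which may be easier to read. This is only a sketch: `sort_tokens` is a hypothetical name, and `df1` is rebuilt here from the question's data on the assumption that `Col1` is a factor.

```r
# Hypothetical helper: sort the "-"-separated tokens inside a single string.
sort_tokens <- function(x, sep = "-") {
  paste(sort(strsplit(x, sep, fixed = TRUE)[[1]]), collapse = sep)
}

# Assumed input, rebuilt from the question; Col1 is stored as a factor.
df1 <- data.frame(
  Id = 101:107,
  Col1 = c("a-b-c-d", "a-c-d", "a-b", "a-b", "b-a", "b-a", "a-c-b"),
  stringsAsFactors = TRUE
)

# vapply checks that each call returns exactly one character string.
df1$Col2 <- factor(vapply(as.character(df1$Col1), sort_tokens,
                          character(1), USE.NAMES = FALSE))
```

`vapply` is used instead of `sapply` so that an unexpected return type fails loudly rather than silently changing the shape of the result.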
manotheshark

You can rename the levels of Col1 in place with levels<-:

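# stringsAsFactors = TRUE is required on R >= 4.0, where strings are no longer converted to factors by default
df <- data.frame(Id = 101:107, 
                 Col1 = c("a-b-c-d", "a-c-d", "a-b", "a-b", "b-a", "b-a", "a-c-b"),
                 stringsAsFactors = TRUE)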

levels(df$Col1) <- sapply(strsplit(levels(df$Col1), '-'), 
                          function(x) paste(sort(x), collapse = '-'))

df
#>    Id    Col1
#> 1 101 a-b-c-d
#> 2 102   a-c-d
#> 3 103     a-b
#> 4 104     a-b
#> 5 105     a-b
#> 6 106     a-b
#> 7 107   a-b-c
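
Working on the levels rather than on the rows means each distinct value is transformed only once, and assigning the same name to two levels merges them. A minimal sketch on a small stand-alone factor (`f` and `new_levels` are illustrative names, not part of the original data):

```r
f <- factor(c("a-b", "b-a", "b-a", "a-c-b"))
n_before <- nlevels(f)  # three distinct levels: a-b, a-c-b, b-a

# Sort the tokens of each level; this runs once per level, not once per row.
new_levels <- vapply(strsplit(levels(f), "-", fixed = TRUE),
                     function(x) paste(sort(x), collapse = "-"),
                     character(1))

# Duplicate names on the right-hand side collapse into a single level.
levels(f) <- new_levels
n_after <- nlevels(f)
```

After the assignment, the former "b-a" entries share the "a-b" level, so the factor has one level fewer than before.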

Or use forcats::fct_relabel or forcats::lvls_revalue, which return a new factor rather than modifying df:

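# stringsAsFactors = TRUE is required on R >= 4.0 so that levels(df$Col1) is not NULL
df <- data.frame(Id = 101:107, 
                 Col1 = c("a-b-c-d", "a-c-d", "a-b", "a-b", "b-a", "b-a", "a-c-b"),
                 stringsAsFactors = TRUE)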

forcats::fct_relabel(df$Col1, 
                     function(levs){
                         sapply(strsplit(levs, '-'), 
                                function(lev) paste(sort(lev), collapse = '-'))
                     })
#> [1] a-b-c-d a-c-d   a-b     a-b     a-b     a-b     a-b-c  
#> Levels: a-b a-b-c-d a-b-c a-c-d

forcats::lvls_revalue(df$Col1, 
                      sapply(strsplit(levels(df$Col1), '-'), 
                             function(x){paste(sort(x), collapse = '-')}))
#> [1] a-b-c-d a-c-d   a-b     a-b     a-b     a-b     a-b-c  
#> Levels: a-b a-b-c-d a-b-c a-c-d
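
Sorting every level also turns a-c-b into a-b-c, whereas the expected output in the question keeps row 107 as a-c-b. If only the b-a level should change, one option is to rename that single level by hand. This is a sketch that assumes "b-a" is the only level needing a fix:

```r
df <- data.frame(
  Id = 101:107,
  Col1 = c("a-b-c-d", "a-c-d", "a-b", "a-b", "b-a", "b-a", "a-c-b"),
  stringsAsFactors = TRUE
)

# Rename only the "b-a" level; because "a-b" already exists, the two are merged.
levels(df$Col1)[levels(df$Col1) == "b-a"] <- "a-b"
```

All other levels, including a-c-b, are left untouched.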
alistaire