
I have 2 columns of data (x1, x2) that share some values and differ in others. I want to paste the 2 columns together, separated by a + sign, into a new variable (x3) so that the data are in order, i.e. the lowest numbered/character element always appears before the separator (+). For example, this is what I want:

ID    x1          x2         x3
01    x*01:02     x*01:03    x*01:02+x*01:03
02    x*01:03     x*01:02    x*01:02+x*01:03
03    x*02:01     x*08:01    x*02:01+x*08:01
04    x*08:01     x*02:01    x*02:01+x*08:01

When I run

df$x3 = paste(df$x1, df$x2, sep="+") 

x3 for IDs 01 to 04 appears as

x3
x*01:02+x*01:03
x*01:03+x*01:02
x*02:01+x*08:01
x*08:01+x*02:01
Mona
  • Is this a time column? What criterion decides which one is lower? – akrun Nov 27 '19 at 19:52
  • No, it's not a time column. The criterion is the lowest digit element, e.g. x*01:01 is less than x*01:03 – Mona Nov 27 '19 at 19:54
  • `01:01` is not numeric because of the `:`. Are you using the `01` after the `:` or before the `:`? – akrun Nov 27 '19 at 19:59
  • I'm using 01 before the colon – Mona Nov 27 '19 at 20:11
  • If you are using 01 before the colon, then `01:02` and `01:03` cannot be differentiated – akrun Nov 27 '19 at 20:15
  • Both before and after the colon matter, i.e. 01:01 is less than 01:02, but 02:01 is greater than both 01:01 and 01:02. Also, 02:10 is greater than 02:01, and so on. – Mona Nov 27 '19 at 20:20
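Because every field in the examples is zero-padded to two digits, an ordinary string sort already follows the rule described in these comments (compare the part before the colon first, then the part after it). A quick check, assuming the values always have the fixed-width `x*NN:NN` shape shown in the question:

```r
# Assumes every value has the fixed-width "x*NN:NN" shape from the question,
# so string order matches "major field first, then minor field"
alleles <- c("x*02:10", "x*01:02", "x*02:01", "x*01:01")
sort(alleles)
#[1] "x*01:01" "x*01:02" "x*02:01" "x*02:10"
```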

3 Answers


Based on what you describe, an alphabetical sort should work, since both parts of each value are zero-padded to two digits. You can do this across each row like so:

df$x3 <- apply(df[,2:3], 1, function(x) paste(sort(x), collapse = "+"))
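The `apply()` call above sorts row by row. For just two columns, a vectorised alternative (a sketch, not part of the original answer) compares `x1` and `x2` directly with `<=` and pastes them in whichever order puts the smaller one first. Like `sort()`, it relies on the zero-padded string ordering, and `stringsAsFactors = FALSE` is set so the comparison works on character vectors in older versions of R:

```r
df <- data.frame(ID = 1:4,
                 x1 = c("x*01:02", "x*01:03", "x*02:01", "x*08:01"),
                 x2 = c("x*01:03", "x*01:02", "x*08:01", "x*02:01"),
                 stringsAsFactors = FALSE)

# Put whichever value compares smaller (as a string) before the "+"
df$x3 <- ifelse(df$x1 <= df$x2,
                paste(df$x1, df$x2, sep = "+"),
                paste(df$x2, df$x1, sep = "+"))
df$x3
#[1] "x*01:02+x*01:03" "x*01:02+x*01:03" "x*02:01+x*08:01" "x*02:01+x*08:01"
```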

We can use `mixedsort` from the gtools package:

library(gtools)
apply(df[-1], 1, function(x) paste(mixedsort(x), collapse = "+"))
#[1] "x*01:02+x*01:03" "x*01:02+x*01:03" "x*02:01+x*08:01" "x*02:01+x*08:01"

data

df <- structure(list(ID = 1:4, x1 = c("x*01:02", "x*01:03", "x*02:01", 
"x*08:01"), x2 = c("x*01:03", "x*01:02", "x*08:01", "x*02:01"
)), row.names = c(NA, -4L), class = "data.frame")
akrun

If x1 and x2 are structured as you described, i.e. two sets of two digits separated by `:`, you could split each of them on `:` into two numeric columns, use standard numerical comparisons to put the values in the order you want, and finally turn the output back into strings if you need to. Would that work?
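A minimal base-R sketch of this idea, assuming every value looks like `prefix*NN:NN` (the helper name `parse_fields` is hypothetical). Unlike a plain string sort, it still orders correctly when the fields are not zero-padded, e.g. `x*2:10` versus `x*10:01`:

```r
# Hypothetical helper: split values like "x*01:02" into integer fields
# (assumes exactly one ":" after the last "*")
parse_fields <- function(s) {
  fields <- sub("^.*\\*", "", s)  # keep only the part after the last "*"
  m <- do.call(rbind, strsplit(fields, ":", fixed = TRUE))
  list(major = as.integer(m[, 1]), minor = as.integer(m[, 2]))
}

df <- data.frame(x1 = c("x*01:03", "x*10:01"),
                 x2 = c("x*01:02", "x*2:10"),
                 stringsAsFactors = FALSE)

a <- parse_fields(df$x1)
b <- parse_fields(df$x2)
# x1 comes first if its major field is smaller, or the majors tie and
# its minor field is smaller or equal
x1_first <- a$major < b$major | (a$major == b$major & a$minor <= b$minor)
df$x3 <- ifelse(x1_first,
                paste(df$x1, df$x2, sep = "+"),
                paste(df$x2, df$x1, sep = "+"))
df$x3
#[1] "x*01:02+x*01:03" "x*2:10+x*10:01"
```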

kgolyaev
  • I'm not sure what you mean. Basically after the paste function I need another function to order the x3 data – Mona Nov 27 '19 at 20:14
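If `x3` has already been built with `paste()`, as in the comment above, one way to reorder it afterwards (a sketch, assuming `+` never appears inside `x1` or `x2` and the values keep their zero-padded form) is to split each value on a literal `+`, sort the two pieces, and paste them back together:

```r
df <- data.frame(x1 = c("x*01:02", "x*01:03", "x*02:01", "x*08:01"),
                 x2 = c("x*01:03", "x*01:02", "x*08:01", "x*02:01"),
                 stringsAsFactors = FALSE)
df$x3 <- paste(df$x1, df$x2, sep = "+")

# Split each x3 value on the literal "+", sort the halves, and rejoin them
df$x3 <- vapply(strsplit(df$x3, "+", fixed = TRUE),
                function(p) paste(sort(p), collapse = "+"),
                character(1))
df$x3
#[1] "x*01:02+x*01:03" "x*01:02+x*01:03" "x*02:01+x*08:01" "x*02:01+x*08:01"
```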