0

I would like your expertise on this example. I need all combinations of the elements of two vectors, dropping any pair whose two elements are the same and keeping only one copy of any pair that is duplicated.

v1 <- c("AS", "KS", "AZ", "AL", "MO")
v2 <- c("AZ", "KZ", "LM", "AZ", "ZK")

I would like to get the combinations v1/v2, and I don't want the reciprocal v2/v1, so I use

z<-outer(v1,v2, paste, sep="/") 

which gives me

      [,1]    [,2]    [,3]    [,4]    [,5]   
[1,] "AS/AZ" "AS/KZ" "AS/LM" "AS/AZ" "AS/ZK"
[2,] "KS/AZ" "KS/KZ" "KS/LM" "KS/AZ" "KS/ZK"
[3,] "AZ/AZ" "AZ/KZ" "AZ/LM" "AZ/AZ" "AZ/ZK"
[4,] "AL/AZ" "AL/KZ" "AL/LM" "AL/AZ" "AL/ZK"
[5,] "MO/AZ" "MO/KZ" "MO/LM" "MO/AZ" "MO/ZK"

But I need to modify the result to fit my analysis:

Step 1. Remove self-pairs. I don't need combinations whose two elements are the same. In the example above AZ/AZ occurs twice, and both occurrences should be removed.

Step 2. Remove duplicated combinations. In the example above AL/AZ, AS/AZ, KS/AZ and MO/AZ each appear twice. Only one copy of each should be kept.

Step 3. Remove reciprocal combinations, if any. For instance, AZ/AS is the same as AS/AZ.

Step 4. Sort everything and keep the result in a single column.

"AL/AZ"
"AL/KZ"
"AL/LM"
"AL/ZK"
"AS/AZ"
"AS/KZ"
"AS/LM"
"AS/ZK"
"AZ/KZ"
"AZ/LM"
"AZ/ZK"
"KS/AZ"
"KS/KZ"
"KS/LM"
"KS/ZK"
"MO/AZ"
"MO/KZ"
"MO/LM"
"MO/ZK"

Thanks

  • Try `matrix(sort(unique(as.vector(z))), ncol=1)`, or just `sort(unique(as.vector(z)))`, depending on what you're really needing. – Josh O'Brien Jun 12 '14 at 22:47
  • Does the order of the two values matter in the final result? – MrFlick Jun 12 '14 at 22:57
  • Don't you think it's rather unfortunate that you did not construct a test case that had any AS/AZ, AZ/AS combinations? And can you explain why "AL/AZ, AS/AZ, KS/AZ, MO/AZ are duplicated" in your example? – IRTFM Jun 13 '14 at 00:00
  • MrFlick: The sorted final result helps! – user3543621 Jun 13 '14 at 03:42
  • BondedDust: Yes, I didn't include a test case for that in the example. Sorry. AL/AZ, AS/AZ, KS/AZ, MO/AZ are duplicated because 'AZ' is present in both v1 and v2 – user3543621 Jun 13 '14 at 03:44
  • Josh O'Brien: Thanks for the simple solution. But it doesn't remove AZ/AZ – user3543621 Jun 13 '14 at 13:20
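
A minimal sketch of how the one-liner suggested in the comments might be extended so that self-pairs such as AZ/AZ are dropped as well. The names `keep` and `res` are illustrative and do not come from the thread. Reciprocal pairs (Step 3) are not handled here, which is harmless for this particular input because AZ is the only value the two vectors share.

```r
v1 <- c("AS", "KS", "AZ", "AL", "MO")
v2 <- c("AZ", "KZ", "LM", "AZ", "ZK")

# All v1/v2 combinations as a 5 x 5 character matrix
z <- outer(v1, v2, paste, sep = "/")

# Logical matrix of the same shape: TRUE where the two elements differ
keep <- outer(v1, v2, "!=")

# Drop self-pairs, then de-duplicate and sort into a single vector
res <- sort(unique(z[keep]))

# One-column layout, if that is what the analysis needs
res_col <- matrix(res, ncol = 1)
```

Indexing `z` with a logical matrix of the same dimensions returns a plain character vector, so `unique()` and `sort()` can be applied directly.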

2 Answers

3

If the order of the two values doesn't matter in the final result, then this should work

v1 <- c("AS", "KS", "AZ", "AL", "MO")
v2 <- c("AZ", "KZ", "LM", "AZ", "ZK")
vv <- sort(unique(c(v1,v2)))

f1 <- as.numeric(factor(v1, levels=vv))
f2 <- as.numeric(factor(v2, levels=vv))
ff <- expand.grid(f1, f2)
ok <- unique(t(apply(subset(ff, Var1 != Var2), 1, sort)))

comb <- paste(vv[ok[,1]], vv[ok[,2]],sep="/")

which produces

 [1] "AS/AZ" "AZ/KS" "AL/AZ" "AZ/MO" "AS/KZ" "KS/KZ" "AZ/KZ" "AL/KZ" "KZ/MO"
[10] "AS/LM" "KS/LM" "AZ/LM" "AL/LM" "LM/MO" "AS/ZK" "KS/ZK" "AZ/ZK" "AL/ZK"
[19] "MO/ZK"
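
As an alternative sketch of the same idea (not the code above), reciprocal pairs can be collapsed by putting the two elements of each pair into a canonical order with `pmin()` and `pmax()`, which also accept character vectors. The function name `unordered_pairs` and its arguments are hypothetical, chosen only for illustration. The second call uses a small input that does contain reciprocals, which the example in the question lacks.

```r
# Hypothetical helper: unique unordered pairs between a and b, self-pairs removed
unordered_pairs <- function(a, b, sep = "/") {
  g <- expand.grid(x = a, y = b, stringsAsFactors = FALSE)
  g <- g[g$x != g$y, ]                 # Step 1: drop pairs like AZ/AZ
  lo <- pmin(g$x, g$y)                 # element that sorts first
  hi <- pmax(g$x, g$y)                 # element that sorts last
  # Steps 2-4: reciprocals now share one spelling, so unique() removes them
  sort(unique(paste(lo, hi, sep = sep)))
}

v1 <- c("AS", "KS", "AZ", "AL", "MO")
v2 <- c("AZ", "KZ", "LM", "AZ", "ZK")
pairs_out <- unordered_pairs(v1, v2)   # 19 pairs, in sorted order

# Input with reciprocal pairs: AS/AZ and AZ/AS collapse to one entry
recip_out <- unordered_pairs(c("AS", "AZ"), c("AZ", "AS"))
# "AS/AZ"
```

Because each pair is written with its smaller element first, a pair such as KS/AZ appears as AZ/KS, just as in the output above.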
MrFlick
1

Here's another possible strategy using the igraph library.

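library(igraph)
v1 <- c("AS", "KS", "AZ", "AL", "MO")
v2 <- c("AZ", "KZ", "LM", "AZ", "ZK")
# Every v1 element paired with every v2 element, read as an undirected edge list
# (graph_from_data_frame/as_edgelist were graph.data.frame/get.edgelist in older igraph)
gg <- graph_from_data_frame(expand.grid(v1, v2), directed = FALSE)
# simplify() drops self-loops (AZ/AZ) and merges repeated edges
ss <- simplify(gg)
apply(as_edgelist(ss), 1, paste, collapse = "/")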

Basically we use the logic of this graph library: the values you want become nodes, and we make all the connections between the two sets. Using simplify then removes the edges that connect a node to itself (self-loops) and also collapses redundant connections between the same pair of nodes. It is perhaps a bit unorthodox to use the package like this, but as you can see it's relatively straightforward. Output:

 [1] "AS/AZ" "AS/KZ" "AS/LM" "AS/ZK" "KS/AZ" "KS/KZ" "KS/LM"
 [8] "KS/ZK" "AZ/AL" "AZ/MO" "AZ/KZ" "AZ/LM" "AZ/ZK" "AL/KZ"
[15] "AL/LM" "AL/ZK" "MO/KZ" "MO/LM" "MO/ZK"
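
The two answers print their pairs in different orders and with different element order inside some pairs (for example AZ/KS versus KS/AZ). A small base-R sketch, assuming the two printed outputs are copied in by hand, can check that they describe the same set of unordered pairs. The names `normalize`, `ans1`, `ans2` and `same` are illustrative only.

```r
# Rewrite each "a/b" string with its two elements in sorted order
normalize <- function(p) {
  vapply(strsplit(p, "/", fixed = TRUE),
         function(x) paste(sort(x), collapse = "/"),
         character(1))
}

# Output of the expand.grid answer
ans1 <- c("AS/AZ", "AZ/KS", "AL/AZ", "AZ/MO", "AS/KZ", "KS/KZ", "AZ/KZ",
          "AL/KZ", "KZ/MO", "AS/LM", "KS/LM", "AZ/LM", "AL/LM", "LM/MO",
          "AS/ZK", "KS/ZK", "AZ/ZK", "AL/ZK", "MO/ZK")

# Output of the igraph answer
ans2 <- c("AS/AZ", "AS/KZ", "AS/LM", "AS/ZK", "KS/AZ", "KS/KZ", "KS/LM",
          "KS/ZK", "AZ/AL", "AZ/MO", "AZ/KZ", "AZ/LM", "AZ/ZK", "AL/KZ",
          "AL/LM", "AL/ZK", "MO/KZ", "MO/LM", "MO/ZK")

# TRUE when both outputs contain the same unordered pairs
same <- setequal(normalize(ans1), normalize(ans2))

# A sorted single-column version of the igraph output (Step 4)
sorted_col <- matrix(sort(normalize(ans2)), ncol = 1)
```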
MrFlick