The goal is to compare each ID with every other ID by computing distances between them.

Consider the following data frame, Df:

ID AN     AW
a  white  green
b  black  yellow
c  purple gray
d  white  gray

To compare them, I need every pairwise combination, laid out like this:

ID AN     AW     ID2 AN2    AW2
a  white  green  b   black  yellow
a  white  green  c   purple gray
a  white  green  d   white  gray
b  black  yellow c   purple gray
b  black  yellow d   white  gray
c  purple gray   d   white  gray

Basically, I am trying to generate all pairwise combinations so that I can compute distances between the features belonging to each ID.

I really do not know where to begin. Any insight? Which R tools could I use?

David Arenburg
Saul Garcia

1 Answer

One possible solution uses `combn` and `match`:

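# All unordered pairs of IDs; each column of ids is one pair
ids <- combn(unique(Df$ID), 2)
# Look up the row for each side of the pair and bind them side by side
data.frame(Df[match(ids[1,], Df$ID), ], Df[match(ids[2,], Df$ID), ])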

#     ID     AN     AW ID.1   AN.1   AW.1
# 1    a  white  green    b  black yellow
# 1.1  a  white  green    c purple   gray
# 1.2  a  white  green    d  white   gray
# 2    b  black yellow    c purple   gray
# 2.1  b  black yellow    d  white   gray
# 3    c purple   gray    d  white   gray
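
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example data from the question, pairs rows by position rather than by ID value (which sidesteps any dependence on factor levels), and renames the second set of columns to match the layout asked for. The `n_diff` column is purely an illustrative assumption: it counts how many of the two attributes differ, since the question does not say which distance is wanted.

```r
# Example data from the question; character columns, not factors
Df <- data.frame(
  ID = c("a", "b", "c", "d"),
  AN = c("white", "black", "purple", "white"),
  AW = c("green", "yellow", "gray", "gray"),
  stringsAsFactors = FALSE
)

# Each column of idx holds the row positions of one unordered pair
idx <- combn(seq_len(nrow(Df)), 2)

left  <- Df[idx[1, ], ]
right <- Df[idx[2, ], ]
names(right) <- paste0(names(right), "2")  # ID2, AN2, AW2

pair_df <- cbind(left, right)
rownames(pair_df) <- NULL

# Hypothetical distance (an assumption, not from the question):
# number of attributes on which the two IDs differ
pair_df$n_diff <- (pair_df$AN != pair_df$AN2) + (pair_df$AW != pair_df$AW2)
pair_df
```

Any other per-pair distance could be substituted for `n_diff` by operating on the paired columns in the same way.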
Raad
  • Simpler: `data.frame(Df[combn(Df$ID, 2)[1,],], Df[combn(Df$ID, 2)[2,],])` – alistaire Mar 16 '16 at 23:51
  • Only works if the levels of the factor are in the correct order corresponding to the right row. E.g. if we add `droplevels(ids[,2])` you do not get the results you want – Raad Mar 16 '16 at 23:56
  • Thank you guys, now I've got some new tools in my pocket! – Saul Garcia Mar 17 '16 at 06:53
  • @NBATrends this works perfectly in this situation, but as I tried to implement it on a big dataframe `2328439 signatures of 11 variables`, then I get this error. `Error in combn(unique(signatures$uniqueid), 2) : n < m`. Any ideas? – Saul Garcia Mar 18 '16 at 10:55
  • Not sure, it might be worth opening a new question – Raad Mar 18 '16 at 11:26
  • It would help to see a sample of your data as well. – Raad Mar 18 '16 at 11:30
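
On the `n < m` error mentioned in the comments: `combn(x, m)` raises it when `x` has fewer than `m` elements. Without seeing the data this is only a guess, but one way that can happen is a column name that does not exist in the data frame, because `$` then returns `NULL`. The sketch below reproduces the message on a toy data frame (the `uniqueid` column is deliberately missing) and also shows how quickly the number of pairs grows.

```r
Df <- data.frame(ID = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# `$` returns NULL for a column that does not exist,
# so combn() receives zero items and cannot choose 2 of them
msg <- tryCatch(
  combn(unique(Df$uniqueid), 2),
  error = function(e) conditionMessage(e)
)
msg

# Number of pairs (rows in the result) for n distinct IDs
choose(4, 2)        # 6
choose(2328439, 2)  # far too many pairs to hold in a data frame
```

Even if the column name is correct, 2,328,439 distinct IDs give on the order of 2.7e12 pairs, so materialising every combination at that scale is unlikely to be practical.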