The goal is to compare each ID with every other ID by computing a distance between them. Some IDs are related because they belong to the same group, and related IDs do not need to be compared with each other.
Consider the following data frame df:
ID AN AW Group
a white green 1
b black yellow 1
c purple gray 2
d white gray 2
The following code generates every pair of IDs (from the question "R Generate non repeating pairs in dataframe"):
ids <- combn(unique(df$ID), 2)
data.frame(df[match(ids[1,], df$ID), ], df[match(ids[2,], df$ID), ])
#ID AN AW ID2 AN2 AW2
a white green b black yellow
a white green c purple gray
a white green d white gray
b black yellow c purple gray
b black yellow d white gray
c purple gray d white gray
I want to know whether it is possible to skip the pairs within the same group, so that only this result is produced:
#ID AN AW Group ID2 AN2 AW2 Group2
a white green 1 c purple gray 2
a white green 1 d white gray 2
b black yellow 1 c purple gray 2
b black yellow 1 d white gray 2
Meaning I can avoid these computations:
#ID AN AW Group ID2 AN2 AW2 Group2
a white green 1 b black yellow 1
c purple gray 2 d white gray 2
I can build all the pairs and then drop the ones where both groups match, but that costs extra computing time: the data frame is big, and the number of combinations grows as n*(n-1)/2.
Is this possible? Or do I have to generate all combinations and then filter out the comparisons within the same group?
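For reference, here is a rough sketch of the kind of thing I have in mind: pair up the groups first, then cross the rows of each pair of distinct groups, so that same-group pairs are never generated. It assumes the example df above with at least two distinct groups (combn fails otherwise); the names rows_by_group, group_pairs, idx and pairs are made up for illustration, and I have not checked how it scales on a large data frame.

```r
# Example data from the question
df <- data.frame(
  ID    = c("a", "b", "c", "d"),
  AN    = c("white", "black", "purple", "white"),
  AW    = c("green", "yellow", "gray", "gray"),
  Group = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Row indices of df, split by group
rows_by_group <- split(seq_len(nrow(df)), df$Group)

# Combine groups (not IDs): one entry per pair of distinct groups
group_pairs <- combn(names(rows_by_group), 2, simplify = FALSE)

# For each pair of groups, cross every row of one group with every row of the other
idx <- do.call(rbind, lapply(group_pairs, function(g) {
  expand.grid(i = rows_by_group[[g[1]]], j = rows_by_group[[g[2]]])
}))
idx <- idx[order(idx$i, idx$j), ]

# Assemble the paired data frame; the second half gets a "2" suffix
left  <- df[idx$i, ]
right <- df[idx$j, ]
names(right) <- paste0(names(right), "2")
pairs <- cbind(left, right)
rownames(pairs) <- NULL
pairs
```

On the example data this gives only the four cross-group rows (a-c, a-d, b-c, b-d), with columns ID, AN, AW, Group, ID2, AN2, AW2, Group2.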