I apologize; this is my attempt at redeeming myself after a disastrous earlier attempt. Now I have a bit more clarity, so here I go again.
My goal is to find rows that are similar, so I first need to calculate the distance between rows. A test dataset is below.
Row  Blood  x1    x2    x3    x4
1    A      0.01  0.16  0.31  0.46
2    A      0.02  0.17  0.32  0.47
3    A      0.03  0.18  0.33  0.48
4    B      0.05  0.20  0.35  0.49
5    B      0.06  0.21  0.36  0.50
6    B      0.07  0.22  0.37  0.51
7    AB     0.09  0.24  0.39  0.52
8    AB     0.10  0.25  0.40  0.53
9    AB     0.11  0.26  0.41  0.54
10   O      0.13  0.28  0.43  0.55
11   O      0.14  0.29  0.44  0.56
12   O      0.15  0.30  0.45  0.57
There are two things to consider here: 1) the distance and 2) the rows being combined.
Consider this row combination, which takes the first row of each blood group:
{ Row 1 (Blood A), Row 4 (Blood B), Row 7 (Blood AB), Row 10 (Blood O) }
For Row(1-4-7-10), the distance D is the mean of the six pairwise distances: D = (d1,4 + d1,7 + d1,10 + d4,7 + d4,10 + d7,10)/6
The pairwise distances are defined as follows:
d1,4 = Distance between Row 1 (Blood A) and Row 4 (Blood B)
d1,7 = Distance between Row 1 (Blood A) and Row 7 (Blood AB)
d1,10 = Distance between Row 1 (Blood A) and Row 10 (Blood O)
d4,7 = Distance between Row 4 (Blood B) and Row 7 (Blood AB)
d4,10 = Distance between Row 4 (Blood B) and Row 10 (Blood O)
d7,10 = Distance between Row 7 (Blood AB) and Row 10 (Blood O)
Each pairwise distance is the sum of squared differences over x1 to x4:
d1,4 = (0.01-0.05)^2 + (0.16-0.20)^2 + (0.31-0.35)^2 + (0.46-0.49)^2
d1,7 = (0.01-0.09)^2 + (0.16-0.24)^2 + (0.31-0.39)^2 + (0.46-0.52)^2
d1,10 = (0.01-0.13)^2 + (0.16-0.28)^2 + (0.31-0.43)^2 + (0.46-0.55)^2
d4,7 = (0.05-0.09)^2 + (0.20-0.24)^2 + (0.35-0.39)^2 + (0.49-0.52)^2
d4,10 = (0.05-0.13)^2 + (0.20-0.28)^2 + (0.35-0.43)^2 + (0.49-0.55)^2
d7,10 = (0.09-0.13)^2 + (0.24-0.28)^2 + (0.39-0.43)^2 + (0.52-0.55)^2
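To make the calculation for this one combination concrete, here is a minimal Python sketch. Python is my assumption, since nothing above depends on a particular language, and the names rows, sq_dist and mean_pairwise are made up for illustration. It uses the sum of squared differences with no square root, exactly as in the formulas above.

```python
from itertools import combinations

# Values copied from the test dataset: row number -> (x1, x2, x3, x4).
# Only the four rows of the 1-4-7-10 combination are needed here.
rows = {
    1: (0.01, 0.16, 0.31, 0.46),
    4: (0.05, 0.20, 0.35, 0.49),
    7: (0.09, 0.24, 0.39, 0.52),
    10: (0.13, 0.28, 0.43, 0.55),
}


def sq_dist(a, b):
    """Sum of squared differences between two rows (no square root)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_pairwise(ids):
    """Average of the pairwise distances among the given row numbers."""
    pairs = list(combinations(ids, 2))  # 6 pairs for 4 rows
    return sum(sq_dist(rows[i], rows[j]) for i, j in pairs) / len(pairs)


D = mean_pairwise((1, 4, 7, 10))
print(round(D, 4))
```

Worked by hand, the six distances are 0.0057, 0.0228, 0.0513, 0.0057, 0.0228 and 0.0057, so D comes out to about 0.019.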
Similarly, I am interested in calculating the distances for all 81 row combinations (3*3*3*3), each taking one row from every blood group.
The final expected dataset should look like the one below.
Row Distance
1-4-7-10
1-4-7-11
1-4-7-12
1-4-8-10
1-4-8-11
1-4-8-12
1-4-9-10
1-4-9-11
1-4-9-12
1-5-7-10
1-5-7-11
1-5-7-12
1-5-8-10
1-5-8-11
1-5-8-12
1-5-9-10
1-5-9-11
1-5-9-12
1-6-7-10
.
.
.
3-6-9-12
I know I can do this with 4 nested loops and lists, but I am wondering whether there is a more efficient way to accomplish this.
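For reference, the nested-loop baseline I have in mind looks roughly like the sketch below. It is written in Python as an assumption, and the names rows, groups, sq_dist and results are hypothetical. It is the approach I would like to improve on, not a proposed solution.

```python
from itertools import combinations

# Full test dataset: row number -> (x1, x2, x3, x4)
rows = {
    1: (0.01, 0.16, 0.31, 0.46),
    2: (0.02, 0.17, 0.32, 0.47),
    3: (0.03, 0.18, 0.33, 0.48),
    4: (0.05, 0.20, 0.35, 0.49),
    5: (0.06, 0.21, 0.36, 0.50),
    6: (0.07, 0.22, 0.37, 0.51),
    7: (0.09, 0.24, 0.39, 0.52),
    8: (0.10, 0.25, 0.40, 0.53),
    9: (0.11, 0.26, 0.41, 0.54),
    10: (0.13, 0.28, 0.43, 0.55),
    11: (0.14, 0.29, 0.44, 0.56),
    12: (0.15, 0.30, 0.45, 0.57),
}

# Row numbers belonging to each blood group
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "AB": [7, 8, 9], "O": [10, 11, 12]}


def sq_dist(a, b):
    """Sum of squared differences between two rows (no square root)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


results = []  # list of (label, distance), e.g. ("1-4-7-10", ...)
for a in groups["A"]:
    for b in groups["B"]:
        for ab in groups["AB"]:
            for o in groups["O"]:
                ids = (a, b, ab, o)
                pairs = list(combinations(ids, 2))  # the 6 row pairs
                d = sum(sq_dist(rows[i], rows[j]) for i, j in pairs) / len(pairs)
                results.append(("-".join(map(str, ids)), d))

# Print in the same "Row Distance" layout as the expected dataset
for label, d in results[:3]:
    print(label, round(d, 4))
```

This produces the 81 rows in the same order as the expected dataset above, starting at 1-4-7-10 and ending at 3-6-9-12.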