I have a pandas dataframe similar to the one below:
Output var1 var2 var3
1 0.487981 0.297929 0.214090
1 0.945660 0.031666 0.022674
2 0.119845 0.828661 0.051495
2 0.095186 0.852232 0.052582
3 0.059520 0.053307 0.887173
3 0.091049 0.342226 0.566725
3 0.119295 0.414376 0.466329
... ... ... ... ...
Basically, I have 3 columns (propensity score values) and one output (treatment). I want to calculate the within-trio distance to find trios of outputs with the smallest within-trio distance. The experiment is taken from the paper: "Matching by Propensity Score in Cohort Studies with Three Treatment Groups", Rassen et al. Looking at their explanation is like calculating the perimeter of a triangle, but I am not sure. I think that at this GitHub link: https://github.com/bwh-dope/pharmacoepi_toolbox/blob/master/src/org/drugepi/match/MatchDistanceCalculator.java there is Java code doing this stuff more or less, but I am not sure on how to use it. I use Python, so I have two options: try to adapt this previous code or write something else. My idea is that var1, var2 and var3 can be considered like spatial x,y,z coordinates, and the output is like a point in the space. I found a function that calculates the distance between 2 points:
#found here https://stackoverflow.com/questions/68938033/min-distance-between-point-cloud-xyz-points-in-python
import numpy as np
distance = lambda p1, p2: np.sqrt(np.sum((p1 - p2) ** 2, axis=0))
import itertools
def min_distance(cloud):
pairs = itertools.combinations(cloud, 2)
return np.min(map(lambda pair: distance(*pair), pairs))
def get_points(filename):
with open(filename, 'r') as file:
rows = np.genfromtxt(file, delimiter=',', skip_header=True)
return rows
filename = 'cloud.csv'
cloud = get_points(filename)
min_dist = min_distance(cloud)
However, I want to calculate the distance between 3 points, so I think that I need to iterate all the possible combinations of 3 points like XY, XZ and YZ, but I am not sure of this procedure.