
I need to compute the variance of a population (array) of permutations.

Let's say that I have this array of permutations:

import numpy as np
import scipy.stats as stats


a = np.matrix([[1,2,3,4,5,6], [2,3,4,6,1,5], [6,3,1,2,5,4]])

# distance between a[0] and a[1]
distance = stats.kendalltau(a[0], a[1])[0]

So, how can I compute (in Python) the variance of this array, i.e., how can I measure how far these permutations are from each other?

Regards

Aymeric

P.S.: I define the distance between two permutations by the Kendall tau metric.

ailauli69
  • Can you give some example of what you mean by "how far theses permutations are from each other"? – IoaTzimas Dec 22 '20 at 12:23
  • How would you define mathematically the variance between `[1,3,4]` and `[2,5,6]` for example? – IoaTzimas Dec 22 '20 at 12:31
  • @loaTzimas Hello, I just updated my code – ailauli69 Dec 22 '20 at 12:34
  • Thanks for the update. So you want to calculate the distance for all possible pairs inside the list? – IoaTzimas Dec 22 '20 at 12:41
  • Hi @ailauli69, could you clarify what you want to compute the distance between? It's not really clear. Could you give an example? – Akshay Sehgal Dec 22 '20 at 12:42
  • Hello, I mentioned how I compute the distance between 2 elements, because I thought it is useful for computing the variance. I need to compute the variance of the general population – ailauli69 Dec 22 '20 at 12:45

2 Answers


I'm not sure if that's the mathematical result you are looking for. You could use stats.kendalltau to compute the distance for all possible pairs, then take the variance from that resulting vector.

To get the vector of distances, I loop through the zipped list (a, a-shifted) using np.roll:

dist = []
# pair each row with its cyclic neighbour (row i with row i-1)
for x1, x2 in zip(a, np.roll(a, shift=1, axis=0)):
    dist.append(stats.kendalltau(x1, x2)[0])

To take the standard deviation of all distances (use np.var instead if you want the variance itself):

np.std(dist)

Or, if you are looking for the variance as defined by the formula discussed in the linked post (the formula image was not recovered), take the norm of the distance vector:

np.linalg.norm(dist)

Note I'm using a as defined with np.array, not np.matrix:

a = np.array([[1,2,3,4,5,6], [2,3,4,6,1,5], [6,3,1,2,5,4]])
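
One caveat with the np.roll loop: it only pairs each row with its cyclic neighbour. With three rows that happens to cover every pair, but with more rows it would not. Below is a minimal sketch that enumerates all unordered pairs, assuming itertools.combinations is acceptable here. The helper name pairwise_tau is hypothetical and not part of SciPy.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_tau(perms):
    """Kendall tau for every unordered pair of rows (hypothetical helper)."""
    perms = np.asarray(perms)
    return np.array([
        stats.kendalltau(x1, x2)[0]          # [0] is the tau statistic
        for x1, x2 in combinations(perms, 2)  # iterates over pairs of rows
    ])


a = np.array([[1, 2, 3, 4, 5, 6], [2, 3, 4, 6, 1, 5], [6, 3, 1, 2, 5, 4]])
taus = pairwise_tau(a)

# np.var gives the variance, np.std its square root
print(np.var(taus), np.std(taus))
```

For n rows this produces n * (n - 1) / 2 values, so it grows quadratically with the population size.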
Ivan

I am assuming you are looking for something that broadcasts the kendalltau function over each of the 3 arrays and permutes over them. The output in that case will be a 3x3 matrix. I am not sure of what you are looking for when you say you want the variance, however. Do clarify in the comments and I'll update my answer accordingly. Hope this helps -

a = np.array([[1,2,3,4,5,6], [2,3,4,6,1,5], [6,3,1,2,5,4]])

def f(a,b):
    return np.array(stats.kendalltau(a,b)[0])

vf = np.vectorize(f, signature='(m),(m)->()')

out = vf(a[:,None,:],a[None,:,:])
print(out)
array([[ 1.        ,  0.33333333, -0.06666667],
       [ 0.33333333,  1.        , -0.46666667],
       [-0.06666667, -0.46666667,  1.        ]])

So, how can I compute (in Python) the variance of this array, i.e., how can I measure how far these permutations are from each other?

IIUC, if you are trying to calculate the Kendall tau value for each combination of rows and then check the standard deviation of those values, you can select the lower triangle of the matrix (without the diagonal) using np.tril_indices with k=-1, which yields the 3 distinct pairwise values, and pass them to np.std:

np.std(out[np.tril_indices(out.shape[0], k=-1)])
0.3265986323710904
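
Note that stats.kendalltau returns a correlation in [-1, 1], not a distance. For permutations without ties, (1 - tau) / 2 is the normalized Kendall tau distance, i.e. the fraction of discordant pairs. The sketch below converts the matrix to distances and computes one possible dispersion measure: half the mean squared pairwise distance, which coincides with the ordinary population variance when the distance is Euclidean. Using it for permutations is an assumption on my part, not necessarily the definition the asker has in mind, and the function name dispersion is hypothetical.

```python
import numpy as np
from scipy import stats

a = np.array([[1, 2, 3, 4, 5, 6], [2, 3, 4, 6, 1, 5], [6, 3, 1, 2, 5, 4]])

# Full matrix of Kendall tau correlations between rows
tau = np.array([[stats.kendalltau(x, y)[0] for y in a] for x in a])

# Normalized Kendall tau distance in [0, 1] (valid when there are no ties)
dist = (1.0 - tau) / 2.0


def dispersion(dist):
    """Half the mean squared pairwise distance (hypothetical helper).

    Assumption: this is used as a variance analogue; for Euclidean
    distances it equals the population variance.
    """
    n = dist.shape[0]
    return (dist ** 2).sum() / (2 * n ** 2)


print(dispersion(dist))
```

A value of 0 means all permutations are identical; larger values mean the permutations disagree on more pairs.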
Akshay Sehgal