Unable to calculate Mahalanobis distance

Question

import numpy as np
from scipy.spatial import distance

d1 = np.random.randint(0, 255, size=(50))*0.9
d2 = np.random.randint(0, 255, size=(50))*0.7

vi = np.linalg.inv(np.cov(d1,d2, rowvar=0))   
res = distance.mahalanobis(d1,d2,vi)

print res

ValueError: shapes (50,) and (2,2) not aligned: 50 (dim 0) != 2 (dim 0)

What would be the output array shape, i.e. the shape of `res`? Also, can you hand-calculate the expected output for a very small case, say `d1` and `d2` with `3` elements each? — Divakar, Oct 25 '15 at 19:16
If I am not mistaken, `vi` should be an estimate of the precision matrix of all your observations. `np.cov(d1, d2)` is probably not what you want. — cel, Oct 25 '15 at 19:30
@cel The docs say it's the inverse of the covariance matrix: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html#scipy.spatial.distance.mahalanobis — Roman, Oct 25 '15 at 19:39
You may want to check Wikipedia to understand what exactly is measured by this distance. — cel, Oct 25 '15 at 19:40
It's the wrong matrix; check out this answer: http://stackoverflow.com/a/15068615/4016674. You need something like `np.linalg.inv(np.cov(np.vstack((d1, d2)).T))` — hellpanderr, Oct 25 '15 at 19:45
@hellpanderrr It worked, but in other cases inverting the matrix raised `LinAlgError: Singular matrix`. What should I do when the matrix is not invertible? — Roman, Oct 25 '15 at 19:53
I would argue that if you cannot invert the matrix, something else is going wrong. If you insist on inverting it, you can use the pseudo-inverse (i.e. `np.linalg.pinv`). — Julien, Oct 25 '15 at 19:55
@Julien Overall the program did not fail, but the result was not correct. I did not know how to use `vi` correctly. — Roman, Oct 25 '15 at 20:05
Just passing by: you should upgrade to Python 3, although it won't solve your problem. — Julien Palard, Oct 25 '15 at 22:05
@JulienPalard Thanks, I shall try. But why is it different from Python 2.7? — Roman, Oct 26 '15 at 00:07
@jean "Short version: Python 2.x is legacy, Python 3.x is the present and future of the language" https://wiki.python.org/moin/Python2orPython3 — Julien Palard, Oct 26 '15 at 06:25

score 1 · Answer 1 · answered Oct 27 '15 at 13:47

The Mahalanobis distance computes the distance between two D-dimensional vectors in reference to a D x D covariance matrix, which in some senses "defines the space" in which the distance is calculated. The matrix encodes how various combinations of coordinates should be weighted in computing the distance.
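To make the definition concrete, here is a minimal sketch that evaluates sqrt((u - v)^T VI (u - v)) by hand and compares it with `scipy.spatial.distance.mahalanobis`. The vectors `u`, `v` and the matrix `cov` are made-up values chosen only for illustration, not taken from the question.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical 3-dimensional vectors (illustrative values only).
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 4.0])

# Hypothetical D x D covariance matrix with D = 3.
cov = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 4.0]])
vi = np.linalg.inv(cov)  # inverse covariance, also 3 x 3

# Mahalanobis distance written out from its definition.
delta = u - v
by_hand = np.sqrt(delta @ vi @ delta)

# The same quantity computed by SciPy.
from_scipy = distance.mahalanobis(u, v, vi)

print(by_hand, from_scipy)
```

Both values should agree, and `vi` has one row and one column per vector element.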

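It seems that you've computed the 2x2 sample covariance for your points, which is not the right type of covariance matrix to use in a Mahalanobis distance. For 50-element vectors the function expects a 50x50 inverse covariance matrix, which is why the shapes `(50,)` and `(2,2)` fail to align.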
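A quick shape check illustrates the mismatch. This is a minimal sketch that reuses the question's `d1` and `d2`; the `raised` flag is a hypothetical name added only to record whether the call fails.

```python
import numpy as np
from scipy.spatial import distance

np.random.seed(0)  # fixed seed, added here only for repeatability
d1 = np.random.randint(0, 255, size=50) * 0.9
d2 = np.random.randint(0, 255, size=50) * 0.7

# Two 1-D inputs are treated as two variables, so the result is 2 x 2.
cov = np.cov(d1, d2, rowvar=False)
print(cov.shape)

# mahalanobis needs a 50 x 50 matrix for 50-element vectors,
# so passing the inverse of the 2 x 2 matrix fails with a shape error.
raised = False
try:
    distance.mahalanobis(d1, d2, np.linalg.inv(cov))
except ValueError:
    raised = True
print(raised)
```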

If you don't already have a well-justified 50x50 covariance matrix that defines your Mahalanobis metric, the Mahalanobis distance is probably not the right choice for your application. Without more detail it's hard to give a better recommendation.
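For contrast, one common way to obtain such a matrix is to estimate it from many observations of the same variables. The sketch below assumes a hypothetical data set `X` with 500 observations of 3 features; the sizes and scales are invented for illustration and are not part of the question.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)

# Hypothetical data set: 500 observations (rows) of 3 features (columns).
X = rng.normal(size=(500, 3)) * np.array([1.0, 5.0, 20.0])

# Columns are variables, so the covariance is 3 x 3.
cov = np.cov(X, rowvar=False)
vi = np.linalg.inv(cov)

# Distance between two individual 3-dimensional observations.
res = distance.mahalanobis(X[0], X[1], vi)
print(vi.shape, res)
```

Here the number of observations is much larger than the number of features, so the estimated covariance is generally invertible.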

score 0 · Answer 2 · answered May 23 '19 at 12:51

As mentioned in jakevdp's answer, your inverse covariance matrix must be D x D, where D is the number of elements in each vector. So your code should be:

import numpy as np
from scipy.spatial import distance

d1 = np.random.randint(0, 255, size=50) * 0.9
d2 = np.random.randint(0, 255, size=50) * 0.7

# Pair the elements up: shape (50, 2), one row per vector element
m = np.column_stack((d1, d2))
v = np.cov(m)  # rows are treated as variables, so v is 50 x 50
try:
    vi = np.linalg.inv(v)
except np.linalg.LinAlgError:
    # Fall back to the pseudo-inverse when v cannot be inverted
    vi = np.linalg.pinv(v)

res = distance.mahalanobis(d1, d2, vi)

print(res)
