Cosine distance more than 1

Question

I'm using the distance.cosine function from the scipy.spatial Python package. The problem is that my code returns some values greater than 1. How is that possible?

My code is very simple; this is all of it:

import numpy as np
from scipy.spatial import distance

# vec, embedding and matrix are defined elsewhere in my program
words = vec.split(",")
for i, w1 in enumerate(words):
    vec_1 = embedding.get_phrase_vector(w1)
    vec_1 = vec_1 / np.linalg.norm(vec_1)  # normalize to unit length
    for j, w2 in enumerate(words):
        vec_2 = embedding.get_phrase_vector(w2)
        vec_2 = vec_2 / np.linalg.norm(vec_2)  # normalize to unit length
        matrix[i][j] = distance.cosine(vec_1, vec_2)

The two vectors giving me problems are:

w1=[-0.29137    1.0635    -0.41772    0.10439    0.46724    0.28249
 -0.04234   -0.07716    0.31482   -0.31903   -0.15905    0.98593
  0.40408   -0.33376    0.11372    0.3485     0.28884    0.082693
  0.86843   -0.40946   -0.64101   -0.55062    0.15105   -0.16613
  0.88421    0.31586    0.0017234 -0.46789   -0.48933   -0.38975
 -0.48061   -0.086691   0.96367    0.13027    0.10883    0.13111
 -0.28605    0.32731    0.10249   -0.50631   -0.27578    0.053391
  0.45665   -0.11782    0.039271   0.27073    0.46305    0.66542
 -0.41682   -0.14791   -0.9136    -0.71694   -0.11963    0.095209
  0.21016    0.67604   -0.23403   -0.39308    0.34853   -0.91753
  0.73017    0.79334   -0.25474    0.51577   -1.0458    -0.59653
 -0.54101   -0.056912   0.01262    0.046881   0.0708     0.20313
 -0.34206   -0.62316   -0.48464    0.013741   0.057855  -0.29289
 -0.1755     0.059357  -0.01446    0.17238    0.065214   0.4437
  0.38186   -0.21588    0.55824    0.099175  -0.0094545  0.82726
 -0.4048    -0.47035   -0.16345    0.080469  -0.048781   0.091551
  0.67828   -0.56955   -0.024643  -0.51526  ]
w2=[-1.6486e-01  9.1997e-01  2.2737e-01 -4.9031e-01 -1.8082e-03 -3.3803e-01
  5.7221e-02  1.4601e-01  4.0202e-01 -2.8858e-01 -4.7495e-01 -5.6369e-01
  2.7037e-01  5.1702e-01 -1.1241e-01  1.8314e-01  2.2066e-01 -4.8606e-01
 -8.7284e-01 -6.2587e-02  4.3016e-02  2.3641e-01  5.9705e-01 -3.8640e-01
 -2.5194e-01  9.6862e-01 -4.3112e-01 -4.8370e-01 -1.1396e+00  9.2425e-02
 -1.1476e-01 -7.4291e-02 -6.2524e-02 -9.5122e-02 -2.2714e-01  8.8291e-01
  3.9978e-01  7.6631e-01 -6.7697e-01 -6.2829e-01 -1.1872e-01 -2.4492e-01
 -5.8893e-01 -8.5088e-01  1.1107e+00  4.2190e-01 -1.5072e+00 -1.9509e-01
 -2.6712e-01 -7.0801e-01  5.5075e-01 -4.6929e-02 -2.5203e-01  7.4411e-01
 -1.8325e-01 -1.4885e+00 -4.6393e-01 -1.0338e-01  2.3525e+00 -1.5421e-01
  3.9833e-01  1.5344e-02  8.0708e-02 -2.7373e-01  9.7057e-01 -1.9383e-02
  2.0899e-01 -6.4033e-01  9.2509e-01 -4.5371e-01 -7.0564e-01 -1.6033e-01
 -7.1761e-02  6.2856e-01  3.5732e-01  8.8802e-01 -6.9127e-01  4.9634e-02
 -9.3347e-01  6.5396e-01  3.7165e-01  5.8363e-02 -1.0152e+00  7.0845e-01
 -1.3542e+00 -3.6390e-01  2.5994e-01 -1.8260e-01 -9.8930e-01 -4.4699e-01
  8.5016e-01  9.4532e-02  3.7019e-01 -5.0354e-01 -1.2083e+00 -3.5776e-01
  2.3899e-01 -6.7904e-02  1.5072e+00  6.0889e-01]

and their distance comes out as 1.08074426763993081

For what vectors is this returning an invalid result? Please see [mre]. — Kraigolas, Jun 26 '21 at 13:37

perl · Accepted Answer · 2021-06-26T14:05:20.990

4

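If the dot product of these vectors is negative, it's perfectly fine for cosine to return a value greater than 1. SciPy defines the cosine distance as 1 - (u·v)/(‖u‖‖v‖), so it ranges from 0 (same direction) to 2 (opposite directions); see the formula used for cosine in the documentation.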

For example:

from scipy.spatial.distance import cosine

cosine([1], [-1])

Output:

2.0
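To make the range explicit, here is a minimal sketch that computes the same quantity by hand with NumPy and compares it with SciPy. The helper name cosine_distance and the sample vectors are made up for illustration; only scipy.spatial.distance.cosine comes from the question.

```python
import numpy as np
from scipy.spatial.distance import cosine


def cosine_distance(u, v):
    """Hypothetical helper: 1 minus the cosine similarity of u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Illustrative vectors, not taken from the question
same = cosine_distance([1, 2], [2, 4])        # same direction: close to 0
orthogonal = cosine_distance([1, 0], [0, 1])  # perpendicular: close to 1
opposite = cosine_distance([1, 2], [-1, -2])  # opposite direction: close to 2

# The hand-written version should agree with SciPy up to rounding
scipy_opposite = cosine([1, 2], [-1, -2])

print(same, orthogonal, opposite, scipy_opposite)
```

Any pair of vectors whose dot product is negative lands between 1 and 2, which is what happens with the two vectors in the question. If a value in [0, 1] is required, that is a different measure and has to be defined separately.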

edited Jun 26 '21 at 14:05

answered Jun 26 '21 at 13:57
