To keep things simple, say I have four vectors, W, X, Y, and Z, each containing the same number of values. I'm trying to calculate the cosine similarity between them pairwise in Python, but I can't seem to get the right answer.
If I try comparing W vs. X:
print(np.dot(W, X.T)/(np.linalg.norm(W)*np.linalg.norm(X)))
I get the following result:
[[0.9984622004973391]]
If I compare W vs. Y I get:
[[0.8891911653057049]]
And if I compare W to Z I get:
[[0.9676746591879851]]
Of course, I don't want to do these by hand one at a time, since in reality I have many vectors.
When I try to calculate all three (X, Y, Z) vs. W at once:
V = pd.concat([X, Y, Z])
print(np.dot(W, V.T)/(np.linalg.norm(W)*np.linalg.norm(V)))
I get the following:
[[0.9982175434442747 0.005561082504669956 0.020547860729214433]]
...where the first value nearly matches what I got when comparing the vectors individually (but still not quite), while the others are way off.
I must have an issue with my approach to the all-at-once version, but I have not been able to figure out how to fix it. Any ideas? Thanks!
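A likely cause, assuming W, X, Y, and Z are each a single 1 x n row: called without an `axis` argument on a 2-D input, `np.linalg.norm(V)` returns one Frobenius norm for the whole stacked matrix, so every dot product is divided by the same combined length instead of by the norm of its own row. The sketch below uses made-up values and plain NumPy arrays (`np.vstack` stands in for `pd.concat`, and `cosine_to_rows` is a hypothetical helper name) to show a per-row normalization:

```python
import numpy as np

# Made-up stand-ins for the real data; each vector is a 1 x n row.
W = np.array([[1.0, 2.0, 3.0]])
X = np.array([[2.0, 4.0, 6.0]])     # parallel to W
Y = np.array([[3.0, 2.0, 1.0]])
Z = np.array([[-1.0, -2.0, -3.0]])  # opposite direction to W

# Stack the comparison vectors so that each row is one vector: shape (3, n).
V = np.vstack([X, Y, Z])


def cosine_to_rows(w, m):
    """Cosine similarity between one vector w and every row of matrix m."""
    w = np.asarray(w, dtype=float).reshape(1, -1)
    m = np.asarray(m, dtype=float)
    # axis=1 gives one Euclidean norm per row instead of a single
    # Frobenius norm for the whole matrix.
    row_norms = np.linalg.norm(m, axis=1)
    # (1, n) @ (n, k) -> (1, k); the division broadcasts over the k columns.
    return (w @ m.T) / (np.linalg.norm(w) * row_norms)


sims = cosine_to_rows(W, V)
print(sims)  # approximately [[1.0, 0.714, -1.0]] for these made-up values
```

Each entry of `sims` should agree with the corresponding one-at-a-time calculation. If the inputs are pandas DataFrames, converting them with `.to_numpy()` before the call is one way to get plain arrays.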