I have a base vector (consisting of 1's and 0's) and I want to find the cosine distance to 50,000 other vectors (also consisting of 1's and 0's). I found many ways to calculate an entire matrix of pairwise distances, but I'm not interested in that. Rather, I just want the 50,000 distances between my base vector and each of the other vectors (and then to sort them to find the top 5). What's the fastest way to achieve this?
So how did you calculate that matrix? Why don't you just calculate it for every vector? – MisterMiyagi Jul 08 '16 at 19:53
1 Answer
The vectorized operation gives exactly the same result as computing each one individually, as long as you are careful with the axes. Here each row holds one individual "other" vector:
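import numpy
others = numpy.random.randint(0, 2, (10, 10))  # one "other" vector per row
base = numpy.random.randint(0, 2, (10, 1))     # base vector as a column
# Take the norms per row (axis=1) so they line up with the per-row dot products.
d = numpy.inner(base.T, others) / (numpy.linalg.norm(others, axis=1) * numpy.linalg.norm(base))
# d holds cosine similarities; the cosine distance is 1 - d.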
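For the sizes in the question (50,000 vectors against one base vector, top 5 wanted), the same idea might be sketched as below. This is only an illustration, not a benchmarked solution: the helper name `top_k_cosine` is made up for this example, the random data stands in for the real vectors, and treating an all-zero vector as having similarity 0 is an assumption you may want to change.

```python
import numpy as np

def top_k_cosine(base, others, k=5):
    """Return (indices, cosine distances) of the k rows of `others` closest to `base`.

    Hypothetical helper for illustration; `others` holds one vector per row.
    """
    base = np.asarray(base, dtype=float).ravel()
    others = np.asarray(others, dtype=float)

    dots = others @ base  # one dot product per row, shape (n,)
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(base)

    # Assumption: an all-zero vector has no direction, so its similarity is set to 0.
    sim = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    dist = 1.0 - sim  # cosine distance

    k = min(k, dist.size)
    # argpartition finds the k smallest distances without sorting all n of them.
    idx = np.argpartition(dist, k - 1)[:k]
    idx = idx[np.argsort(dist[idx])]  # order just those k by distance
    return idx, dist[idx]

rng = np.random.default_rng(0)
others = rng.integers(0, 2, size=(50_000, 10))  # 50,000 vectors of length 10
base = rng.integers(0, 2, size=10)
idx, dist = top_k_cosine(base, others)
```

Because `others` has shape `(n, 10)` and `base` has shape `(10,)`, the number of rows `n` can be anything; only the vector length has to match.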

Benjamin
So there will be 50,000 vectors of length 10 vs. 1 vector of length 10. When I try this, I get alignment/broadcast issues. Can you please show how 'others' can be variable length? – AdrianBoeh May 05 '20 at 13:13