If we have two lists of strings:
A = "Hello how are you? The weather is fine. I'd like to go for a walk.".split()
B = "bank, weather, sun, moon, fun, hi".split(", ")
The words in list A constitute my word vector basis.
How can I calculate the cosine similarity score for each word in B?
What I've done so far: I can calculate the cosine similarity of two whole lists, once each is converted to a term-count mapping such as collections.Counter, with the following function:
import math

def counter_cosine_similarity(c1, c2):
    # c1 and c2 map each term to its count (e.g. collections.Counter)
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    return dotprod / (magA * magB)
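For illustration, here is a minimal sketch of how this function could be called on the two whole lists. It assumes each list is first turned into a collections.Counter and that the entries of B are stripped of surrounding whitespace. The names counts_a, counts_b and similarity are only illustrative.

```python
import math
from collections import Counter

def counter_cosine_similarity(c1, c2):
    # c1 and c2 map each term to its count
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    return dotprod / (magA * magB)

A = "Hello how are you? The weather is fine. I'd like to go for a walk.".split()
# Assumption: strip whitespace so that " weather" becomes "weather"
B = [w.strip() for w in "bank, weather, sun, moon, fun, hi".split(",")]

# Assumption: each list is represented by its term counts
counts_a = Counter(A)
counts_b = Counter(B)

similarity = counter_cosine_similarity(counts_a, counts_b)
print(similarity)
```

Only "weather" occurs in both lists, so the dot product is 1 and the result is 1 / sqrt(15 * 6). Tokens in A such as "you?" and "fine." keep their punctuation, because split() only separates on whitespace.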
But how do I integrate my vector basis, and how can I then calculate the similarities between the terms in B?