Calculate cosine similarity between words

Question

If we have two lists of strings:

A = "Hello how are you? The weather is fine. I'd like to go for a walk.".split()
B = "bank, weather, sun, moon, fun, hi".split(",")

The words in list A form the basis of my word vectors. How can I calculate a cosine similarity score for each word in B?

What I've done so far: I can calculate the cosine similarity of two whole lists with the following function:

import math

# c1 and c2 are collections.Counter objects (term -> count)
def counter_cosine_similarity(c1, c2):
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    return dotprod / (magA * magB)

But how do I bring my vector basis into this, and how can I then calculate the similarities for the terms in B?

What do you mean by "calculate the cosine similarity scores of each word in B"? As you can see from the parameters of `counter_cosine_similarity`, that similarity relates two vectors, so I assume you want it between two words. Do you want the similarity for each pair of words, one from `A` and one from `B`? - Rory Daulton, Nov 05 '16 at 12:00

score 3 · Accepted Answer · answered Nov 05 '16 at 12:13

import math
from collections import Counter

ListA = "Hello how are you? The weather is fine. I'd like to go for a walk.".split()
ListB = "bank, weather, sun, moon, fun, hi".split(", ")  # split on ", " so the words carry no leading space

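# Cosine similarity of two word vectors as returned by word2vec()
def cosdis(v1, v2):
    # only characters present in both words contribute to the dot product
    common = v1[1].intersection(v2[1])
    return sum(v1[0][ch] * v2[0][ch] for ch in common) / v1[2] / v2[2]

# Represent a word as (character counts, set of characters, vector length)
def word2vec(word):
    cw = Counter(word)
    sw = set(cw)
    lw = math.sqrt(sum(c * c for c in cw.values()))
    return cw, sw, lw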

def removePunctuations(str_input):
    ret = []
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''  # raw string: "\," is not a valid escape sequence
    for char in str_input:
        if char not in punctuations:
            ret.append(char)

    return "".join(ret)


for i in ListA:
    for j in ListB:
        print(cosdis(word2vec(removePunctuations(i)), word2vec(removePunctuations(j))))
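
Note that the code above compares words by their character counts, so the score reflects shared letters rather than meaning. As a rough sketch of the same idea in a more compact form, the version below lowercases each word, strips punctuation and whitespace, and reports the closest word in A for every word in B. The names `char_cosine`, `normalize`, and `best` are made up for this illustration, and both the lowercasing and the choice to return 0 for an empty word are assumptions, not part of the code above.

```python
import math
import string
from collections import Counter


def char_cosine(a: str, b: str) -> float:
    """Cosine similarity of two words treated as character-count vectors."""
    ca, cb = Counter(a), Counter(b)
    # Only characters present in both words contribute to the dot product.
    dot = sum(ca[ch] * cb[ch] for ch in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    norm = norm_a * norm_b
    # Assumption: an empty word has similarity 0 instead of raising ZeroDivisionError.
    return dot / norm if norm else 0.0


def normalize(word: str) -> str:
    """Lowercase and drop punctuation and whitespace (an assumption for this sketch)."""
    return "".join(
        ch for ch in word.lower()
        if ch not in string.punctuation and not ch.isspace()
    )


list_a = "Hello how are you? The weather is fine. I'd like to go for a walk.".split()
list_b = "bank, weather, sun, moon, fun, hi".split(",")

# For each word in B, keep the word in A with the highest score.
best = {}
for b in map(normalize, list_b):
    scores = {a: char_cosine(normalize(a), b) for a in list_a}
    best[b] = max(scores.items(), key=lambda kv: kv[1])

for b, (a, score) in best.items():
    print(f"{b!r} is closest to {a!r} ({score:.3f})")
```

If you need similarity by meaning rather than by spelling, character counts will not give you that; you would need vectors from a trained embedding model instead.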
