I am trying to understand how similarity works in spaCy. I compared a speech by Melania Trump with a speech by Michelle Obama to see how similar they are.
This is my code:
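import spacy
# Load the large English model, which includes word vectors.
nlp = spacy.load('en_core_web_lg')
# Read both speeches as text, silently dropping any non-ASCII bytes.
with open("melania.txt", encoding="ascii", errors="ignore") as f:
    file1 = f.read()
with open("michelle.txt", encoding="ascii", errors="ignore") as f:
    file2 = f.read()
doc1 = nlp(file1)
doc2 = nlp(file2)
print(doc1.similarity(doc2))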
I get a similarity score of 0.9951584208511974, which looks very high to me. Is this correct? Am I doing something wrong?
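
For reference, spaCy's documentation describes the default `Doc.similarity` estimate as the cosine similarity of the two document vectors, where a document vector defaults to the average of its token vectors. The sketch below imitates that computation in plain Python so it can be run without a model. The 3-dimensional word vectors, the two word lists, and the helper names `average_vector` and `cosine_similarity` are all made up for illustration; none of this is spaCy's own code.

```python
import math


def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "word vectors"; real models use hundreds of dimensions.
toy_vectors = {
    "the": [0.9, 0.1, 0.0],
    "and": [0.8, 0.2, 0.1],
    "of": [0.9, 0.0, 0.1],
    "hope": [0.1, 0.9, 0.2],
    "work": [0.2, 0.1, 0.9],
}

# Two toy "documents" that share common words and differ in one content word.
text_a = ["the", "and", "of", "the", "hope", "and", "the", "of"]
text_b = ["the", "of", "and", "the", "work", "the", "and", "of"]

doc_vec_a = average_vector([toy_vectors[w] for w in text_a])
doc_vec_b = average_vector([toy_vectors[w] for w in text_b])

# Similarity of the two distinguishing words versus the two averaged documents.
word_score = cosine_similarity(toy_vectors["hope"], toy_vectors["work"])
doc_score = cosine_similarity(doc_vec_a, doc_vec_b)
print(word_score, doc_score)
```

In this toy setup the two distinguishing words are not very similar to each other, yet the averaged document vectors end up almost parallel because the shared common words dominate the mean. Under the assumptions above, this may be part of why two long English speeches score so close to 1.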