Pairwise cosine similarity returns 0 for two obviously related texts

Question

I wrote the code below in Python 3 to measure the similarity between two pieces of text. One is a list of adjectives and the other is a sentence.

The aim is to see if the adjectives (or hero archetype) are representative of the block of text.

However, the output is 0, i.e. no similarity, even though the two are obviously quite similar. Has something been defined incorrectly?

import nltk
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

# Define the hero archetype
hero_archetype = "brave selfless determined sacrifice"

# Obtain the block of text
text = "The hero bravely sacrificed himself for the greater good."

# Tokenize the hero archetype and the text
hero_tokens = nltk.word_tokenize(hero_archetype)
text_tokens = nltk.word_tokenize(text)

print(hero_tokens)
print(text_tokens)

# Convert the tokens back into text
hero_text = " ".join(hero_tokens)
text_text = " ".join(text_tokens)

# Compute the pairwise similarities
# fit_transform returns a sparse document-term count matrix, not the vectorizer itself
count_matrix = CountVectorizer().fit_transform([hero_text, text_text])

print(count_matrix)
similarity_matrix = cosine_similarity(count_matrix)

print(similarity_matrix)

# Print the pairwise similarities
print("Similarity between hero archetype and text:", similarity_matrix[0][1])
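
A possible explanation, sketched with the standard library only so it runs without scikit-learn or NLTK: a bag-of-words count compares exact tokens, and "brave" vs "bravely" or "sacrifice" vs "sacrificed" are different tokens, so the two count vectors share no non-zero dimension and the cosine is 0. The helpers `words`, `char_trigrams` and `cosine` below are hypothetical names for illustration, and the word pattern only approximates CountVectorizer's default tokenization. Comparing character trigrams instead of whole words is one way to let related word forms overlap; stemming or lemmatizing both texts before vectorizing would be another.

```python
import math
import re
from collections import Counter

hero_archetype = "brave selfless determined sacrifice"
text = "The hero bravely sacrificed himself for the greater good."


def words(s):
    # Lowercased tokens of two or more word characters
    # (an approximation of CountVectorizer's default behaviour).
    return re.findall(r"\b\w\w+\b", s.lower())


def char_trigrams(s):
    # Character 3-grams taken inside each word, so "brave" and
    # "bravely" share "bra", "rav" and "ave".
    grams = []
    for w in words(s):
        grams.extend(w[i:i + 3] for i in range(len(w) - 2))
    return grams


def cosine(a, b):
    # Cosine similarity between two bags of features.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


word_sim = cosine(words(hero_archetype), words(text))
trigram_sim = cosine(char_trigrams(hero_archetype), char_trigrams(text))

print("Shared words:", set(words(hero_archetype)) & set(words(text)))
print("Word-level similarity:", word_sim)
print("Character-trigram similarity:", trigram_sim)
```

With scikit-learn, a comparable effect should be available through `CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))`, though the exact score will differ from this sketch because of how that analyzer pads word boundaries.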

