I'm trying to implement a similarity function using:
- N-grams
- TF-IDF
- Cosine similarity
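To make the N-gram step concrete: scikit-learn's `TfidfVectorizer` can generate the n-grams itself, so they don't need to be built by hand. This sketch assumes character n-grams (not word n-grams) of length 2 to 4, and uses `build_analyzer()` only to show which n-grams are produced for one word. With `analyzer="char_wb"`, each word is padded with a space on both sides, so the n-grams at the word edges include that space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: character n-grams of length 2 to 4.
# "char_wb" builds n-grams only inside word boundaries and pads each word with spaces.
analyzer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).build_analyzer()

grams = analyzer("cat")
print(grams)
# [' c', 'ca', 'at', 't ', ' ca', 'cat', 'at ', ' cat', 'cat ']
```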
Concept:
def predict(words, word):
    words_ngrams = create_ngrams(words, range=(2, 4))
    word_ngrams = create_ngrams(word, range=(2, 4))
    words_tokenizer = tfidf_tokenizer(words_ngrams)
    words_vecs = words_tokenizer.transform(words_ngrams)
    word_vec = words_tokenizer.transform(word_ngrams)
    return cosine_similarity(word_vec, words_vecs)

words = [...]
word = '...'
similarity = predict(words, word)
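A minimal sketch of the same pipeline with scikit-learn, assuming character n-grams are what's wanted (the concept doesn't say whether the n-grams are over characters or words). `TfidfVectorizer` builds the 2- to 4-character n-grams and the TF-IDF weights in one step, replacing the hypothetical `create_ngrams` and `tfidf_tokenizer` helpers, and `sklearn.metrics.pairwise.cosine_similarity` does the comparison. The sample `words` list and the `ngram_range` parameter on `predict` are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def predict(words, word, ngram_range=(2, 4)):
    """Return the cosine similarity between `word` and each entry of `words`.

    Character n-grams (padded at word boundaries) are TF-IDF weighted;
    the vocabulary and IDF statistics are learned from `words` only.
    """
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=ngram_range)
    words_matrix = vectorizer.fit_transform(words)  # shape: (len(words), n_features)
    word_vector = vectorizer.transform([word])      # shape: (1, n_features)
    # cosine_similarity returns a (1, len(words)) array; flatten it to 1-D
    return cosine_similarity(word_vector, words_matrix).ravel()


# Hypothetical sample data, for illustration only
words = ["apple", "apply", "banana", "bandana"]
scores = predict(words, "appel")

# Print the candidates from most to least similar
for w, s in sorted(zip(words, scores), key=lambda pair: -pair[1]):
    print(f"{w}: {s:.3f}")
```

Because the vectorizer is fit on `words` only, n-grams that occur only in `word` are not in the vocabulary and are ignored by `transform`; fitting on `words + [word]` instead would change the IDF weights slightly. Using `analyzer="char"` instead of `"char_wb"` would also produce n-grams that span the space between words in multi-word strings.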
I searched the web for a simple, reliable implementation, but I couldn't find one that uses well-known Python packages such as sklearn, nltk or scipy.
Most of them use hand-rolled calculations.
I'd like to avoid coding every step by hand, and I suspect there is a simple way to build this whole pipeline with existing libraries.
Any help (and code) would be appreciated. Thanks :)