I've been seeing a lot of examples of computing Euclidean distance for KNN, but none for sentiment classification.
For example, I have the sentence "a very close game".
How do I compute the Euclidean distance between it and the sentence "A great game"?
Think of a sentence as a point in a multi-dimensional space: only after you have defined a system of coordinates can you calculate a Euclidean distance. For instance, you can introduce:
O2 - Alphabetical center (I just thought of it). It can be calculated as the arithmetic mean of the alphabetical centers of the words in the sentence:
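```
// Alphabetical center of one word
CharsIndex = Sum(Char.indexInWord) / CharsCountInWord;
CharsCode = Sum(Char.charCode) / CharsCountInWord;
AlphaWordCoordinate = [CharsIndex, CharsCode];
// Alphabetical center of the sentence: mean over its words
WordsIndex = Sum(Words.CharsIndex) / WordsCount;
WordsCode = Sum(Words.CharsCode) / WordsCount;
WordIndexInSentence = Sum(Words.indexInSentence) / WordsCount;
AlphaSentenceCoordinate = (WordsIndex^2 + WordsCode^2 + WordIndexInSentence^2)^(1/2);
```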
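As a worked check of the word-level formulas, here is the alphabetical center of the word "game". It assumes character codes are the code points returned by Python's `ord` (g = 103, a = 97, m = 109, e = 101):

```latex
\begin{aligned}
\text{CharsIndex}(\texttt{game}) &= \frac{0 + 1 + 2 + 3}{4} = 1.5 \\
\text{CharsCode}(\texttt{game})  &= \frac{103 + 97 + 109 + 101}{4} = \frac{410}{4} = 102.5
\end{aligned}
```

So AlphaWordCoordinate for "game" is [1.5, 102.5].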
So the Euclidean distance of a sentence (from the origin) can now be calculated as follows:
```
EuclideanSentenceDistance = (WordsCount^2 + Length^2 + AlphaSentenceCoordinate^2)^(1/2)
```
Now every sentence can be transformed into a point in three-dimensional space, P[Length, Words, AlphaCoordinate]. Having a distance, you can compare and classify sentences.
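To compare two sentences, as the question asks, one option is the Euclidean distance between their points P[Length, Words, AlphaCoordinate], and a KNN classifier can then vote among the nearest labeled sentences. The sketch below assumes Python 3, non-empty sentences, and true (float) division for the means; the names `sentence_point`, `euclidean_distance` and `knn_classify` and the labeled examples are hypothetical, for illustration only. These features describe spelling and length rather than meaning, so don't expect useful sentiment predictions from them.

```python
import math
from collections import Counter


def alpha_coordinate(words):
    """AlphaSentenceCoordinate: length of the vector of per-sentence means."""
    # Assumption: true (float) division, i.e. plain arithmetic means.
    n = len(words)
    mean_char_index = sum(sum(range(len(w))) / len(w) for w in words) / n
    mean_char_code = sum(sum(ord(c) for c in w) / len(w) for w in words) / n
    mean_word_index = sum(range(n)) / n
    return math.sqrt(mean_char_index ** 2 + mean_char_code ** 2 + mean_word_index ** 2)


def sentence_point(sentence):
    """Map a sentence to the point P[Length, Words, AlphaCoordinate]."""
    words = sentence.split()
    return (len(sentence), len(words), alpha_coordinate(words))


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(sentence, labeled, k=3):
    """Return the most common label among the k labeled sentences nearest to `sentence`."""
    point = sentence_point(sentence)
    nearest = sorted(labeled, key=lambda item: euclidean_distance(point, sentence_point(item[0])))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples, for illustration only.
training = [
    ("a great game", "positive"),
    ("what a wonderful match", "positive"),
    ("a boring game", "negative"),
    ("a terrible and dull match", "negative"),
]

print(euclidean_distance(sentence_point("a very close game"), sentence_point("A great game")))
print(knn_classify("a very close game", training, k=3))
```

In practice the length and AlphaCoordinate axes (values around 100) dominate the word count, so you may want to scale the coordinates before measuring distances.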
It's not an ideal approach, I guess, but I wanted to show you the idea.
```python
import math


def calc_word_alpha_center(word):
    """Return (mean character index, mean character code) for one word."""
    chars_index = 0
    chars_codes = 0
    for index, char in enumerate(word):
        chars_index += index
        chars_codes += ord(char)
    chars_count = len(word)
    # Floor division (//) reproduces the integer division of the original
    # Python 2 code, which produced the console output below.
    index = chars_index // chars_count
    code = chars_codes // chars_count
    return (index, code)


def calc_alpha_distance(words):
    """Distance of the sentence's alphabetical center from the origin."""
    word_chars_index = 0
    word_code = 0
    word_index = 0
    for index, word in enumerate(words):
        point = calc_word_alpha_center(word)
        word_chars_index += point[0]
        word_code += point[1]
        word_index += index
    chars_index = word_chars_index // len(words)
    code = word_code // len(words)
    index = word_index // len(words)
    return math.sqrt(chars_index ** 2 + code ** 2 + index ** 2)


def calc_sentence_euclidean_distance(sentence):
    """Distance of the point [length, word count, alpha coordinate] from the origin."""
    length = len(sentence)
    # split() without an argument skips the empty strings produced by
    # repeated spaces, which would otherwise cause a division by zero.
    words = sentence.split()
    words_count = len(words)
    alpha_distance = calc_alpha_distance(words)
    return math.sqrt(length ** 2 + words_count ** 2 + alpha_distance ** 2)


sentence1 = "a great game"
sentence2 = "A great game"
distance1 = calc_sentence_euclidean_distance(sentence1)
distance2 = calc_sentence_euclidean_distance(sentence2)
# "{:.12g}" prints 12 significant digits, like str() on a float in Python 2.
print(sentence1)
print("{:.12g}".format(distance1))
print(sentence2)
print("{:.12g}".format(distance2))
```
Console output:
```
a great game
101.764433866
A great game
91.8477000256
```
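The printed values can be checked by hand. They correspond to truncated (integer) means, which is what Python 2's `/` does on integers. For "a great game" (length 12, 3 words) the truncated word centers are [0, 97], [2, 106] and [1, 102], so WordsIndex = 3 div 3 = 1, WordsCode = 305 div 3 = 101 and WordIndexInSentence = 3 div 3 = 1. For "A great game" only the code of "A" changes (65 instead of 97), so WordsCode = 273 div 3 = 91:

```latex
\begin{aligned}
\text{a great game:}\quad & \sqrt{3^2 + 12^2 + 1^2 + 101^2 + 1^2} = \sqrt{10356} \approx 101.764 \\
\text{A great game:}\quad & \sqrt{3^2 + 12^2 + 1^2 + 91^2 + 1^2} = \sqrt{8436} \approx 91.848
\end{aligned}
```

This also shows that the coordinate is case-sensitive: capitalizing one letter moves the sentence by about 10 units.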