Let's say we have a list of 50 sentences and an input sentence. How can I choose the sentence from the list that is closest to the input sentence?
I have tried several methods, such as averaging the word2vec vectors of the tokens in each sentence and then computing the cosine similarity between the resulting sentence vectors.
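For reference, the averaging baseline can be sketched roughly as follows. This is a minimal sketch, not my exact code: the 4-dimensional `TOY_VECTORS` are made up to stand in for a trained word2vec model, and `tokenize`, `sentence_vector`, `cosine`, and `closest` are illustrative names.

```python
import math
import re

# Made-up 4-dimensional vectors standing in for real word2vec embeddings.
TOY_VECTORS = {
    "what": [1.0, 0.0, 0.0, 0.0],
    "is": [1.0, 0.0, 0.0, 0.0],
    "the": [1.0, 0.0, 0.0, 0.0],
    "of": [1.0, 0.0, 0.0, 0.0],
    "please": [0.8, 0.1, 0.0, 0.0],
    "definition": [0.0, 1.0, 0.0, 0.0],
    "define": [0.0, 0.9, 0.1, 0.0],
    "book": [0.0, 0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 0.0, 1.0],
    "nice": [0.2, 0.0, 0.0, 0.9],
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_vector(sentence, vectors):
    """Unweighted mean of the vectors of all known tokens, or None if none are known."""
    known = [vectors[t] for t in tokenize(sentence) if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest(query, candidates, vectors):
    """Return (best_candidate, score) by cosine similarity of averaged vectors."""
    query_vec = sentence_vector(query, vectors)
    best, best_score = None, -1.0
    if query_vec is None:
        return best, 0.0
    for candidate in candidates:
        cand_vec = sentence_vector(candidate, vectors)
        if cand_vec is None:
            continue  # no known tokens, cannot be scored
        score = cosine(query_vec, cand_vec)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


candidates = ["please define book", "the weather is nice"]
best, score = closest("what is the definition of book?", candidates, TOY_VECTORS)
print(best, round(score, 3))
```

With a plain average, function words such as 'what' and 'is' contribute to the sentence vector as much as content words do, which is part of what I would like to avoid.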
For example, I want the algorithm to give a high similarity score between "what is the definition of book?" and "please define book".
I am looking for a method (probably a combination of methods) that 1. takes semantics into account, 2. takes syntax into account, and 3. gives different weights to tokens with different roles (e.g. in the first sentence of the example above, 'what' and 'is' should get lower weights).
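To make point 3 concrete, one possible weighting is inverse document frequency (IDF) computed over the sentence list, so that tokens appearing in many sentences count less. This is only a sketch of one option and an assumption on my part, not something I have validated; `idf_weights` and `weighted_average` are illustrative names, and the two-dimensional vectors in the usage lines are made up.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def idf_weights(sentences):
    """Smoothed IDF per token: log((1 + n) / (1 + df)) + 1."""
    n = len(sentences)
    doc_freq = Counter()
    for sentence in sentences:
        # Count each token at most once per sentence.
        doc_freq.update(set(tokenize(sentence)))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def weighted_average(tokens, vectors, weights, default_weight=1.0):
    """Weighted mean of the vectors of known tokens, or None if none are known."""
    acc, total = None, 0.0
    for token in tokens:
        if token not in vectors:
            continue
        w = weights.get(token, default_weight)
        vec = vectors[token]
        if acc is None:
            acc = [0.0] * len(vec)
        for i, x in enumerate(vec):
            acc[i] += w * x
        total += w
    if acc is None or total == 0.0:
        return None
    return [x / total for x in acc]


sentences = [
    "what is the definition of book?",
    "what is the capital of France?",
    "what is the time?",
    "please define book",
]
weights = idf_weights(sentences)
# Tokens sorted from lowest to highest weight.
print(sorted(weights.items(), key=lambda kv: kv[1]))

# Made-up 2-dimensional vectors, only to show the effect of the weights.
toy = {"what": [1.0, 0.0], "book": [0.0, 1.0]}
print(weighted_average(["what", "book"], toy, weights))
```

The weighted sentence vectors could then be compared with cosine similarity as before. This addresses only the weighting, not the syntax requirement in point 2.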
I know this might be a bit general, but any suggestion is appreciated.
Thanks,
Amir