For approximate nearest neighbor (ANN) search, Elasticsearch uses HNSW (Hierarchical Navigable Small World) graphs and measures document similarity by comparing documents represented as vectors. How are these vectors created? I am familiar with word embeddings for individual words (à la Word2Vec) and with bag-of-words (BOW) representations. Are the document vectors built directly from some amalgam of word embeddings, for example the embeddings of a predefined set of keywords? Any pointer to where this process is described in the Elasticsearch documentation would be helpful.
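To make the question concrete, here is a minimal sketch of the kind of "amalgam" I have in mind: averaging per-word embeddings into one document vector and comparing documents by cosine similarity. The embedding table, the function names, and the choice of mean pooling are all my own assumptions for illustration; I am not claiming this is what Elasticsearch does, which is exactly what I am trying to find out.

```python
import math

# Toy 3-dimensional word embeddings, invented purely for illustration.
# A real setup would load vectors from a trained model such as Word2Vec.
TOY_EMBEDDINGS = {
    "search": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "vector": [0.7, 0.3, 0.2],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.1, 0.0, 0.8],
}


def document_vector(text):
    """Mean-pool the embeddings of known words; unknown words are skipped."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("document contains no words with a known embedding")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


doc_a = document_vector("vector search engine")
doc_b = document_vector("banana fruit")
doc_c = document_vector("search engine")

# Documents on related topics should score higher than unrelated ones.
print(cosine_similarity(doc_a, doc_c))
print(cosine_similarity(doc_a, doc_b))
```

In this toy setup the two search-related documents end up far closer to each other than either is to the fruit document, which is the behavior I would expect from whatever vectors the ANN search operates on.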