I want to implement a system which cache the most popular queries and given a new query try to find a similar query in the cache and return the same result. Since I want to do it as general as possible (queries can be short texts, images or even audio tracks) I'm going down with the Approximate Nearest Neighbor (ANN) approach, which is based on representing the query in a vector space.
My question is: what is the most efficient way to represent a query as a vector (which will be used as input in ANN)?