I have thousands of vectors of about 20 features each.
Given one query vector and a set of potential matches, I would like to select the best N matches.
I have spent a couple of days trying out regression (using SVM), training my model on a data set I created myself: each vector is the concatenation of the query vector and a result vector, and I give it a subjectively evaluated score between 0 and 1 (0 for a perfect match, 1 for the worst match).
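A minimal sketch of that setup, assuming scikit-learn's `SVR` with default parameters and random placeholder data in place of the hand-scored set (all names here are illustrative, not the actual code or data):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_pairs, n_features = 200, 20

# Placeholder data: random vectors stand in for the real queries and results,
# and random numbers stand in for the hand-assigned scores.
queries = rng.normal(size=(n_pairs, n_features))
results = rng.normal(size=(n_pairs, n_features))
scores = rng.uniform(0, 1, size=n_pairs)  # 0 = perfect match, 1 = worst match

# Each training row is the concatenation [query, result].
X = np.hstack([queries, results])
model = SVR().fit(X, scores)


def best_n(model, query, candidates, n):
    """Return the indices of the n candidates with the lowest predicted score."""
    rows = np.hstack([np.tile(query, (len(candidates), 1)), candidates])
    predicted = model.predict(rows)
    return np.argsort(predicted)[:n]  # ascending, because 0 means perfect match


top = best_n(model, queries[0], results[:50], n=5)
```

With random scores the model learns nothing useful; the sketch only shows the shape of the data and how the best N would be picked.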
I haven't had great results, and I believe one reason could be that it is very hard to assign these scores subjectively. What would be easier, on the other hand, is to subjectively rank results (score being an unknown function):
score(query, resultA) > score(query, resultB) > score(query, resultC)
So I believe this is more of a learning-to-rank problem, and I have found various links for Python:
- http://fa.bianp.net/blog/2012/learning-to-rank-with-scikit-learn-the-pairwise-transform/
- https://gist.github.com/agramfort/2071994 ...
but I haven't really been able to understand how it works. I am confused by all the terminology (pairwise ranking, etc.). Note that I know nothing about machine learning, hence my feeling of being a bit lost, so I don't understand how to apply this to my problem.
Could someone please help me clarify things, point me to the exact category of problem I am trying to solve, and, even better, show how I could implement this in Python (scikit-learn)?
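To make the question concrete, here is a rough sketch of how a ranked list might be turned into training data with the pairwise transform that the first link is about. Several things are assumptions and not taken from the links: the helper names, the choice of per-feature absolute difference as the description of a (query, candidate) pair, the use of `LinearSVC`, and the placeholder ranking by Euclidean distance, which stands in for a hand-made ranking.

```python
import itertools

import numpy as np
from sklearn.svm import LinearSVC


def pair_features(query, candidate):
    # Assumption: describe a (query, candidate) pair by the per-feature
    # absolute difference. A plain concatenation would cancel the query
    # part when two candidates of the same query are subtracted.
    return np.abs(query - candidate)


def pairwise_transform(query, ranked_candidates):
    """Turn one ranked list (best match first) into a binary classification set."""
    feats = [pair_features(query, c) for c in ranked_candidates]
    X, y = [], []
    for better, worse in itertools.combinations(feats, 2):
        X.append(better - worse)  # "better minus worse" is labeled +1
        y.append(1)
        X.append(worse - better)  # the mirrored pair is labeled -1
        y.append(-1)
    return np.array(X), np.array(y)


def top_n(model, query, candidates, n):
    """Score every candidate with the learned weights, return the n best indices."""
    feats = np.array([pair_features(query, c) for c in candidates])
    scores = feats @ model.coef_.ravel()  # higher means ranked better
    return np.argsort(-scores)[:n]


rng = np.random.default_rng(0)
query = rng.normal(size=20)
candidates = rng.normal(size=(30, 20))

# Placeholder for a subjective ranking: sort by Euclidean distance to the query.
order = np.argsort(np.linalg.norm(candidates - query, axis=1))
ranked = candidates[order]

X, y = pairwise_transform(query, ranked)
model = LinearSVC(fit_intercept=False).fit(X, y)
best = top_n(model, query, candidates, n=5)
```

With several queries, the `X` and `y` arrays from each ranked list would presumably be stacked before fitting, so that pairs are only ever formed between candidates of the same query.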