The IBM Watson™ Retrieve and Rank service relevancy score and featureVector calculation

Question

I have followed the documentation and created the training data for my catalog.

https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/retrieve-rank/training_data.shtml

In my training data, I have given the records that are not relevant to a question a relevance label of zero. Per the documentation, a relevance label of "0" is predefined as indicating that an answer is not relevant.

Training completed successfully and I have the ranker_id. Now when I run a query using fcselect and the ranker_id, the top-most result is a document that I had marked as '0', meaning non-relevant.

That document comes back with a high score of 10, as follows:

<float name="score">10.0</float> 
<str name="featureVector">0.11107889 0.046247214 0.0 0.046247214 0.0 0.0 0.0 0.0 0.096357614 0.04101021 0.0 0.04101021 0.0 0.0 0.0 0.0 0.6666667 0 0.6931471805599453 10.0</str>

I am looking for insight into how this score relates to the relevance labels we provide in the training data. How do I improve the training data / relevance labels so that I see the expected results?

There isn't much merit in trying to read and understand the feature vector yourself, since the values relate to internal features that the service generates automatically and that aren't designed to be particularly human-readable. How big is your document corpus and how many training examples are there in your ground truth file? It is possible you haven't given the system enough data to learn from. — James Ravenscroft, Mar 11 '16 at 21:18
@JamesRavenscroft - The corpus has about 40 items and the ground truth file has 50 rows of training data. The ranker was successful only after it met the minimum training data requirement. — Shweta Gupta, Mar 17 '16 at 07:35
However, to your point, I do not know if it considers this enough data to learn from. — Shweta Gupta, Mar 17 '16 at 09:29
There is a functional minimum that the system validates against, but you should almost always aim as high as possible: like a human sitting an exam, it is better to have comprehensive knowledge of the subject than a very basic one. Also make sure that your questions are representative of the answers and use similar words and phrases. You can read about [using synonyms](https://examples.javacodegeeks.com/enterprise-java/apache-solr/apache-solr-synonyms-example/) to improve your retrieve matches, which in turn helps the Watson ranker. — James Ravenscroft, Mar 17 '16 at 09:39
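
To follow up on the point about training data volume, one rough way to sanity-check a ground truth file before retraining is sketched below. This is only a sketch under assumptions: it assumes each CSV row holds the question text followed by alternating answer ID and relevance label columns, which is how I read the linked training data documentation. The function name `audit_ground_truth` and the sample rows are hypothetical, not part of the service.

```python
import csv
import io
from collections import Counter


def audit_ground_truth(csv_text):
    """Summarise a ground truth file: row count, label counts,
    and questions that have no answer labelled above 0."""
    label_counts = Counter()
    no_relevant = []
    rows = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        rows += 1
        question, pairs = row[0], row[1:]
        # Assumed layout: answer_id, label, answer_id, label, ...
        labels = [int(label) for label in pairs[1::2]]
        label_counts.update(labels)
        if not any(label > 0 for label in labels):
            no_relevant.append(question)
    return {
        "rows": rows,
        "label_counts": dict(label_counts),
        "no_relevant": no_relevant,
    }


# Hypothetical sample rows, not taken from the question.
sample = (
    "how do I reset my password,doc1,3,doc7,0\n"
    "where is my order,doc2,0,doc5,0\n"
)
report = audit_ground_truth(sample)
print(report)
```

If most rows end up in `no_relevant`, or nearly every label is 0, the ranker has very few positive examples to learn from, which may be worth ruling out before looking at the scores themselves.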
