Watson Retrieve and Rank: Can’t train the ranker in Java

Question

I have followed the tutorial available on the IBM website (https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/retrieve-rank/get_start.shtml) and I’m now trying to apply the same procedure in Java, but I run into trouble when I start training the ranker.

I used the data provided in the tutorial (the Cranfield dataset), but the ranker keeps training and, about 20 minutes after it starts, its status changes to "failed".

I’m guessing that I am missing something, because it works perfectly with curl, but I can't figure out what is wrong with my code.

public static void main(String[] args) {    
    String cluster_id = "", config_name = "config0", collection_name = "ibm_collection", ranker_id = "", ranker_name = "ranker0";
    String username = "<RAR_login>", password = "<RAR_password>";
    HttpSolrClient solrClient = null;

    //The RetrieveAndRankService class wraps a RetrieveAndRank instance; it also contains some methods to store results in Elasticsearch
    RetrieveAndRankService rars = new RetrieveAndRankService(username, password);
    rars.deleteAllCluster(rars.getService());


    //Create cluster
    try {
        SolrCluster cluster = rars.createSolrCluster("Cluster0", 0);
        cluster_id = cluster.getId();
        solrClient = rars.getSolrClient(rars.getService().getSolrUrl(cluster_id), username, password, cluster_id);

    }catch(Exception e) { e.printStackTrace(); }

    //Upload configuration
    rars.uploadSolrConfig(cluster_id, config_name, CRANFIELD_CONFIG);

    //Create collection
    try {
        rars.createCollection(collection_name, config_name, solrClient);
    }catch(Exception e) {e.printStackTrace(); }

    //Indexing documents
    try {
        addJsonDocuments(solrClient, CRANFIELD_DATA, collection_name);
    }catch(Exception e) { e.printStackTrace(); }

    //Create and train ranker
    try {
        Ranker ranker = rars.getService().createRanker(ranker_name, new File(CRANFIELD_GT)).execute();
        ranker_id = ranker.getId();
        while (ranker.getStatus() == com.ibm.watson.developer_cloud.retrieve_and_rank.v1.model.Ranker.Status.TRAINING) {
            Thread.sleep(4000); // sleep 4 seconds
            ranker = rars.getService().getRankerStatus(ranker.getId()).execute();
            System.out.println(ranker.getStatusDescription());
            System.out.println("Training Ranker...");
        }
        System.out.println(ranker.getStatusDescription());
        rars.cleanupResources(solrClient, cluster_id, config_name, collection_name);
        rars.deleteAllCluster(rars.getService());
    }catch(Exception e) { e.printStackTrace(); }
}

The output is the following:

Creating cluster...
Creating cluster...
[...]
The following cluster have been created : {
"solr_cluster_id": "scb4bbcd66_5aa1_4862_9c8d_b1572846102c",
"cluster_name": "Cluster0",
"cluster_size": "",
"solr_cluster_status": "READY"
}
Uploading configuration...
Uploaded configuration !
Creating collection...
Collection created.
Adding documents done. Response Text is : {"responseHeader":{"status":0,"QTime":1655}}
"Training Ranker..."
"Training Ranker..."
"Training Ranker..."
[...]

All suggestions are welcome. Thanks for your time.
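
A failure like this is easier to spot with a polling loop that gives up after a deadline and hands back the final status for inspection. The following is only a sketch: `RankerPolling`, `pollUntilSettled` and its parameters are hypothetical names, not part of the Watson Java SDK. The status fetch and the "still training" check are passed in as functions, so the SDK calls from the code above can be plugged in unchanged.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Watson SDK.
final class RankerPolling {

    private RankerPolling() { }

    /**
     * Calls fetch repeatedly until stillTraining rejects the fetched value,
     * then returns that value. Throws TimeoutException if the value is still
     * "training" once timeoutMillis has elapsed.
     */
    static <S> S pollUntilSettled(Supplier<S> fetch,
                                  Predicate<S> stillTraining,
                                  long intervalMillis,
                                  long timeoutMillis)
            throws InterruptedException, TimeoutException {
        // nanoTime is monotonic, so wall-clock adjustments do not affect the deadline.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        S current = fetch.get();
        while (stillTraining.test(current)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("still training after " + timeoutMillis + " ms");
            }
            Thread.sleep(intervalMillis);
            current = fetch.get();
        }
        return current;
    }
}
```

With the code above, the fetch would be `() -> rars.getService().getRankerStatus(ranker_id).execute()` and the check `r -> r.getStatus() == Ranker.Status.TRAINING`, both taken from the loop in the question. The returned `Ranker` can then be checked for a failed status and its `getStatusDescription()` printed once, instead of on every iteration.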

Take a look at this example: https://github.com/watson-developer-cloud/java-sdk/tree/master/examples/retrieve-and-rank-solrj It has everything you need. I think that in your example you are not actually calling `execute()` in some of the methods. — German Attanasio, Jun 26 '16 at 04:38
Thanks for your answer. Actually, I used this example as a basis and, after rechecking, no `execute()` call was missing. However, I think I may have found the problem this morning: the cranfield_gt.csv available in the tutorial is formatted to be used with the training.py script, but I was trying to use it directly in my program. I suppose that, if I don't want to use the Python script, I must either write a method that does the job of training.py or create my own training .csv file, right? — Kasparrow, Jun 27 '16 at 09:31
You are right @kasparrow, you need to use the `training.py` script. If you end up doing it in Java, please open a pull request to the `java-sdk` and I will be happy to merge it — German Attanasio, Jun 28 '16 at 17:09
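
Following up on the comments, here is a minimal Java sketch of the first half of that job: turning one line of the ground-truth CSV into request parameters. It rests on assumptions that should be checked against the script itself. The first is that each row is a question followed by alternating document id and relevance label fields. The second is that `training.py` posts `q`, `gt`, `generateHeader`, `rows`, `returnRSInput` and `wt` to the collection's feature-select handler and concatenates the `RSInput` field of each response into the actual training file. `GroundTruthRow` and its methods are hypothetical names, and no network call is made here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the parameter names below are assumptions about training.py.
final class GroundTruthRow {
    final String question;
    final String relevance; // "docId,label,docId,label,..."

    private GroundTruthRow(String question, String relevance) {
        this.question = question;
        this.relevance = relevance;
    }

    /** Splits one CSV line, honouring double-quoted fields and "" escapes. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Parses "question,docId,label,docId,label,..." into a row. */
    static GroundTruthRow parse(String line) {
        List<String> fields = splitCsv(line);
        if (fields.size() < 3 || fields.size() % 2 == 0) {
            throw new IllegalArgumentException("expected a question plus docId/label pairs: " + line);
        }
        return new GroundTruthRow(fields.get(0), String.join(",", fields.subList(1, fields.size())));
    }

    /** Builds a URL-encoded form body using the assumed parameter names. */
    String toFormBody(boolean generateHeader, int rows) throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode(question, "UTF-8")
                + "&gt=" + URLEncoder.encode(relevance, "UTF-8")
                + "&generateHeader=" + generateHeader
                + "&rows=" + rows
                + "&returnRSInput=true&wt=json";
    }
}
```

If those assumptions hold, the remaining work is to send each body to the cluster, append the returned `RSInput` text to a file, and pass that file (not cranfield_gt.csv itself) to `createRanker`.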
