How do I manage multiple training sets using the Watson NLC Toolkit?

Question

From what I see, there's no way to upload multiple training sets to the new Watson NLC tooling. I need to manage separate training sets and their associated classifiers. What am I missing here?

John Bufe · Accepted Answer · 2016-03-02T14:34:17.987

Preferred option: Provision an NLC service instance for each set of training data you'd like to work with and separately access the tooling for each.

Workaround: Currently, the flow for managing multiple training sets in one NLC service instance is as follows:

1. (Optional, to start fresh) Go to the training data page and click the garbage icon to delete all training data.
2. Upload a training set on the training data page using the upload icon.
3. Manipulate the data as necessary: add texts and classes, tag texts with classes, etc.
4. Create a classifier. A classifier is essentially a snapshot of your current training data, because you can download that data again later from the classifiers page.

Repeat steps 1-4 as necessary until you have uploaded all of your training data sets and created the corresponding classifiers.

When you want to continue working on a previous training set:

1. Clear your training data (step 1 from above).
2. Go to the classifiers page.
3. Click on the download icon for the classifier that contains the training data you'd like to work with.
4. Return to the training data page and upload the file downloaded in step 3.
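
The swap-in/swap-out flow above can be sketched as a toy in-memory model. This is only an illustration of the bookkeeping, not the tooling's API: `ToolingModel`, its method names, the classifier names and the sample rows are all hypothetical, and it assumes that an upload adds to whatever training data is already present (which is why clearing first matters).

```python
import copy


class ToolingModel:
    """Hypothetical stand-in for the tooling of ONE NLC service instance."""

    def __init__(self):
        self.training_data = []  # rows of (text, [classes])
        self.classifiers = {}    # classifier name -> snapshot of training data

    def clear_training_data(self):
        # Step 1: the garbage icon on the training data page.
        self.training_data = []

    def upload(self, rows):
        # Step 2: assumed to ADD rows to the current training data.
        self.training_data.extend(copy.deepcopy(rows))

    def create_classifier(self, name):
        # Step 4: the classifier keeps a snapshot of the current data.
        self.classifiers[name] = copy.deepcopy(self.training_data)

    def download(self, name):
        # The download icon on the classifiers page.
        return copy.deepcopy(self.classifiers[name])


tooling = ToolingModel()
weather = [("Is it hot outside?", ["temperature"]),
           ("Will it rain?", ["conditions"])]
billing = [("Where is my invoice?", ["invoice"])]

# First pass: one classifier per training set.
for name, rows in [("weather-v1", weather), ("billing-v1", billing)]:
    tooling.clear_training_data()
    tooling.upload(rows)
    tooling.create_classifier(name)

# Later: resume work on the weather set from its classifier snapshot.
tooling.clear_training_data()
tooling.upload(tooling.download("weather-v1"))
tooling.training_data.append(("How windy is it?", ["conditions"]))
tooling.create_classifier("weather-v2")
```

Skipping the clear step in this model would mix the billing rows into the weather set, which is the main thing to watch for when following the workaround by hand.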

Not ideal, but we can do this for now. Ideally the Watson NLC tooling team will track this as a feature request, since this is a beta release of the tooling. – Biosopher, Feb 27 '16 at 22:18

James Taylor · Answer 2 · 2016-03-01T21:03:54.230

The best way to manage multiple training sets is to use a different NLC service instance for each training set.

The current beta NLC tooling is not intended to manage separate training sets within a single service instance. For example, the tool suggests classes when you add texts without classes. These suggestions come from the most recently trained classifier, so they won't make sense if that classifier was trained on a completely different training set.

The workaround suggested by @John Bufe will work if you have a hard limit on the number of NLC services you can use, e.g. you have reached your limit of Bluemix services. Cost is not a factor here: additional NLC service instances do not increase the overall price, because the monthly charge is for trained classifier instances, not for service instances. For example, if you have four service instances with a single classifier in each, you'll see 3 classifiers charged and 1 free.
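
To make the arithmetic concrete, here is a minimal sketch of that pricing example. It is not an official pricing formula: the $20/month figure is the one quoted in the comments on this answer, and the single free classifier is an assumption read off the "3 charged and 1 free" example. The function and constant names are made up for illustration.

```python
PRICE_PER_CLASSIFIER = 20  # USD per month, figure quoted in the comments (assumption)
FREE_CLASSIFIERS = 1       # assumption inferred from "3 charged and 1 free"


def monthly_charge(classifiers_per_instance):
    """Estimate the monthly charge from trained classifier counts.

    classifiers_per_instance: one entry per NLC service instance, giving
    the number of trained classifiers in that instance.
    """
    total = sum(classifiers_per_instance)
    billable = max(0, total - FREE_CLASSIFIERS)
    return billable * PRICE_PER_CLASSIFIER


# Four service instances with one classifier each...
four_instances = monthly_charge([1, 1, 1, 1])
# ...versus one service instance holding all four classifiers.
one_instance = monthly_charge([4])
```

Under these assumptions both layouts come to the same charge, since only the total number of trained classifiers enters the calculation.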

If you want to use the NLC beta tooling to manage your training data, I would recommend using separate NLC services for each training set you require.

edited Mar 01 '16 at 21:03

answered Feb 24 '16 at 08:25


Given the ability to have 8 classifiers per NLC service, it seems a waste to spin up multiple services at $20/month just to use the NLC tooling. It's easy enough to associate training sets with classifiers on the tooling side (to support auto-suggest, as you point out), so this single-threaded tooling seems more a limitation than a feature. I realize this is a beta release of the tooling, so hopefully the Watson team tracks this feature request, along with better support for bulk editing (e.g. renaming intents). – Biosopher Feb 27 '16 at 22:17
@John and I both work in the Watson tooling team so it's great to be getting feedback on the beta already. Don't forget that classifiers cannot be retrained, so you're likely to need several per training set as you train, test and improve classifiers. Having said that, while there's a limit of one training set in the tooling, I must admit I've been using spreadsheets to manage training data and just use the beta tooling to upload the data and train classifiers. – James Taylor Feb 29 '16 at 08:56
@Biosopher apparently the $20/month charge may actually be per trained classifier instance, not per classifier service, which would hopefully make the workaround unnecessary, i.e. just use services to manage separate training sets. I'll update my answer if I can confirm this. – James Taylor Feb 29 '16 at 17:00
I've updated the answer with an example of the pricing and asked for the Bluemix documentation to be improved. Guess I should get in the habit of deleting trained classifier instances when I've finished with them! – James Taylor Mar 01 '16 at 21:09
