
I want to add more examples to my classifier using the Java SDK. The idea is to periodically increase the data size and improve the classifier. However, the docs only show a training option that creates a new classifier.

If I can't retrain, can I retrieve the data used in the original classifier through the SDK so I can train a new classifier with it?

If these features don't exist, what's the best approach? Should I grow my data locally and create a new classifier on each training run, sending the whole dataset every time?

1 Answer


There is no API to update or retrain a classifier, nor one to retrieve the training data of an existing classifier.

The corpora that the service generates do not have an update or reinforcement learning option, hence the need to generate a new corpus whenever the training data changes.

This also means that once the corpus is created, the service has no need to keep the training data.

Summary of the discussion in comments:

If you want to get the effect of retraining a classifier, there are two approaches:

  • Use Watson Studio to create and train your classifier. After the initial training you will see an option to retrain the classifier.
  • If you want to do it programmatically using the SDK, create and train the classifier as you normally would. When the data changes, delete the existing classifier and create a new one with the new data set.

P.S.: Under the hood, Watson Studio also deletes the classifier and creates a new one when you attempt to retrain.
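The programmatic approach can be sketched roughly as follows. This is a minimal illustration, not the Watson SDK itself: `ClassifierClient`, `InMemoryClient`, and `RetrainByReplace` are hypothetical names invented for this example, and the in-memory client only stands in for the real create and delete calls so the sketch runs without credentials. The point it shows is that the full data set has to be kept locally, because the service cannot return it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the classifier service (NOT the real SDK API).
interface ClassifierClient {
    String create(String name, List<String[]> examples); // returns the new classifier ID
    void delete(String classifierId);
}

// In-memory stand-in for the service; an assumption made so the sketch is runnable.
class InMemoryClient implements ClassifierClient {
    final Map<String, Integer> sizes = new LinkedHashMap<>(); // classifier ID -> example count
    private int counter = 0;

    public String create(String name, List<String[]> examples) {
        String id = name + "-" + (++counter); // every create yields a fresh ID
        sizes.put(id, examples.size());
        return id;
    }

    public void delete(String classifierId) {
        sizes.remove(classifierId);
    }
}

// Keeps the whole training set locally and "retrains" by delete + create.
class RetrainByReplace {
    private final ClassifierClient client;
    private final String name;
    private final List<String[]> dataset = new ArrayList<>(); // each entry is {text, label}
    private String currentId; // null until the first training run

    RetrainByReplace(ClassifierClient client, String name) {
        this.client = client;
        this.name = name;
    }

    void addExample(String text, String label) {
        dataset.add(new String[] {text, label});
    }

    // Deletes the existing classifier (if any), then creates a new one from ALL local data.
    String retrain() {
        if (currentId != null) {
            client.delete(currentId);
        }
        currentId = client.create(name, new ArrayList<>(dataset));
        return currentId;
    }

    public static void main(String[] args) {
        RetrainByReplace trainer = new RetrainByReplace(new InMemoryClient(), "demo");
        trainer.addExample("hello there", "greeting");
        String first = trainer.retrain();
        trainer.addExample("see you later", "farewell");
        String second = trainer.retrain();
        System.out.println("ID changed after retrain: " + !first.equals(second));
    }
}
```

Deleting before creating keeps the number of live classifiers at one, at the cost of a window with no classifier available while the new one trains. Creating first and deleting afterwards avoids that gap but briefly holds two classifiers.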

Clint
chughts
  • On the IBM portal you can either add new examples or download the existing data set. So the SDK doesn't support these operations. Thanks! – Albano Borba May 13 '20 at 20:04
  • That's right. Watson Studio has some unique features for Natural Language Classifier that are not supported by the API, and so not supported by the SDK. – Allen Dean May 14 '20 at 13:23
  • Does that mean you need to create a new classifier every time you want to train on a dataset? I have noticed that the NLC service provides 4 free training events per month, which is strange! – Clint May 27 '20 at 15:30
  • Yes, @Clint. That's my current problem. :/ – Albano Borba May 28 '20 at 13:01
  • It appears that the only way to retrain is via Watson Studio, and you will also need to create the classifier using the Studio. You can then classify phrases programmatically via the SDK and handle the output accordingly. – Clint May 29 '20 at 14:44
  • Under the covers Watson Studio is not retraining. It is dropping the old classifier and creating a new one. – chughts May 29 '20 at 16:20
  • @chughts, yes, you are right. I was referencing the Redbook on NLC, but if you look at the pricing [plan](https://cloud.ibm.com/catalog/services/natural-language-classifier), a new classifier costs about 6x as much as a training event, and somehow the training events are not exposed in the SDK. – Clint Jun 01 '20 at 02:46
  • I think what the pricing is saying is: you can have 1 classifier for free. Any classifier beyond the first is charged per month. If you don't want to be charged, keep to one by deleting and creating anew; as soon as you have 2, you pay for the extra one. You get 4 training events for free on your single free classifier, i.e. you can create it 4 times in a month. If you go over 4 events, i.e. you have already deleted and created 4 times in the month, you pay for anything over 4 that month. In other words, you can't just keep creating, deleting, and creating. – chughts Jun 01 '20 at 08:55
  • @chughts, thank you for the clarity. It would be great if you could point me to an IBM doc that substantiates the above :) – Clint Jun 02 '20 at 12:08
  • Try this: 1. Create a classifier in Watson Studio. 2. Use the API to list all classifiers, and note the ID and name. 3. Update and retrain the classifier in Watson Studio. 4. Use the API to list all classifiers again. You will see the same name as before, giving the illusion of the same classifier, but the ID will be different, indicating that it is not the same classifier. What you are really asking regarding pricing is ‘what is a training event?’ – chughts Jun 02 '20 at 14:37
  • You could then use the IBM Cloud Activity Tracker to see what events are actually triggered by the Watson Studio retraining - https://cloud.ibm.com/docs/natural-language-classifier?topic=natural-language-classifier-at_events – chughts Jun 02 '20 at 14:59
  • @chughts, have edited your post, hope you don't mind. Thanks :) – Clint Jun 24 '20 at 16:16
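The before/after check described in the comments (same name, different ID) can be expressed as a small comparison. This is only a sketch: `ClassifierDiff` and `replacedNames` are hypothetical names, the maps are assumed to be filled by hand or from whatever a "list classifiers" call returns, and the IDs shown are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClassifierDiff {
    // Each listing maps classifier ID -> classifier name.
    // Returns the names whose old ID disappeared while the name is still present,
    // which suggests the classifier was dropped and recreated rather than updated.
    static List<String> replacedNames(Map<String, String> before, Map<String, String> after) {
        List<String> replaced = new ArrayList<>();
        for (Map.Entry<String, String> old : before.entrySet()) {
            boolean idGone = !after.containsKey(old.getKey());
            boolean nameKept = after.containsValue(old.getValue());
            if (idGone && nameKept) {
                replaced.add(old.getValue());
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("hypothetical-id-1", "support-intents"); // listing taken before retraining

        Map<String, String> after = new HashMap<>();
        after.put("hypothetical-id-2", "support-intents"); // listing taken after retraining

        System.out.println("Recreated under a new ID: " + replacedNames(before, after));
    }
}
```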