Google prediction API - Training data syntax for multi classification

Question

I'm trying to harness the power of the Google Prediction API to classify my data. Each item in my DB can have multiple categories assigned to it.

For example: "My Nexus phone is rebooting constantly" could be assigned both #Android and #troubleshooting tags.

I would like to upload my training data to Google, but I'm not sure how to apply both tags to the same content. So far I've only found the syntax that assigns one category to each piece of content, like so:

"Android" ,"My Nexus phone is rebooting constantly"

What is the right syntax for multi-classification training data?

Patrice · Answer 1 · 2015-09-23T13:55:24.443

Unless I'm misunderstanding your question, I think the answer is in the docs.

Specifically, the section about text strings explains that when you submit a text string, the system splits it into multiple strings, using whitespace as the delimiter. Their example is "Godzilla vs Mothra", which becomes "Godzilla", "vs", and "Mothra". So in your case, you could just use "Android troubleshooting", and the system will split it into "Android" and "troubleshooting".
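To make the described splitting concrete, here is a small Python sketch of whitespace tokenization. It only mimics the behaviour the docs describe, locally; it is not the Prediction API's own tokenizer, and the helper name `split_like_prediction_api` is hypothetical.

```python
from typing import List


def split_like_prediction_api(text: str) -> List[str]:
    """Hypothetical helper: split a string on runs of whitespace,
    mirroring the behaviour the docs describe for text strings."""
    return text.split()


print(split_like_prediction_api("Godzilla vs Mothra"))       # ['Godzilla', 'vs', 'Mothra']
print(split_like_prediction_api("Android troubleshooting"))  # ['Android', 'troubleshooting']
```

Note that this sketch says nothing about whether the label column is split the same way; the accepted answer suggests each line carries exactly one label.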

score 0 · Accepted Answer · answered Sep 24 '15 at 07:23

From the docs:

Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example:

"excited", "OMG! Just had a fabulous day!"

"annoying", "OMG! Just had a fabulous day!"
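If your DB stores each item with a list of tags, generating training rows in this "repeat the example once per label" style could look like the Python sketch below. It assumes the same quoted, comma-separated layout as the examples above (label first, then the text); the `examples` dict and the `to_training_csv` helper are hypothetical stand-ins for your own data access code.

```python
import csv
import io

# Hypothetical in-memory data: each example text mapped to all of its tags.
examples = {
    "My Nexus phone is rebooting constantly": ["Android", "troubleshooting"],
    "OMG! Just had a fabulous day!": ["excited", "annoying"],
}


def to_training_csv(examples: dict) -> str:
    """Emit one CSV row per (label, text) pair, repeating the text
    once for every label it carries, with the label in the first column."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field, so commas inside the text stay safe.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for text, labels in examples.items():
        for label in labels:
            writer.writerow([label, text])
    return buf.getvalue()


print(to_training_csv(examples), end="")
```

For the Nexus example this yields two rows with the same text, one labelled `"Android"` and one labelled `"troubleshooting"`. Using the `csv` module rather than string formatting means quotes and commas inside the text are escaped correctly.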

If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.
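Because a model trained this way returns a score per label rather than a set of tags, you still have to decide which labels to apply. One option is a cutoff, sketched below in Python. It assumes you have already turned the response into a plain `{label: score}` dict (the actual response format is not shown here), and both the `labels_above` helper and the default threshold of 0.5 are hypothetical choices you would tune on your own data.

```python
from typing import Dict, List


def labels_above(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Hypothetical post-processing: keep every label whose score meets
    the threshold, highest first, so one input can receive several tags."""
    kept = [label for label, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda label: scores[label], reverse=True)


# Scores shaped like the example above (values are illustrative).
scores = {"excited": 0.6, "annoying": 0.2}
print(labels_above(scores))        # ['excited']
print(labels_above(scores, 0.15))  # ['excited', 'annoying']
```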
