
I am using Vowpal Wabbit (VW) for multiclass prediction. The strange part is that no matter which parameters I use, the result is always the same.

Is that expected? Could it be caused by my data?

Details:

Around 90k lines of data. A line of the data:

1 2334225|SUBDEPT "D1SUB1" "D2SUB1" |DEPT "DEPT1" "DEPT2" |SCANCODE "11223442" "65434533543" |WDAY Friday |AMTBOUGHT 2 

It's a multiclass problem, so the command line is:

vw --ect 38 ../Processed/train.vw.txt --loss_function logistic --link=logistic

The only change that makes any difference is switching from --ect to --oaa. I have tried adding the following, but none of them changes the final validation loss:

  • -c -k --passes 20 (training stops after 8 passes)
  • --l1 or --l2
  • --power_t
  • --ignore D or --ignore d (or s or su...)

The result is always:

average loss = 0.911153 h

Is there something that I am missing here?

Adriano Almeida
  • Can you share your data? Difficult to say otherwise. However, as a general rule, one shouldn't be forcing a loss (and/or link) function on top of reductions. `--ect 38` is a multiclass (rather than binary-classification) problem. So assuming the data has 38 different labels in the range `[1, 38]`, just let the `--ect` algorithm (as with other typical reductions) pick the preferred loss function (or leave the default alone). HTH. – arielf Nov 28 '15 at 03:37
  • If I don't link anything, it gives me a loss of .34 (using the flag --loss_function logistic). If I do not use the flag, it gives me only the prediction, and I need the probabilities. Even so, after removing the link, changing the parameters doesn't change the loss either. – Adriano Almeida Nov 28 '15 at 04:13
  • Have you randomly shuffled the 90k lines of training data? – Martin Popel Nov 28 '15 at 20:07
  • @arielf : "shouldn't be forcing a loss (and/or link) function on top of reductions" That's not true in general. The default loss is "squared", which is not suitable for getting probabilities with oaa. – Martin Popel Nov 28 '15 at 20:08
  • @AdrianoAlmeida : If you want probabilities, use the newest VW version and `vw --oaa=38 --loss_function=logistic --probabilities` (with `--probabilities`, you don't need `--link` and VW will report both 0/1 loss and multi-class logistic loss). With `--ect` you cannot predict probabilities. – Martin Popel Nov 28 '15 at 20:13
  • @MartinPopel they have the system order, but no class order. I will check on the --probabilities flag. Thanks – Adriano Almeida Nov 28 '15 at 20:17
  • A few more unrelated comments on VW usage. * Feature names do not need double quotes (the quote character has no special meaning; whitespace and colons are not allowed in feature names anyway). * Namespaces SUBDEPT and SCANCODE start with the same letter, so they cannot be distinguished with `--ignore`, `-q`, `--cubic` etc. * Namespaces are case sensitive (so `--ignore d` means something different than `--ignore D`). – Martin Popel Nov 28 '15 at 20:19
  • I tried --probabilities on VW 8.0. It says the option is unrecognised. What version are you using? – Adriano Almeida Nov 29 '15 at 22:04
  • Did you try to manually delete cache file before training when using `--passes` option? – re-gor Nov 14 '17 at 21:56
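
The data-preparation points raised in the comments (shuffle the training lines, drop the double quotes, and give each namespace a distinct first letter) could be handled in a single preprocessing pass. Below is a minimal Python sketch, not a tested fix for this dataset. The function names and the rename map (SCANCODE to CODE) are hypothetical choices made for illustration, and it assumes a namespace name is the token written directly after each `|`, as in the sample line from the question.

```python
import random
import re

# Hypothetical rename map: the comments note that --ignore, -q and --cubic
# cannot tell apart namespaces sharing a first letter, so SCANCODE is
# renamed to avoid clashing with SUBDEPT. Any distinct letter would do.
RENAME = {"SCANCODE": "CODE"}

# Assumption: a namespace name is the non-space token directly after a '|'.
NAMESPACE = re.compile(r"\|(\S+)")


def clean_line(line, rename=RENAME):
    """Strip double quotes and rename namespaces in one VW-format line."""
    line = line.replace('"', "").rstrip()
    return NAMESPACE.sub(
        lambda m: "|" + rename.get(m.group(1), m.group(1)), line
    )


def first_letter_clashes(line):
    """Return {letter: [namespaces]} for namespaces sharing a first letter."""
    by_letter = {}
    for name in NAMESPACE.findall(line):
        by_letter.setdefault(name[0], set()).add(name)
    return {k: sorted(v) for k, v in by_letter.items() if len(v) > 1}


def prepare(lines, seed=0):
    """Clean every non-empty line, then shuffle with a fixed seed."""
    cleaned = [clean_line(l) for l in lines if l.strip()]
    random.Random(seed).shuffle(cleaned)
    return cleaned
```

To use it, read `train.vw.txt` into a list of lines, pass the list to `prepare`, and write the result to a new file before training. Running `first_letter_clashes` on a line before and after cleaning shows whether any namespaces still collide. If `--passes` is used, the old cache file would also need to be regenerated, which the `-k` flag already listed in the question does.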
