
I have a project where the features are 0 or 1 (each one encodes a YES or NO answer) and the labels range from 0 to 9. The application will ask the user 100 questions, and the answers will be 0 or 1 (the features). From those answers I will tell the user which label is appropriate for them (0, 1, 2, ..., or 9).

I have already written some code (with LR). What do you think? For this situation, should I use multiclass logistic regression or a multiclass decision tree?

franiis
betty bth
  • I don't quite understand what you are asking. If you are asking which technology to choose, it looks too broad for SO. Could you be more precise with your question? – franiis Mar 09 '18 at 12:38
  • Yes: for this situation, do I have to choose a "decision tree" model or "logistic regression"? It's a machine learning application; the features are "yes" or "no" and the labels are "0", "1", ..., "9". – betty bth Mar 09 '18 at 12:43
  • Why not try both approaches and choose the best one? – MaxU - stand with Ukraine Mar 09 '18 at 12:45
  • I already tried it with LR, but I want to try it with a decision tree. The problem is that I didn't find an example of a "multiclass decision tree" to try. – betty bth Mar 09 '18 at 12:49
  • @ibtissamboutahi Multiclass for trees isn't trivial. You can use a superset (but I doubt you will have enough data to do it properly). You could create N trees, where each tree models one of the labels. – franiis Mar 09 '18 at 13:06
  • @franiis Thank you, but do you have an example (code, link...) of how to do that? – betty bth Mar 09 '18 at 13:08
  • @ibtissamboutahi I suggest you create a simple single-class DT (one tree for each label). I guess you can find it somewhere on SO or Google. I don't know if it will work better than LR (but comparing results is part of a data scientist's job). – franiis Mar 09 '18 at 13:11

1 Answer


Logistic regression works well when the dimensionality of the data is high, whereas a decision tree should not be grown too deep. So a decision tree would be the better option in your case, but the best thing to do is to try both approaches and compare their performance with metrics such as accuracy, AUC, or log loss.
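As a rough illustration of trying both approaches, here is a minimal sketch using scikit-learn, assuming that library is acceptable for your project. The data is synthetic: the 1000 users, the random answers, and the labelling rule are all made up so that the snippet runs on its own, and `max_depth=10` is an arbitrary choice rather than a recommendation. Note that `DecisionTreeClassifier` accepts labels 0 to 9 directly, so no per-label tree is needed for this comparison.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the asker's real data):
# 1000 users, 100 yes/no answers each, labels 0..9.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 100))

# Hypothetical labelling rule so that the labels depend on the answers.
weights = rng.normal(size=(100, 10))
noise = rng.normal(scale=0.5, size=(1000, 10))
y = np.argmax((X - 0.5) @ weights + noise, axis=1)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    # max_depth is an arbitrary cap to keep the tree from growing too deep.
    "decision_tree": DecisionTreeClassifier(max_depth=10, random_state=0),
}

# 5-fold cross-validation with two of the metrics mentioned above.
scores = {}
for name, model in models.items():
    accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    # scikit-learn reports log loss negated, so flip the sign back.
    log_loss = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()
    scores[name] = {"accuracy": accuracy, "log_loss": log_loss}
    print(f"{name}: accuracy={accuracy:.3f}, log_loss={log_loss:.3f}")
```

With your real answers in `X` and your real labels in `y`, the same loop gives a like-for-like comparison; whichever model scores better on held-out folds is the one to keep.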

Aditya