I'm doing a data analysis task in SPSS Modeler and have finally reached the point in the stream where I'm trying to fit some models to the data.

However, when I ran the C5.0 modeling node on my data, it generated a model nugget containing only a single leaf, so the model has no decision rules. Beforehand I partitioned the data into training and test subsets (70-30). I did not use misclassification costs, and the field roles are set properly. On the node's Model page I checked Use partitioned data, Build model for each split, Group symbolics, and the global pruning option. I also tried expert mode, but the same thing happens in simple mode. I have tried different options, but the output is always the same: a tree without a single split.

How can I get the model to produce a more complex decision tree? I assume a single leaf is not the expected outcome.

Any suggestions are welcome.

Newl

1 Answer

Please check the distribution of your target variable and share it. If the balance differs greatly from 50%-50%, you may need to balance your inputs first. Misclassification costs are another technique that can give you an output, but again they should be based on your empirical distributions.
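
To illustrate the check and the balancing idea outside of Modeler, here is a minimal sketch in plain Python. It is not SPSS Modeler scripting and does not reproduce the Balance node's exact behaviour; the function names `class_shares` and `downsample_to_balance`, the `target` key, and the data are all made up for the example.

```python
import random
from collections import Counter

def class_shares(labels):
    """Return each class's share of the records, e.g. {'no': 0.8, 'yes': 0.2}."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def downsample_to_balance(records, target_key, seed=0):
    """Randomly drop records from the larger classes until every class
    has as many records as the smallest one (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[target_key], []).append(rec)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Made-up data with a 20%-80% flag target.
records = [{"id": i, "target": "yes" if i % 5 == 0 else "no"} for i in range(1000)]

shares = class_shares([r["target"] for r in records])
balanced = downsample_to_balance(records, "target")
balanced_shares = class_shares([r["target"] for r in balanced])

print(shares)
print(balanced_shares)
```

Down-sampling throws away majority-class records; boosting the minority class instead is the other common choice, and which one fits depends on how much data you have.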

Julian
  • Thank you for your answer! The distribution is 20% - 80%, I'll give misclassification cost another try, but I think I have tried it before and it didn't work. – Newl Mar 14 '19 at 14:25
  • Then you have some really strong predictor, which may be something you derived your target from. Start with a small number of inputs (factors), then increase their number in batches to detect the faulty variable. But I'd encourage you to try to balance the analysis first, be it through the balance node or by playing with the a posteriori probabilities, i.e. misclassification costs. – Julian Mar 15 '19 at 06:36
  • If I uncheck the _use partitioned data_ option, I get a more appropriate model, with several nodes. When it's selected, it gives back a tree with 0 depth though. I'll try your suggestions, thanks! – Newl Mar 18 '19 at 14:48
  • I'd recommend checking how you partition your data (manually or automatically), and whether you have a Split column. The latter may result in building many models for various small segments. I'll be waiting for your input. :) – Julian Mar 18 '19 at 14:56
  • I had used the auto-generated seed in the train-test split node, but now that I have changed it to another randomly generated seed, it has magically started working... – Newl Mar 18 '19 at 15:18
  • There's that little check box for "Partition by field" where you can select your target to make sure it gets a proportionate number for each partition. – Julian Mar 18 '19 at 19:32
  • Try unchecking "partition by field" and a balance node after that, so you have your classes equally distributed. That's 50-50% for a flag target. – Julian Jul 03 '19 at 10:43
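
The idea of partitioning proportionately by the target, as discussed in the comments above, can be sketched in plain Python. This is only an illustration under assumptions: it is not how Modeler's partition node is implemented, and `stratified_partition`, the `target` key, the seed value, and the data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_partition(records, target_key, train_fraction=0.7, seed=12345):
    """Split records into train/test so that each target class is divided
    in (roughly) the same train_fraction proportion."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[target_key]].append(rec)

    train, test = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Made-up data with a 20%-80% flag target.
records = [{"id": i, "target": "yes" if i % 5 == 0 else "no"} for i in range(1000)]

train, test = stratified_partition(records, "target")
train_counts = Counter(r["target"] for r in train)
test_counts = Counter(r["target"] for r in test)

print(len(train), dict(train_counts))
print(len(test), dict(test_counts))
```

Comparing the class counts of both partitions after every change of seed is a cheap way to see whether an unlucky split, rather than the model settings, explains a tree with no splits.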