
The AutoML run stops on a clock. I compared two AutoML runs making the same predictions, where one used only a subset of the columns available to the other. At a 3600-second runtime the fuller model looked better; when I repeated the comparison with a 5000-second runtime, the subset model looked better. They traded places, and that isn't supposed to happen.

I think the issue is convergence. Is there any way to track the convergence history of a stacked ensemble's learners, to determine whether they are relatively stable? We have that for parallel (bagged) and series (boosted) CART ensembles, and I don't see why a heterogeneous ensemble wouldn't support the same.
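One rough way to get at this with H2O from R: the iterative base learners (GBM, DRF, XGBoost, deep learning) expose their training trajectories through `h2o.scoreHistory()`, so you can at least check whether the top non-ensemble models on the leaderboard had flattened out by the time the clock ran out. A minimal sketch, assuming `aml` is a finished H2OAutoML object and that the metric column names below match your H2O version:

```r
library(h2o)

# Assumes `aml` is an H2OAutoML object from a finished h2o.automl() run
# on an already-running cluster.
lb <- as.data.frame(aml@leaderboard)

# Look at the individual (non-ensemble) models near the top of the
# leaderboard and pull each one's per-iteration scoring history.
model_ids <- lb$model_id[!grepl("StackedEnsemble", lb$model_id)]

for (mid in head(model_ids, 5)) {
  m  <- h2o.getModel(mid)
  sh <- h2o.scoreHistory(m)   # one row per scoring event (trees/epochs)

  # Crude stability check: how much did the metric move over the last
  # ~10% of scoring events?  Column names vary by algorithm (e.g.
  # "training_rmse", "validation_deviance"), so grab the first match.
  metric_col <- grep("deviance|rmse|logloss", names(sh), value = TRUE)[1]
  if (!is.na(metric_col)) {
    last_bit <- tail(sh[[metric_col]], max(1, nrow(sh) %/% 10))
    cat(mid, ":", metric_col, "range over last 10% of history =",
        diff(range(last_bit, na.rm = TRUE)), "\n")
  }
}
```

If those tails are still moving, the base learners themselves had not converged, and the ensemble built on top of them can plausibly reorder between budgets.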

I have plenty of data, and I am using cross-validation, so I would prefer not to attribute the difference to the random draws of the training versus validation sets.

I'm running on relatively high-performance hardware, so I don't think this is a case of "too short a runtime". For what it's worth, my "all models" count is between several hundred and a thousand.
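One way to separate "not enough runtime" from "luck of the fold draws" is to repeat each time budget with several seeds and look at the spread of the leader's cross-validated metric. A sketch of that loop, assuming an H2OFrame `train` plus a response `y` and predictor vector `x` are already defined; the seeds and the five-fold setting are illustrative, not taken from the original runs:

```r
library(h2o)
h2o.init()

# Assumes `train` (an H2OFrame), response column name `y`, and
# predictor column names `x` already exist.
budgets <- c(3600, 5000)   # seconds, matching the two runs above
seeds   <- c(11, 22, 33)   # illustrative repeats

results <- expand.grid(budget = budgets, seed = seeds)
results$leader_metric <- NA_real_

for (i in seq_len(nrow(results))) {
  aml <- h2o.automl(
    x = x, y = y,
    training_frame   = train,
    max_runtime_secs = results$budget[i],
    nfolds           = 5,
    seed             = results$seed[i],
    project_name     = paste0("aml_", results$budget[i],
                              "s_seed", results$seed[i])
  )
  lb <- as.data.frame(aml@leaderboard)
  results$leader_metric[i] <- lb[1, 2]   # 2nd column is the sort metric
}

# If the spread across seeds at a fixed budget is comparable to the gap
# between budgets, the 3600s-vs-5000s flip is within run-to-run noise.
print(results)
```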

EngrStudent
  • What additional models were produced during the 5000-second run that did not appear during the 3600-second run? Also, did you set a seed, and use the same seed for the two different runs? – Lauren Feb 16 '18 at 17:33 (a sketch for comparing the two leaderboards follows after these comments)
  • @Lauren - details I don't always share. I went down from ~28k columns to ~500 in pre-processing. There were two families of columns: one is naked time cosines/sines (a dozen-ish columns) and the other ~480 I can't detail in a public forum. The AutoML is trying to fit the same y column in both cases. I did not use the same seed. I want a model general enough that I could re-run the approach 5 times and get consistent comparative results. The only difference between the 5000-second and 3600-second runs is run-time. I re-re-built it and gave each 6 hours to converge. – EngrStudent Feb 17 '18 at 17:22
  • Are you running these as separate projects (with separate leaderboards), or are you adding more models to an existing AutoML project? If the latter, did you make sure not to use the same seed twice? – Erin LeDell Feb 17 '18 at 21:54
  • @Lauren - I drive it from R. I am not doing sequential augmentation. As far as I know, I am running them as separate fitting processes with their own leaderboards. As long as the stacked ensemble doesn't grab learners it didn't create, we are good; I would expect that random aggregation could violate the separation between training and testing. When I compare two relatively trivial models fit by gradient descent, let's say a linear and a quadratic, the only things they have in common are the training data, the descent method, and the measure of goodness used to compare them. – EngrStudent Feb 17 '18 at 23:10
  • 1
    you can guarantee different fitting processes if you use a different `project_name` for each run (more details on how project_name works and why using the same training set sequentially (in the same h2o cluster instance) can lead to building off the same leaderboard http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) – Lauren Feb 19 '18 at 16:46
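As a follow-up to Lauren's first comment, the two leaderboards can be compared directly to see which algorithm families showed up only in the longer run. A sketch, assuming the two finished runs were kept as hypothetical objects `aml_3600` and `aml_5000` under separate `project_name`s:

```r
library(h2o)

# Assumes aml_3600 and aml_5000 are the two finished H2OAutoML objects,
# built under different project_name values so their leaderboards are separate.
lb_short <- as.data.frame(aml_3600@leaderboard)
lb_long  <- as.data.frame(aml_5000@leaderboard)

# Model ids embed a timestamped run name, so compare by algorithm family
# (the prefix before the first underscore) rather than by raw id.
family <- function(ids) sub("_.*$", "", ids)

extra <- setdiff(family(lb_long$model_id), family(lb_short$model_id))
cat("Algorithm families only in the 5000s run:",
    paste(extra, collapse = ", "), "\n")

# Counts per family in each run -- a quick view of where the extra
# time budget actually went.
print(table(family(lb_short$model_id)))
print(table(family(lb_long$model_id)))
```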

0 Answers