3

Is there a way to estimate the remaining time when fitting a model? For example

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=10)
model.fit(x, y)

I have quite a large dataset (millions of rows), so this is going to take some time. I would like to know the estimated time so I can do other things and come back when the process has finished.

With ensembles like random forests, estimating the remaining time should be [reasonably] easy.

mikkom

1 Answer

7

Try the verbose option. You can set it to 0 (no output), 1 (update for each job), or 2 (update for each tree), e.g.

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, verbose=2, n_jobs=2).fit(X_train, y_train)
ysakamoto
  • Thank you! This is almost exactly what I was looking for! Too bad the time estimates are printed only after each job has finished. Is there a way to print estimates after each tree has been built? – mikkom Mar 04 '14 at 07:43
  • As far as I know, it can't give you time estimates, because every machine runs at a different speed. But if you know how long it takes to build one tree, you can get a good estimate of how long it will take to build all of them. You can ask directly on their mailing list or propose a new feature on their GitHub page: https://github.com/scikit-learn/scikit-learn. – ysakamoto Mar 04 '14 at 08:29
  • Thanks, I think I'll do that. I created a similar time estimate myself for Weka (calculating the average tree creation time and projecting it onto the trees not yet built), and since the multithreading library already prints time estimates at the end, it should be very easy. – mikkom Mar 04 '14 at 10:55
  • It gets stuck at `building tree 8 of 100`, then there is no update for a long time. – huang Nov 04 '20 at 16:51
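
The averaging idea from the comments (time the trees built so far, project that onto the rest) can be sketched roughly as below. This is only an illustration, not something scikit-learn ships: `estimate_remaining` and `fit_with_eta` are hypothetical helper names, and the sketch assumes the forest was created with `warm_start=True`, so that each `fit()` call only adds the newly requested trees. It also assumes every tree costs about the same to build, which may not hold exactly.

```python
import time


def estimate_remaining(elapsed_seconds, trees_done, trees_total):
    """Project remaining time from the average time per tree built so far."""
    if trees_done <= 0:
        raise ValueError("need at least one finished tree to extrapolate")
    avg_per_tree = elapsed_seconds / trees_done
    return avg_per_tree * (trees_total - trees_done)


def fit_with_eta(model, X, y, trees_total, batch_size=10, report=print):
    """Grow a forest in batches and report an ETA after each batch.

    Assumption: `model` is a scikit-learn forest built with warm_start=True,
    so raising n_estimators and refitting keeps the existing trees.
    """
    start = time.perf_counter()
    trees_done = 0
    while trees_done < trees_total:
        # Ask for the next batch of trees, capped at the requested total.
        trees_done = min(trees_done + batch_size, trees_total)
        model.set_params(n_estimators=trees_done)
        model.fit(X, y)
        elapsed = time.perf_counter() - start
        eta = estimate_remaining(elapsed, trees_done, trees_total)
        report(f"{trees_done}/{trees_total} trees, about {eta:.0f}s remaining")
    return model
```

With scikit-learn this would be called along the lines of `fit_with_eta(RandomForestRegressor(warm_start=True, n_jobs=2), X_train, y_train, trees_total=100)`. A smaller `batch_size` gives more frequent updates at the cost of more `fit()` calls.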