I'm trying to use SMAC v3 for hyperparameter optimization.
I want to limit the optimization process to a fixed number of target function (tae_runner) evaluations, and run it in "mini-batch" mode:
First, I run SMAC with some budget, then I add some more and make it continue from the point where it stopped.
How could I determine the reasonable size for this computational budget?
Let me explain my concerns below:
My target function is quite expensive: it can take 1-10 seconds per evaluation, or even more. Thus, I decided to use mini-batches and perform some additional steps between them, which are specific to my problem.
budget = ...  # 1? 10? 100? 1000? 10000?
total_limit = budget
my_scenario = Scenario({"run_obj": "quality",
                        "cs": my_configuration_space,
                        "runcount_limit": total_limit,
                        })
smac = SMAC(scenario=my_scenario, rng=42, tae_runner=my_target_function)
best_configuration = smac.optimize()
# when I decide to continue running:
total_limit += budget
my_scenario = Scenario({"run_obj": "quality",
                        "cs": my_configuration_space,
                        "runcount_limit": total_limit,
                        })
# swap in the new scenario via the name-mangled attribute of the Stats object
smac.stats._Stats__scenario = my_scenario
better_configuration = smac.optimize()
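To make sure I understand my own restart pattern correctly, here is a minimal, self-contained sketch of it. SMAC is replaced by a dummy random-search class, so `ToyOptimizer`, `run_count`, and the rest are my own illustrative names, not part of the SMAC API:

```python
import random

class ToyOptimizer:
    """Stand-in for SMAC: evaluates random configs up to a run-count limit."""
    def __init__(self, target, runcount_limit, seed=42):
        self.target = target
        self.runcount_limit = runcount_limit
        self.rng = random.Random(seed)
        self.run_count = 0
        self.incumbent = None
        self.incumbent_cost = float("inf")

    def optimize(self):
        # Evaluate until the (possibly raised) budget is exhausted.
        while self.run_count < self.runcount_limit:
            x = self.rng.uniform(-5, 5)
            cost = self.target(x)
            self.run_count += 1
            if cost < self.incumbent_cost:
                self.incumbent, self.incumbent_cost = x, cost
        return self.incumbent

def target(x):
    return (x - 1.0) ** 2

budget = 10
opt = ToyOptimizer(target, runcount_limit=budget)
first = opt.optimize()

# "Mini-batch" continuation: raise the limit and resume from the same state.
opt.runcount_limit += budget
second = opt.optimize()
```

The key point is that the optimizer state (here `run_count` and the incumbent) survives between `optimize()` calls; only the limit is raised.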
This code seems to work. Here is what I found in the docs:
wallclock_limit, runcount_limit and tuner-timeout are used to control maximum wallclock-time, number of algorithm calls and cpu-time used for optimization respectively.
As far as I understand from the code in the repository, it works roughly as follows:
1) SMAC wraps SMBO and passes the Scenario and other parameters to it.
2) There is the main SMBO loop, which constantly generates new challengers (there happen to be 10K of them, including interleaved random configurations):
challengers = self.choose_next(X, Y)
and compares them to the incumbent (the best configuration found so far):
self.incumbent, inc_perf = self.intensifier.intensify(
challengers=challengers,
...
time_bound=max(self.intensifier._min_time, time_left))
By the way, what is time_bound here if I only have runcount_limit set?
3) If the budget is exceeded, the main smbo loop exits:
if self.stats.is_budget_exhausted():
break
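For intuition, the budget check boils down to comparing counters against the scenario limits. A hedged sketch of that bookkeeping (the class and field names below are illustrative, not the actual `Stats` implementation):

```python
import time

class StatsSketch:
    """Illustrative budget bookkeeping, loosely modeled on the idea
    behind SMAC's Stats.is_budget_exhausted()."""
    def __init__(self, runcount_limit=float("inf"), wallclock_limit=float("inf")):
        self.runcount_limit = runcount_limit
        self.wallclock_limit = wallclock_limit
        self.ta_runs = 0                 # target-algorithm calls so far
        self._start = time.time()

    def is_budget_exhausted(self):
        # Exhausted when either the run-count or the wallclock budget is hit.
        return (self.ta_runs >= self.runcount_limit
                or time.time() - self._start >= self.wallclock_limit)

stats = StatsSketch(runcount_limit=3)
for _ in range(3):
    stats.ta_runs += 1
```

After three recorded runs, `is_budget_exhausted()` returns True and the main loop would break, regardless of how many challengers are still queued.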
And the rest of the challengers, which were already generated, cost-predicted, and sorted, are simply dropped.
My concern is the following:
If the runcount_limit value is too small, for example 1 or 10, it will probably be a huge waste of resources: lots of configurations (5K) are generated, sorted, and thrown away, on top of the overhead of starting and stopping SMAC for every mini-batch.
On the other hand, if I set runcount_limit to be a multiple of 10K, it would not be a mini-batch any more.
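One way to reason about this trade-off is to amortize the fixed per-batch overhead (challenger generation and sorting, SMAC start/stop) against the time spent on actual target-function evaluations. A back-of-the-envelope sketch; all the timing numbers here are assumptions, not measurements:

```python
def overhead_fraction(batch_size, t_eval, t_batch_overhead):
    """Fraction of wallclock time spent on fixed per-batch overhead
    rather than on target-function evaluations."""
    useful = batch_size * t_eval
    return t_batch_overhead / (t_batch_overhead + useful)

# Assumed numbers: 5 s per target evaluation, 2 s of fixed overhead per batch.
t_eval, t_overhead = 5.0, 2.0
for batch in (1, 10, 100):
    frac = overhead_fraction(batch, t_eval, t_overhead)
    print(f"batch={batch:4d}: overhead fraction = {frac:.1%}")
# batch=   1: overhead fraction = 28.6%
# batch=  10: overhead fraction = 3.8%
# batch= 100: overhead fraction = 0.4%
```

Under these assumed numbers, once the batch covers a few dozen evaluations the fixed overhead becomes negligible; measuring the real overhead for one batch would let me pick the smallest batch size whose overhead fraction I am willing to tolerate.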
Can you suggest a way to determine a reasonable size for those batches?