
I am attempting to load a frozen graph (.pb) created through TensorFlow Keras onto a memory-limited microcontroller. Via Hyperopt's fmin(...) I am optimizing my training hyperparameters; however, I would also like to include the size of the resulting model as part of the search. At the moment, I'm reporting a size-weighted loss back to Hyperopt, something like the following:

import os

import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

def optimizer(args):
  ...
  training_history = nn.train_v00(....)
  file_bytes = int(os.path.getsize(frozen_graph_filepath))  # frozen graph size in bytes

  final_loss = training_history.history['loss'][-1]
  weighted_loss = final_loss * (file_bytes / (100 * 1024))  # want the model smaller than 100 KB

  if OPTIMIZATION_TARGET == 'loss':
    return_struct = {
      'status': STATUS_OK,
      'loss': weighted_loss,
      'epochs': epochs,
      'metrics': {
        'accuracy': final_acc
      }
    }

    return return_struct

space = {
  'learning_rate': hp.loguniform('learning_rate', np.log(0.00001), np.log(0.05)),
  'dropout': hp.uniform('dropout', 0, 0.5),
  'batch_size': hp.quniform('batch_size', 32, 128, 2),
  'input_seq_len': hp.quniform('seq_len', 32, N_ADC_SAMPLES, 2),
  'cnn_n_filters': hp.quniform('cnn_n_filters', 1, 10, 1),
  'cnn_kernel_size': hp.quniform('cnn_kernel_size', 1, 10, 1),
  'cnn_pool_size': hp.quniform('cnn_pool_size', 1, 10, 1)
}
t = Trials()
best = fmin(optimizer, space, algo=tpe.suggest, max_evals=MAX_EVALS, trials=t)

From what I have found so far, there isn't a way to backpropagate the model size through training directly, but is there a better way of doing this?

Thanks for the consideration!

u12357

1 Answer


If I understand you correctly, this is a question about how to craft your loss function, not about Hyperopt directly. Correct? If so, and if you indeed have a hard cut-off at 100 KB, I wouldn't scale your training loss (as it might lead to strange artifacts like a poorly performing but very tiny model taking the first spot).

How about you report the regular training loss if the model size is smaller than 100 KB and otherwise return an astronomically large number? Just an idea.
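Something along these lines, as an untested sketch: nn.train_v00, frozen_graph_filepath and the rest of the Hyperopt setup are the placeholders from your question, and HUGE_LOSS is just an arbitrarily large constant.

import os

from hyperopt import STATUS_OK

SIZE_LIMIT_BYTES = 100 * 1024  # hard cut-off at 100 KB
HUGE_LOSS = 1e9                # "astronomically large" penalty for oversized models

def optimizer(args):
    training_history = nn.train_v00(args)                # your existing training call
    file_bytes = os.path.getsize(frozen_graph_filepath)  # size of the frozen .pb

    final_loss = training_history.history['loss'][-1]

    # Report the unscaled training loss for models that fit, the huge constant otherwise.
    loss = final_loss if file_bytes <= SIZE_LIMIT_BYTES else HUGE_LOSS

    return {'status': STATUS_OK, 'loss': loss, 'model_bytes': int(file_bytes)}

This keeps the ranking of models below the threshold driven purely by training loss, while anything over 100 KB is effectively pushed to the bottom of the search.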

  • Re: Correct? Sorry for the confusion. The intention of my question is to find a way to optimize Hyperopt's Bayesian search while incorporating the model size, because (as I understand it) the model size cannot be included in the loss function itself (due to the inability to incorporate the derivative of the resulting model size into the model backprop). – u12357 Nov 23 '19 at 23:19
  • Thanks for the idea. I think one could implement your suggestion in Hyperopt's fmin by setting the 'status' value in the dict returned to fmin(...) to STATUS_FAIL when the size is larger than 100 KB (a rough sketch of this appears after these comments). However, I think this would result in Hyperopt doing an inefficient random/grid search until it selects hyperparameters for multiple models below this size threshold. Then, assuming that the lower loss comes from a larger model (not necessarily the case), I'd expect the model size to asymptote to the 100 KB threshold rather than continuing down. – u12357 Nov 23 '19 at 23:20
  • Right, you can't very well make the model size an input to your model (technically you can, but that won't help you much, I think). Well, in this scenario it's likely that the model size will converge to the threshold, as it's so low. Your models will have fairly low capacity and your loss function will reflect that. Why would you want it to "continue down"? My understanding is that you want the best model possible under size constraints, not the smallest model possible (that somehow still magically performs). Not sure I understand your "inefficient random search" argument. Give it a shot! – Max Pumperla Nov 25 '19 at 07:30
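For reference, a rough sketch of the STATUS_FAIL variant discussed in the comments above, again reusing the question's placeholders (nn.train_v00, frozen_graph_filepath). Trials marked as failed carry no loss value and are ignored when fmin picks the best point.

import os

from hyperopt import STATUS_FAIL, STATUS_OK

SIZE_LIMIT_BYTES = 100 * 1024  # 100 KB threshold

def optimizer(args):
    training_history = nn.train_v00(args)                # the question's training call
    file_bytes = os.path.getsize(frozen_graph_filepath)  # size of the frozen .pb

    if file_bytes > SIZE_LIMIT_BYTES:
        # Oversized models are reported as failed trials and get no loss value.
        return {'status': STATUS_FAIL, 'model_bytes': int(file_bytes)}

    return {
        'status': STATUS_OK,
        'loss': training_history.history['loss'][-1],
        'model_bytes': int(file_bytes),
    }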