
I'm currently doing research with TFF on image classification (the Federated Learning for Image Classification tutorial) using EMNIST.

I'm looking at hyper-parameters for the model: the learning rate and the optimizer. Is grid search a good approach here? In a real-world scenario you would presumably sample clients/devices from the overall population; if so, would I have to fix my client samples first before doing a grid search? And in that case, does the grid search still make sense?

What would be a typical real-world way of selecting parameters? That is, is this more of a heuristic approach?

Colin . . .

Eduardo Yáñez Parareda

1 Answer


I think there is still a lot of open research in these areas for Federated Learning.

Page 6 of https://arxiv.org/abs/1912.04977 describes a cross-device and a cross-silo setting for federated learning.

In cross-device settings, the population is generally very large (hundreds of thousands or millions) and participants are generally only seen once during the entire training process. In this setting, https://arxiv.org/abs/2003.00295 demonstrates that hyper-parameters such as the client learning rate play an outsized role in determining the speed of model convergence and the final model accuracy. To arrive at that finding, we first performed a large coarse grid search to identify promising hyper-parameter regions, and then ran finer grids within those regions. However, this can be expensive depending on the compute resources available for simulation, since the training process must be run to completion to understand these effects.
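The coarse-then-fine search above can be sketched roughly as follows. This is only an illustration: `run_federated_training` is a hypothetical stand-in for an expensive TFF training run (here a synthetic objective with a known optimum near a client learning rate of 0.1 and a server learning rate of 1.0), and the grids are laid out on a log scale, which is the usual choice for learning rates.

```python
import numpy as np

def run_federated_training(client_lr, server_lr):
    """Hypothetical stand-in for a full federated training run.

    In practice this would execute the TFF training loop to completion and
    return the final validation accuracy; here a synthetic surface peaking
    at (client_lr=0.1, server_lr=1.0) stands in for that expensive run.
    """
    return -((np.log10(client_lr) + 1.0) ** 2 + np.log10(server_lr) ** 2)

# Stage 1: coarse grid spanning several orders of magnitude.
coarse = [10.0 ** e for e in range(-4, 2)]  # 1e-4 ... 1e1
best = max(
    ((c, s) for c in coarse for s in coarse),
    key=lambda p: run_federated_training(*p),
)

# Stage 2: finer grid centered on the best coarse point.
def fine_grid(x):
    return [x * f for f in (0.25, 0.5, 1.0, 2.0, 4.0)]

best_fine = max(
    ((c, s) for c in fine_grid(best[0]) for s in fine_grid(best[1])),
    key=lambda p: run_federated_training(*p),
)
print(best_fine)  # best (client_lr, server_lr) pair found
```

In a real study each call would be a complete simulation, so the number of grid points, not the search logic, dominates the cost.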

It might be possible to view federated learning as very large mini-batch SGD. In fact the FedSGD algorithm in https://arxiv.org/abs/1602.05629 is exactly this. In this regime, re-using theory from centralized model training may be fruitful.
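To make the mini-batch SGD analogy concrete, here is a minimal NumPy sketch of the FedSGD idea (not TFF's API) for linear regression: each sampled client computes the exact gradient on its full local dataset, and the server averages those gradients and takes a single SGD step. With equal client weights, one round is equivalent to one large mini-batch step over the union of the sampled clients' data.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])

# Synthetic per-client datasets (50 examples each, 2 features).
clients = []
for _ in range(10):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

def client_gradient(w, X, y):
    """Mean-squared-error gradient on one client's local data."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(2)
server_lr = 0.1
for _ in range(200):  # communication rounds
    grads = [client_gradient(w, X, y) for X, y in clients]
    w -= server_lr * np.mean(grads, axis=0)  # averaged-gradient SGD step

print(w)  # converges toward true_w
```

Because each round is just one (large-batch) SGD step, learning-rate theory from centralized training carries over directly in this regime; it is FedAvg's multiple local steps per round that break the equivalence.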

Finally https://arxiv.org/abs/1902.01046 describes a system used at Google for federated learning, and does have a small discussion on hyper-parameter exploration.

Zachary Garrett