
I use a human activity recognition (HAR) dataset with 6 classes in a federated learning (FL) setup. I create non-IID partitions by assigning (1) one class to each of 6 workers, (2) two classes to each of 3 workers, and (3) three classes to each of 2 workers.
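The partitioning can be sketched like this (a simplified example; `X`, `y`, and `partition_by_label` are placeholder names rather than the exact implementation):

```python
import numpy as np

def partition_by_label(X, y, classes_per_worker, num_classes=6):
    """Split (X, y) so that each worker holds `classes_per_worker` whole classes."""
    num_workers = num_classes // classes_per_worker
    workers = []
    for w in range(num_workers):
        labels = list(range(w * classes_per_worker, (w + 1) * classes_per_worker))
        idx = np.isin(y, labels)
        workers.append((X[idx], y[idx]))
    return workers

# Scenario (1): 6 workers x 1 class each
# Scenario (2): 3 workers x 2 classes each
# Scenario (3): 2 workers x 3 classes each
# workers = partition_by_label(X, y, classes_per_worker=1)  # or 2, or 3
```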

When I run the FL process, the validation accuracy follows scenario (3) > (2) > (1). I expected all scenarios to obtain almost the same validation accuracy. For each scenario I use the same hyperparameter settings, including batch size, shuffle buffer, and model configuration.

Is this common in FL with non-IID datasets, or is there a problem with my result?

tfreak

1 Answer


The scenario where each worker holds only one label (and all of the examples for that label) can be considered the "pathologically bad" non-IID case for Federated Averaging.

In this scenario, it's possible that each worker learns to predict only the label it holds. The model does not need to discriminate on any features: if a worker only has class 1, it can always predict class 1 and obtain 100% local accuracy. Each round, when all of the model updates are averaged, the global model is back to predicting each class with roughly 1/6 probability.
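A toy illustration of this averaging effect, in plain NumPy (not the actual FedAvg update, just the intuition): represent each worker's model by a logit vector that puts essentially all probability mass on its own class, then average the parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

num_classes = 6
# Worker i's "model" is a logit vector that strongly favors class i.
client_logits = [np.full(num_classes, -10.0) for _ in range(num_classes)]
for i in range(num_classes):
    client_logits[i][i] = 10.0   # worker i predicts its own class with ~100% confidence

print("worker 0 predicts:", np.round(softmax(client_logits[0]), 3))

# FedAvg-style parameter averaging: every class ends up with the same averaged
# logit, so the global model's prediction is uniform (1/6 per class).
global_logits = np.mean(client_logits, axis=0)
print("averaged model predicts:", np.round(softmax(global_logits), 3))
```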

The closer each worker's distribution of examples is to the global distribution (or to the other workers' distributions, i.e. the more IID the client datasets are), the closer its local training comes to producing an update that points in the same direction as the averaged update, which leads to better training results.

Zachary Garrett
  • When I use an IID dataset, there is no such issue. However, in practical FL scenarios we usually need to make the dataset non-IID. Based on McMahan et al. and some other FL papers, we can assign two classes to each worker to make the dataset pathologically non-IID. Is that correct? And how can I get good accuracy with a non-IID dataset and many workers? – tfreak Feb 20 '21 at 09:45
  • I suspect additional hyperparameter tuning could help. https://arxiv.org/abs/1602.05629 ran experiments that altered the batch size and the number of epochs of local client training, showing dramatic differences in convergence. https://arxiv.org/abs/2007.00878 also discusses the importance of client learning rates; see the sketch after these comments for where those knobs sit in code. – Zachary Garrett Feb 20 '21 at 14:16
  • Okay. I will try tuning the hyperparameter settings mentioned above. However, is it possible that the way I implement the non-IID dataset is wrong? I just assign training samples of the same class to each worker and shuffle them while FL is running. Should I also shuffle the test dataset? – tfreak Feb 21 '21 at 10:41
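If this setup is TensorFlow Federated, the knobs mentioned in the comments sit roughly as follows (a sketch following the TFF Federated Averaging tutorials from around that time; `model_fn` and `federated_train_data` are assumed to be your existing model constructor and per-client datasets):

```python
import tensorflow as tf
import tensorflow_federated as tff

# Local-training hyperparameters live in the client dataset pipeline:
# the repeat count is the number of local epochs per round, and batch()
# sets the local batch size.
def preprocess(client_dataset, num_epochs=5, shuffle_buffer=100, batch_size=20):
    return client_dataset.repeat(num_epochs).shuffle(shuffle_buffer).batch(batch_size)

# Client and server learning rates are passed to the FedAvg process builder.
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,  # assumed: your existing function returning a tff.learning.Model
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

state = iterative_process.initialize()
# One round of training on the (preprocessed) client datasets:
# state, metrics = iterative_process.next(state, federated_train_data)
```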