I am using a human activity recognition (HAR) dataset with 6 classes in a federated learning (FL) setup. I create non-IID partitions in three ways: (1) one class per worker across 6 workers, (2) two classes per worker across 3 workers, and (3) three classes per worker across 2 workers.
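Roughly, my partitioning looks like this (a minimal sketch with a hypothetical helper `partition_by_class`, assuming integer labels 0–5; my actual pipeline differs in details):

```python
import numpy as np

def partition_by_class(labels, classes_per_worker, num_classes=6):
    """Assign whole classes to workers, so each worker only ever
    sees `classes_per_worker` of the 6 activity classes (non-IID)."""
    assert num_classes % classes_per_worker == 0
    class_ids = np.arange(num_classes)
    workers = []
    for start in range(0, num_classes, classes_per_worker):
        assigned = class_ids[start:start + classes_per_worker]
        # indices of all samples whose label belongs to this worker
        idx = np.where(np.isin(labels, assigned))[0]
        workers.append(idx)
    return workers

# toy labels: 60 samples, 10 per class
labels = np.repeat(np.arange(6), 10)
s1 = partition_by_class(labels, 1)  # scenario (1): 6 workers, 1 class each
s2 = partition_by_class(labels, 2)  # scenario (2): 3 workers, 2 classes each
s3 = partition_by_class(labels, 3)  # scenario (3): 2 workers, 3 classes each
print(len(s1), len(s2), len(s3))  # → 6 3 2
```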
When I run the FL process, the validation accuracy follows scenario (3) > (2) > (1), but I expected all three scenarios to reach roughly the same validation accuracy. All scenarios use identical hyperparameter settings, including batch size, shuffle buffer, and model configuration.
Is this ordering expected behavior for FL with non-IID data, or does it indicate a problem with my setup?