
I want to evaluate my federated learning model using `tff.learning.build_federated_evaluation`. Initially, I got reasonable results, but can I run the evaluation process for multiple rounds (as is done in the training phase here) to get more stable results?

The evaluation code is provided below.

train, test = source.train_test_client_split(source, 2, seed=0)
test_client_ids = test.client_ids

# Build one dataset per test client. Note: the original used
# create_tf_dataset_from_all_clients(), which ignores `c` and repeats
# the full test set once per client id.
test_data = [
    test.create_tf_dataset_for_client(c).map(reshape_data).batch(batch_size=10)
    for c in test_client_ids
]

eval_process = tff.learning.build_federated_evaluation(model_fn)

eval_process(state.model, test_data)

The evaluation output:

OrderedDict([('eval',
              OrderedDict([('sparse_categorical_accuracy', 0.53447974),
                           ('loss', 1.0230521),
                           ('num_examples', 11514),
                           ('num_batches', 1152)]))])
Eden

1 Answer


Running `eval_process` for multiple rounds on the same `test_data` will not produce new information; it is expected to yield the same result every time. These results are stable in the sense that they don't change, but they are probably not interesting.

Running `eval_process` for multiple rounds, using different `test_data` each round, can be thought of as sampling a cohort of clients from the larger population to get an estimate of model quality. Many estimates computed from multiple samples can be combined with statistical techniques, with more rounds leading to more stable and improved estimates of model quality.

Presumably this is the technique used in [1] and [2], which describe later aggregation services.
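A minimal sketch of the per-round cohort sampling described above, runnable without TFF: `evaluate_cohort` is a hypothetical stand-in for building the sampled clients' datasets (via `create_tf_dataset_for_client` and the same `map`/`batch` pipeline) and calling `eval_process(state.model, cohort_data)`; here it returns a placeholder metric so the sampling-and-averaging logic can be shown on its own.

```python
import random
import statistics

# Hypothetical stand-in for the real evaluation step. In the asker's code
# this would build one tf.data.Dataset per sampled client id and call
# eval_process(state.model, cohort_data).
_metric_rng = random.Random(42)

def evaluate_cohort(client_ids):
    # Placeholder metric so this sketch runs without TFF installed.
    return {"sparse_categorical_accuracy": _metric_rng.uniform(0.5, 0.6)}

all_test_client_ids = [f"client_{i}" for i in range(20)]  # e.g. test.client_ids
clients_per_round = 5
num_rounds = 10

sample_rng = random.Random(0)
accuracies = []
for _ in range(num_rounds):
    # Sample a fresh cohort of test clients for this evaluation round,
    # so each round sees different test_data.
    cohort = sample_rng.sample(all_test_client_ids, clients_per_round)
    metrics = evaluate_cohort(cohort)
    accuracies.append(metrics["sparse_categorical_accuracy"])

# Aggregate the per-round estimates into a more stable overall estimate.
mean_acc = statistics.mean(accuracies)
stdev_acc = statistics.stdev(accuracies)
print(f"accuracy over {num_rounds} cohorts: {mean_acc:.4f} +/- {stdev_acc:.4f}")
```

The spread across rounds (`stdev_acc`) also gives a rough sense of how sensitive the quality estimate is to which clients are sampled.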

Zachary Garrett
  • Right, when I run the `eval_process` for multiple rounds I get the exact same result every round. But I did not understand how to use different test data each round, or how to implement sampling clients from a larger population. (The suggested papers are interesting, but they cover the concept of on-device data.) Since my data consists of multiple `.csv` files and I split them into train and test clients (each file is considered a single client), how could I apply it in my case? Thanks. – Eden Apr 24 '22 at 08:04