
I am trying to implement and reproduce the results of federated BERT pretraining from the paper *Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos*.

I would prefer to use TensorFlow code for BERT pretraining.

To train in a federated way, I first divided the dataset into 3 silos (each containing the discharge summaries of 50 patients, taken from MIMIC-III data) and then pretrained a BERT model on each silo's dataset using the TensorFlow implementation of BERT pretraining from the official BERT release.

Now I have three models, each pretrained on a different silo's dataset. For model aggregation, I need to take the average of all three models. Since the number of notes in each silo is equal, averaging just means summing the weights of the three models and dividing by three, i.e. `w_avg = (w_1 + w_2 + w_3) / 3`. How do I average the models as done in the paper? Could somebody please give me some insight into how to code this correctly? The idea of averaging the model weights is taken from the paper *Federated Learning: Strategies for Improving Communication Efficiency*.
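
To make it concrete, here is a rough, untested sketch of how I imagine the averaging could work directly on the checkpoint files (the paths are placeholders, and I am unsure how integer variables such as `global_step` should be handled):

```python
import tensorflow as tf

# Placeholder paths -- replace with the actual checkpoint prefixes of each silo
ckpt_paths = ["silo1/model.ckpt", "silo2/model.ckpt", "silo3/model.ckpt"]

readers = [tf.train.load_checkpoint(p) for p in ckpt_paths]
shape_map = readers[0].get_variable_to_shape_map()
dtype_map = readers[0].get_variable_to_dtype_map()

averaged = {}
for name in shape_map:
    tensors = [r.get_tensor(name) for r in readers]
    if dtype_map[name].is_floating:
        # Equal-weight FedAvg: sum the tensors and divide by the number of silos
        averaged[name] = sum(tensors) / len(tensors)
    else:
        # Integer variables (e.g. global_step) cannot be meaningfully averaged;
        # just copy the value from the first checkpoint
        averaged[name] = tensors[0]

# Write the averaged values back out as a new checkpoint
tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session() as sess:
    new_vars = [tf.compat.v1.get_variable(name, initializer=value)
                for name, value in averaged.items()]
    sess.run(tf.compat.v1.global_variables_initializer())
    saver = tf.compat.v1.train.Saver(new_vars)
    saver.save(sess, "./model_avg.ckpt")
```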

I am very new to deep learning and TensorFlow, so could someone please help me figure this out and suggest some reading material for TensorFlow?

The paper mentions that this approach is a good option for overcoming privacy and regulatory issues when sharing clinical data. My question is:

Is it possible to recover sensitive data from these model.ckpt files? If so, how?

Any help would be appreciated. Thanks.

  • If the question is only about taking the average of **N** saved models, then there is a possible solution answered already; possible duplicate: https://stackoverflow.com/questions/48212110/average-weights-in-keras-models – Innat Apr 19 '21 at 10:49
  • Here is some additional info that might help you: [tfa.callbacks.AverageModelCheckpoint](https://www.tensorflow.org/addons/api_docs/python/tfa/callbacks/AverageModelCheckpoint) – Innat Apr 19 '21 at 10:50

1 Answer


Model averaging can be done in many ways. The simplest is to keep a complete copy of the architecture in each silo, take a (weighted) average of their parameter values, and use the result as the parameters of the global model. However, there are a number of practical issues (latency, network speed, computational power of each device) which may prohibit this, so more complex solutions are used in which silos are only trained on subsets of the variables, etc. (as in the paper you cite).
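
As a minimal sketch of the equal-weight case, assuming all three silo models share the same architecture and are loaded as Keras models (the `models` list and `global_model` below are placeholders for your own objects):

```python
import numpy as np

def average_models(models, weights=None):
    """Return a list of averaged parameter arrays, one per model variable."""
    n = len(models)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting, as in plain FedAvg

    # get_weights() returns the model's parameters as a list of numpy arrays;
    # zip groups the corresponding variable across all models
    all_params = [m.get_weights() for m in models]
    return [
        sum(w * var for w, var in zip(weights, var_group))
        for var_group in zip(*all_params)
    ]

# Apply the averaged parameters to a fresh copy of the architecture:
# global_model.set_weights(average_models([model_a, model_b, model_c]))
```

Passing non-uniform `weights` (e.g. proportional to each silo's number of notes) gives the weighted variant; with equal silo sizes, as in your setup, the uniform default is equivalent.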

It is not generally possible to recover information (sensitive or otherwise) about the training data purely from the parameter updates of a model fine-tuned on it.

– iacob