I am planning to use Opacus to implement differential privacy in my federated learning model, but I have a very basic question I'd like to clear up first.
As far as I understand, with Opacus we use an optimizer like DP-SGD, which clips per-sample gradients and adds calibrated noise to every batch of every client's dataset during local training. And in federated learning, each client trains for a few "local epochs" before sending its weights to a central server for aggregation, and noise can also be added to the model weights before they are sent out.
So my question is: why do we use DP-SGD to add noise to every single batch of every client's dataset during local training, when we could just add noise to the local weights before they are sent out? Why not let the local training epochs run as-is and simply perturb the outbound weights at the time of departure? What am I missing?
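To make the two options I'm comparing concrete, here is a toy 1-D sketch (purely illustrative, not the Opacus API; the learning rate, clipping threshold C, and noise multiplier sigma are made-up values). Approach A noises every per-sample gradient the way DP-SGD does; approach B trains cleanly and adds a single noise draw to the final weight before it would be sent out:

```python
import random

random.seed(0)

def clip(g, c):
    # Clip a scalar "gradient" to magnitude c, mimicking DP-SGD's
    # per-sample gradient norm clipping.
    return max(-c, min(c, g))

# Toy 1-D "model": weight w trained on loss (w - x)^2 over client data points.
data = [1.0, 2.0, 3.0, 4.0]
lr, C, sigma = 0.1, 1.0, 0.5

# Approach A (DP-SGD style): clip and noise every per-sample gradient,
# at every step of local training.
w_a = 0.0
for epoch in range(5):
    for x in data:
        g = 2 * (w_a - x)                      # gradient of (w - x)^2
        g = clip(g, C) + random.gauss(0, sigma * C)
        w_a -= lr * g

# Approach B (output perturbation): train without any noise, then perturb
# the final weight once before it leaves the client.
w_b = 0.0
for epoch in range(5):
    for x in data:
        w_b -= lr * 2 * (w_b - x)
w_b += random.gauss(0, sigma * C)              # single noise draw on outbound weight
```

In approach A the privacy guarantee is accounted per step, whereas in approach B the single noise draw would have to be calibrated to the sensitivity of the entire training run, which is my underlying confusion.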