I'm hoping to explore how data augmentation works in federated learning, and I'm currently using TFF to implement it. I notice that the datasets provided by TFF are composed of tensors, and tensors cannot be modified directly, so a naive idea would be to convert them to numpy arrays and then do the augmentation there. I tried
tfds.as_numpy(emnist_train.create_tf_dataset_for_client(n))
and it did provide me with numpy arrays, but I ran into problems when trying to pass the result to my preprocess function. If I do:
preprocess(tfds.as_numpy(emnist_train.create_tf_dataset_for_client(n)))
where preprocess is defined as
def preprocess(dataset):
  def batch_format_fn(element):
    """Flatten a batch `pixels` and return the features as an `OrderedDict`."""
    return collections.OrderedDict(
        x=tf.reshape(element['pixels'], [-1, 784]),
        y=tf.reshape(element['label'], [-1, 1]))
  return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER).batch(
      BATCH_SIZE).map(batch_format_fn).prefetch(PREFETCH_BUFFER)
I would get the following error:
return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER).batch(
AttributeError: '_IterableDataset' object has no attribute 'repeat'
which seems to mean that this _IterableDataset of numpy arrays does not support these tf.data methods.
I also tried wrapping it with the tf.data.Dataset.from_tensor_slices method, as in
tf.data.Dataset.from_tensor_slices(tfds.as_numpy(emnist_train.create_tf_dataset_for_client(n)))
but that ends with this error:
ValueError: Attempt to convert a value (<tensorflow_datasets.core.dataset_utils._IterableDataset object at 0x00000280AA695DF0>) with an unsupported type (<class 'tensorflow_datasets.core.dataset_utils._IterableDataset'>) to a Tensor.
Is there any way to solve this problem? Or can I do the augmentation directly on the dataset TFF provides?
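To be concrete, this is roughly the per-sample augmentation I have in mind, applied directly to the client tf.data.Dataset via map (just a sketch: the Gaussian-noise transform is a placeholder for whatever augmentation I would actually use):

import collections
import tensorflow as tf

def augment_fn(element):
  # Placeholder augmentation: add a little Gaussian noise to the EMNIST pixels.
  noisy = element['pixels'] + tf.random.normal(tf.shape(element['pixels']), stddev=0.05)
  return collections.OrderedDict(
      pixels=tf.clip_by_value(noisy, 0.0, 1.0),
      label=element['label'])

augmented = emnist_train.create_tf_dataset_for_client(n).map(augment_fn)
federated_train_data = [preprocess(augmented)]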
Update 1
Using the map function is enough if I only want to convert each sample in the dataset into an augmented one. However, if I want to add new samples to the dataset (e.g. samples with different labels), how can I do that? Since we can't modify the client dataset directly, I was thinking of converting it to a numpy array for further processing, yet if I do:
state, metrics = iterative_process.next(state, tfds.as_numpy(federated_train_data))
where federated_train_data is a client dataset, I get:
TypeError: Expected tensorflow.python.data.ops.dataset_ops.DatasetV2 or tensorflow.python.data.ops.dataset_ops.DatasetV1, found tensorflow_datasets.core.dataset_utils._IterableDataset.
It seems this _IterableDataset cannot be passed to the process. Is there a way to convert the dataset back into something accepted by tff.learning.build_federated_averaging_process()? Or is there a better way to do this kind of augmentation?
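To make the idea concrete, the round trip I was picturing looks roughly like this (a sketch; extra_pixels and extra_labels are placeholders for whatever new samples I would add, e.g. GAN outputs):

import collections
import numpy as np
import tensorflow as tf

client_ds = emnist_train.create_tf_dataset_for_client(n)

# Materialize the client dataset as numpy arrays (eager mode).
pixels = np.stack([elem['pixels'].numpy() for elem in client_ds])
labels = np.stack([elem['label'].numpy() for elem in client_ds])

# Placeholder new samples; in my case these would come from the GAN.
extra_pixels = np.zeros((10, 28, 28), dtype=np.float32)
extra_labels = np.zeros((10,), dtype=np.int32)
pixels = np.concatenate([pixels, extra_pixels], axis=0)
labels = np.concatenate([labels, extra_labels], axis=0)

# Rebuild a tf.data.Dataset so it can go through preprocess and into TFF.
rebuilt = tf.data.Dataset.from_tensor_slices(
    collections.OrderedDict(pixels=pixels, label=labels))
federated_train_data = [preprocess(rebuilt)]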
Update 2
I was trying to use the generator from a GAN model to generate new images to augment the dataset. I have a pretrained GAN (written with tf.keras), and I wrote a dataGenerator to wrap this model for augmenting the client datasets. However, when I run the FedAvg training, the following error occurs:
File "D:\Research\GAN_AUG_FL\utils\augment_utils.py", line 53, in generate_once
generated_images = generator(generator_input)
File "D:\Research\GAN_AUG_FL\venv\lib\site-packages\tensorflow\python\keras\engine\base_layer_v1.py", line 665, in __call__
self._assert_built_as_v1()
File "D:\Research\GAN_AUG_FL\venv\lib\site-packages\tensorflow\python\keras\engine\base_layer_v1.py", line 836, in _assert_built_as_v1
raise ValueError(
ValueError: Your Layer or Model is in an invalid state. This can happen for the following cases:
1. You might be interleaving estimator/non-estimator models or interleaving models/layers made in tf.compat.v1.Graph.as_default() with models/layers created outside of it. Converting a model to an estimator (via model_to_estimator) invalidates all models/layers made before the conversion (even if they were not the model converted to an estimator). Similarly, making a layer or a model inside a a tf.compat.v1.Graph invalidates all layers/models you previously made outside of the graph.
2. You might be using a custom keras layer implementation with custom __init__ which didn't call super().__init__. Please check the implementation of <class 'tensorflow.python.keras.engine.functional.Functional'> and its bases.
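For context, the failing call in my helper is roughly this (a sketch of my own augment_utils.generate_once; latent_dim is assumed to be whatever the GAN was trained with):

def generate_once(generator, num_samples, latent_dim=100):
  # generator is the pretrained tf.keras GAN generator.
  generator_input = tf.random.normal([num_samples, latent_dim])
  generated_images = generator(generator_input)  # this is the call that raises the ValueError
  return generated_images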
Here generator is just the Keras model used for generation. I suspect this happens because the computation graph TFF uses is different from the one in which I created the generator instance. The training code is just like the tutorial here:
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(only_digits=True, cache_dir="data/emnist")
example_dataset = emnist_train.create_tf_dataset_for_client(emnist_train.client_ids[0])
example_dataset = preprocess(example_dataset)
def model_fn():
  keras_model = create_keras_model()
  return tff.learning.from_keras_model(
      keras_model,
      input_spec=example_dataset.element_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
  )

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
)
state = iterative_process.initialize()

# state, metrics = iterative_process.next(state, [example_dataset])
# print('round 1, metrics={}'.format(metrics))

for round_num in range(NUM_ROUNDS):
  selected_clients = random.sample(emnist_train.client_ids, 1)
  federated_data = [
      preprocess(emnist_train.create_tf_dataset_for_client(n))
      for n in selected_clients
  ]
  state, metrics = iterative_process.next(state, federated_data)
  print(f"round {round_num + 1}, metrics={metrics}")
But here is the strange part: if I uncomment the two lines before the loop, the training runs smoothly for that first call, yet the same error is still raised once execution enters the loop. So I guess that after the first preprocess and training pass, TFF is using some different graph? Is there any possible solution to this?
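For reference, one workaround I have been considering (not sure whether it is idiomatic) is to run the generator eagerly, outside of any TFF computation, and bake the generated images into each client dataset before calling iterative_process.next. A rough sketch, reusing the hypothetical generate_once helper above and placeholder labels:

import collections
import numpy as np
import tensorflow as tf

def augment_client_dataset(client_ds, generator, num_new_samples=100):
  # Runs eagerly, before TFF serializes anything, so the generator never
  # ends up inside a TFF graph.
  generated = generate_once(generator, num_new_samples).numpy().reshape(-1, 28, 28)
  new_labels = np.zeros((num_new_samples,), dtype=np.int32)  # placeholder labels
  gan_ds = tf.data.Dataset.from_tensor_slices(
      collections.OrderedDict(label=new_labels, pixels=generated))
  return client_ds.concatenate(gan_ds)

federated_data = [
    preprocess(augment_client_dataset(
        emnist_train.create_tf_dataset_for_client(n), generator))
    for n in selected_clients
]

This keeps the generator out of TFF's graphs, but I would still like to understand the graph issue itself.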