According to the API reference, one way to optimize data ingestion for distributed training is to use `ShardedByS3Key`.
Are there code samples for using `ShardedByS3Key` in the context of distributed training? Concretely, what changes are necessary to, e.g., PyTorch's `DistributedSampler` (should it be used at all?) or to a TensorFlow `tf.data` pipeline? Sketches of both setups as I understand them follow below.
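
For concreteness, here is a minimal sketch of what I would write today on the PyTorch side (the dataset class, file format, and batch size are placeholders of my own; the channel path follows SageMaker's `/opt/ml/input/data/<channel>` convention). My question is whether the `DistributedSampler` line needs different arguments, or should be dropped entirely, when each instance already receives only its own subset of S3 objects:

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, Dataset, DistributedSampler

# Placeholder dataset reading whatever SageMaker mounted into the
# training channel directory; with ShardedByS3Key this directory
# should already contain only this instance's subset of objects.
class ChannelDataset(Dataset):
    def __init__(self, data_dir):
        self.files = sorted(
            os.path.join(data_dir, f) for f in os.listdir(data_dir)
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        return torch.load(self.files[idx])  # placeholder decode

dist.init_process_group(backend="nccl")
dataset = ChannelDataset("/opt/ml/input/data/train")

# This is the part I'm unsure about: DistributedSampler shards the
# dataset again across all ranks, which looks redundant (or wrong,
# since ranks would skip data) if ShardedByS3Key has already split
# the objects per instance.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```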
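
And the analogous `tf.data` sketch (again, the file pattern and batch size are placeholders). Here the question is whether the usual `Dataset.shard(num_workers, worker_index)` call still belongs in the pipeline, or whether it would double-shard the data:

```python
import tensorflow as tf

# With ShardedByS3Key, the files under the channel directory are
# already a per-instance subset of the full S3 prefix.
files = tf.data.Dataset.list_files(
    "/opt/ml/input/data/train/*.tfrecord", shuffle=True
)
dataset = (
    tf.data.TFRecordDataset(files)
    # .shard(num_workers, worker_index)  # still needed, or double-sharding?
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```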