Typically, in distributed asynchronous training, instead of having each worker train on a non-overlapping partition of the data, you want each worker to work on all of the data.
In asynchronous training, updates to the shared parameters are not synchronized across workers: each update is applied as soon as it arrives. So if one worker is slower than the others, it will contribute fewer updates than the other workers. If you partition the data such that each worker has access only to its own partition, you are effectively down-weighting the examples that belong to slower workers, because those examples cause fewer updates to the parameters. That would hurt the quality and generalizability of your model.
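To make the down-weighting concrete, here is a tiny plain-Python simulation (nothing actually distributed; the worker count and speeds are made up): a worker running at one-third speed contributes roughly one-third as many updates, so with partitioned data its examples carry proportionally less influence.

```python
# Sketch: in asynchronous training, a slower worker applies fewer updates,
# so a data partition owned by that worker is effectively down-weighted.
import random

NUM_WORKERS = 4
TOTAL_UPDATES = 12_000
# Relative speeds: worker 3 is three times slower than the others.
speeds = [1.0, 1.0, 1.0, 1.0 / 3.0]

updates_per_worker = [0] * NUM_WORKERS
for _ in range(TOTAL_UPDATES):
    # Updates are applied in arrival order; faster workers arrive
    # proportionally more often.
    worker = random.choices(range(NUM_WORKERS), weights=speeds)[0]
    updates_per_worker[worker] += 1

for w, n in enumerate(updates_per_worker):
    print(f"worker {w}: {n} updates ({n / TOTAL_UPDATES:.1%} of all updates)")
# With partitioned data, worker 3's examples get ~10% of the total influence
# instead of the 25% they would get if every worker saw all the data.
```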
If you use synchronous training and force each step to wait for all workers, you can safely partition the data across workers. However, training will then be as slow as the slowest worker, since every step has to wait for updates from all workers. If you don't force updates from all workers, the situation can actually be worse than asynchronous training, because examples from slow workers are likely to be ignored completely.
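A rough back-of-the-envelope cost model illustrates the straggler penalty; the per-worker step times below are invented for illustration:

```python
# Sketch: synchronous step time is the max of per-worker times (everyone
# waits for the straggler), while asynchronous workers proceed independently.
worker_step_times = [1.0, 1.1, 0.9, 3.0]  # seconds per local step; one straggler

async_throughput = sum(1.0 / t for t in worker_step_times)  # worker-updates/sec
sync_step_time = max(worker_step_times)                     # gated by the straggler
sync_throughput = len(worker_step_times) / sync_step_time   # all workers update once per step

print(f"asynchronous: ~{async_throughput:.2f} worker-updates/sec")
print(f"synchronous:  ~{sync_throughput:.2f} worker-updates/sec "
      f"(step time pinned to the {sync_step_time:.1f}s straggler)")
```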
Because it is more robust, asynchronous training is more common.
Luckily, having all workers examine all of the data is generally a sensible thing to do. As long as you randomize the data (here and here), the examples being examined at any given time (across all workers) form a set of batch_size * num_workers examples sampled (almost) uniformly at random, with replacement, from the full dataset.
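One way to set up this "every worker reads all data, shuffled" pattern is with the tf.data API; this is only a sketch under that assumption (the file pattern, buffer sizes, and helper name are placeholders, and it may differ from the exact pipeline the links above describe):

```python
# Sketch of an unsharded, shuffled input pipeline: every worker reads the
# full dataset, shuffled with its own seed, and repeats indefinitely.
import tensorflow as tf

def make_input_fn(file_pattern, batch_size, shuffle_buffer=10_000, seed=None):
    def input_fn():
        # No sharding: every worker lists and reads all of the files.
        files = tf.data.Dataset.list_files(file_pattern, shuffle=True, seed=seed)
        dataset = tf.data.TFRecordDataset(files)
        # A different (or absent) seed per worker keeps their shuffle orders
        # decorrelated.
        dataset = dataset.shuffle(shuffle_buffer, seed=seed)
        dataset = dataset.repeat()         # loop over the data indefinitely
        dataset = dataset.batch(batch_size)
        return dataset
    return input_fn

# Each worker builds its own pipeline with a different seed, so the examples
# in flight at any moment approximate batch_size * num_workers samples drawn
# (almost) uniformly at random, with replacement, from the full dataset.
```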
That canonical approach to reading data often works well enough in practice for asynchronous distributed training. However, if you have so much data that you can only perform a few epochs of training, your model may benefit from seeing each example the same number of times (sampling without replacement). That is more complicated and less robust, but it can be done; that's a topic for a separate post.