Sharding consists of partitioning the model across multiple IPUs so that each IPU computes part of the graph. However, sharding is generally only recommended for niche use cases involving multiple models in a single graph, such as ensembles.
A different approach to model parallelism across multiple IPUs is pipelining. The model is again split into multiple compute stages on multiple IPUs, but the stages execute in parallel, with the outputs of each stage feeding the inputs of the next. Pipelining keeps the hardware better utilised during execution, which leads to better throughput and latency than sharding.
Therefore, pipelining is the recommended method to parallelise a model across multiple IPUs.
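To make the utilisation argument concrete, here is a small back-of-the-envelope model. This is not IPU-specific code, and the step counts are an idealisation that ignores communication costs and uneven stage durations; it only illustrates why running stages in parallel raises device utilisation.

```python
# Toy model of why pipelining beats sharding on device utilisation.
# A "step" is an abstract unit of compute time; numbers are illustrative.

def sharded_steps(num_stages: int, num_batches: int) -> int:
    # With plain sharding, each batch flows through every stage in turn,
    # so only one IPU is busy at any given step.
    return num_stages * num_batches

def pipelined_steps(num_stages: int, num_batches: int) -> int:
    # With pipelining, a new micro-batch enters the first stage each step;
    # after a fill phase of (num_stages - 1) steps, all IPUs work in parallel.
    return num_stages + num_batches - 1

def utilisation(num_stages: int, num_batches: int, total_steps: int) -> float:
    # Fraction of available IPU-steps spent doing useful work:
    # each batch needs exactly one step on each stage.
    useful = num_stages * num_batches
    return useful / (num_stages * total_steps)

stages, batches = 4, 16
print(utilisation(stages, batches, sharded_steps(stages, batches)))    # 0.25 (= 1/stages)
print(round(utilisation(stages, batches, pipelined_steps(stages, batches)), 3))  # 0.842
```

With 4 stages, sharding caps utilisation at 1/4 regardless of batch count, while the pipelined utilisation approaches 100% as the number of in-flight micro-batches grows, since the fill and drain phases become a smaller fraction of the run.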
You can find more details on pipelined training in this section of the Targeting the IPU from TensorFlow guide.
A more comprehensive review of these two model parallelism approaches is provided in this dedicated guide.
You could also consider using IPUPipelineEstimator: a variant of the IPUEstimator that automatically handles most aspects of running a pipelined program on the IPU. Here you can find a code example showing how to use the IPUPipelineEstimator to train a simple CNN on the CIFAR-10 dataset.