I am reading through the SageMaker documentation on distributed training and am confused by the terminology:
mini-batch, micro-batch, and per-replica batch size.
I understand that in data parallelism there are multiple copies of the model, and each copy receives a slice of data whose size is the "per-replica batch size".
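For concreteness, here is a minimal sketch of my current understanding in plain Python (the numbers are made up just for illustration):

```python
# My mental model of data parallelism (hypothetical numbers):
num_replicas = 4             # copies of the model, one per GPU/worker
per_replica_batch_size = 32  # samples each replica processes per step

# The global (mini-)batch consumed in one optimizer step across all replicas:
global_batch_size = per_replica_batch_size * num_replicas  # 128

# What I don't get: is a micro-batch a further split of the per-replica
# batch, i.e. something like
#   per_replica_batch_size == num_micro_batches * micro_batch_size ?
```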
- Could someone ELI5 how a micro-batch fits into this picture?
- Is this terminology in common use across the field, or is it specific to AWS SageMaker?