For instance, if I were trying to detect (and segment) people in "Where's Waldo" images (each one densely packed with hundreds of people) and label each person by the color of the shirt they are wearing, would a small dataset (10-100 images in total for training and validation) be sufficient, given that each image contains so many instances of these "objects" (assuming I start from weights pre-trained on COCO)? Or is a large dataset (> 1000 images) vital in such cases (assuming a detection-segmentation algorithm like Mask R-CNN)?
Another way to look at the question:
What matters more: the number of images you train your segmentation algorithm on, or the total number of instances of detectable objects across the entire dataset?
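To make the two quantities concrete, here is a minimal sketch of how I would count them, assuming the annotations are stored as a COCO-style dictionary with `images` and `annotations` lists. The function name `count_images_and_instances` and the toy data are hypothetical, made up purely for illustration.

```python
from collections import Counter


def count_images_and_instances(coco):
    """Return (image count, instance count, instances per category id)
    for a COCO-style annotation dictionary."""
    num_images = len(coco["images"])
    annotations = coco["annotations"]
    # Each annotation entry corresponds to one labelled object instance.
    per_category = Counter(ann["category_id"] for ann in annotations)
    return num_images, len(annotations), per_category


# Toy data (hypothetical): 2 images, 3 labelled people,
# where category ids stand for shirt colors.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 2},
        {"image_id": 2, "category_id": 1},
    ],
}

n_images, n_instances, per_category = count_images_and_instances(toy)
print(n_images, n_instances, dict(per_category))
```

In a dense dataset like the one described, the instance count would be far larger than the image count, which is exactly the gap the question is about.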