It really depends on your dataset.
During training, Unet will try to learn specific features in the images, such as the shape of a baby, body size, color, etc. If your dataset is good enough (e.g. it contains lots of examples of babies and lots of adults marked with a separate color, and the image dimensions are not too high), then you probably won't have any problems at all.
There is a possibility, however, that your model misses some babies or adults in an image. To tackle this issue, there are a couple of things you can do:
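- Add data augmentation during training (e.g. random crop, padding, brightness, contrast, etc.). For segmentation, geometric transforms such as crop and padding must be applied identically to the image and its mask.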
- You can make your model stronger by replacing the Unet model with a newer architecture, such as Unet++ or Unet3+. According to the Unet3+ paper, it seems to outperform both Unet and Unet++ in medical image segmentation tasks:
https://arxiv.org/ftp/arxiv/papers/2004/2004.08790.pdf
Also, I have found this repository with a clean implementation of Unet3+, which might help you get started:
https://github.com/kochlisGit/Unet3-Plus
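
For the data augmentation suggestion above, here is a minimal sketch of how an image and its mask could be augmented together using only NumPy. The function name `augment_pair`, the padding width, and the brightness/contrast ranges are my own assumptions for illustration, not values from any paper or library. It also assumes images are float arrays in [0, 1] and that label 0 means background.

```python
import numpy as np


def augment_pair(image, mask, crop_size, rng, pad=8):
    """Augment an (image, mask) pair for segmentation training.

    image: float array in [0, 1], shape (H, W, C)
    mask:  integer label array, shape (H, W); 0 is assumed to be background
    """
    # Padding: add the same zero border to both, so the mask stays aligned.
    image = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    mask = np.pad(mask, ((pad, pad), (pad, pad)), mode="constant")

    # Random crop: draw one offset and use it for both arrays.
    h, w = mask.shape
    ch, cw = crop_size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    image = image[top:top + ch, left:left + cw]
    mask = mask[top:top + ch, left:left + cw]

    # Brightness and contrast: change only the image, never the labels.
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    mean = image.mean()
    image = np.clip((image - mean) * contrast + mean + brightness, 0.0, 1.0)

    return image, mask


# Example usage with random stand-in data (3 classes: 0, 1, 2).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
mask = rng.integers(0, 3, size=(64, 64))
aug_image, aug_mask = augment_pair(image, mask, crop_size=(48, 48), rng=rng)
```

You would call something like this on every sample each time it is loaded, so the model sees a slightly different version of each image in every epoch. Libraries such as Albumentations provide the same kind of paired transforms if you prefer not to write them by hand.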