I've been reading up a bit on different CNNs for object detection, and have found that most of the models I'm looking at are fully convolutional networks, like the latest YOLO versions and retinanet.
What are the benefits of FCNs over conventional CNNs with pooling, apart from FCNs having less different layers? I've read https://arxiv.org/pdf/1412.6806.pdf and as I read it the main interest of that paper was to simplify the networks structure. Is this the sole reason that modern detection/classification networks don't use pooling, or are there other benefits?